I’m an AI research engineer in NYC and part of the founding team at Character AI. Previously, I led large language model efforts at Meta AI (FAIR), including Fully Sharded Data Parallel (FSDP), fairseq, OPT 175B, and RoBERTa.
Contact:

- e-mail
- GitHub
- Google Scholar
- LinkedIn
- Twitter
2022

- OPT: Open Pre-trained Transformer Language Models
- Efficient Large Scale Language Modeling with Mixtures of Experts
- Few-shot Learning with Multilingual Language Models
2021

- NormFormer: Improved Transformer Pretraining with Extra Normalization
- Fully Sharded Data Parallel: faster AI training with fewer GPUs
- Larger-Scale Transformers for Multilingual Masked Language Modeling
- Recipes for Building an Open-Domain Chatbot
- Analyzing the Forgetting Problem in Pretrain-Finetuning of Open-domain Dialogue Response Models
- Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences
- Residual Energy-Based Models for Text
2020

- Few-shot Sequence Learning with Transformers
- Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art
- General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference
- Unsupervised Cross-lingual Representation Learning at Scale
- On The Evaluation of Machine Translation Systems Trained With Back-Translation
- Residual Energy-Based Models for Text Generation
2019

- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- fairseq: A Fast, Extensible Toolkit for Sequence Modeling
- The FLoRes Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English
- Mixture Models for Diverse Machine Translation: Tricks of the Trade
- Facebook AI's WAT19 Myanmar-English Translation Task Submission
- Facebook FAIR's WMT19 News Translation Task Submission
2018

- Phrase-Based & Neural Unsupervised Machine Translation
- Understanding Back-Translation at Scale
- Scaling Neural Machine Translation
- Analyzing Uncertainty in Neural Machine Translation
- Classical Structured Prediction Losses for Sequence to Sequence Learning
Earlier

- Towards a General Rule for Identifying Deceptive Opinion Spam
- Impact of Mobility and Timing on User-Generated Content
- Identifying Manipulated Offerings on Review Portals
- Negative Deceptive Opinion Spam
- Estimating the Prevalence of Deception in Online Review Communities
- IBM at TREC 2012: Microblog Track
- In Search of a Gold Standard in Studies of Deception
- Multi-aspect Sentiment Analysis with Topic Models
- Finding Deceptive Opinion Spam by Any Stretch of the Imagination