I’m an AI research engineer in NYC. Previously I was part of the founding team at Character AI; before that I led large language model efforts at Meta AI (FAIR), including Fully Sharded Data Parallel (FSDP), fairseq, OPT-175B, and RoBERTa.
Contact:
- Email
- GitHub
- Google Scholar
- LinkedIn
- Twitter
2022
- OPT: Open Pre-trained Transformer Language Models
- Efficient Large Scale Language Modeling with Mixtures of Experts
- Few-shot Learning with Multilingual Language Models

2021
- NormFormer: Improved Transformer Pretraining with Extra Normalization
- Fully Sharded Data Parallel: faster AI training with fewer GPUs
- Larger-Scale Transformers for Multilingual Masked Language Modeling
- Recipes for Building an Open-Domain Chatbot
- Analyzing the Forgetting Problem in Pretrain-Finetuning of Open-domain Dialogue Response Models
- Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences
- Residual Energy-Based Models for Text

2020
- Few-shot Sequence Learning with Transformers
- Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art
- General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference
- Unsupervised Cross-lingual Representation Learning at Scale
- On The Evaluation of Machine Translation Systems Trained With Back-Translation
- Residual Energy-Based Models for Text Generation

2019
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- fairseq: A Fast, Extensible Toolkit for Sequence Modeling
- The FLoRes Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English
- Mixture Models for Diverse Machine Translation: Tricks of the Trade
- Facebook AI's WAT19 Myanmar-English Translation Task Submission
- Facebook FAIR's WMT19 News Translation Task Submission

2018
- Phrase-Based & Neural Unsupervised Machine Translation
- Understanding Back-Translation at Scale
- Scaling Neural Machine Translation
- Analyzing Uncertainty in Neural Machine Translation
- Classical Structured Prediction Losses for Sequence to Sequence Learning

Earlier
- Towards a General Rule for Identifying Deceptive Opinion Spam
- Impact of Mobility and Timing on User-Generated Content
- Identifying Manipulated Offerings on Review Portals
- Negative Deceptive Opinion Spam
- Estimating the Prevalence of Deception in Online Review Communities
- IBM at TREC 2012: Microblog Track
- In Search of a Gold Standard in Studies of Deception
- Multi-aspect Sentiment Analysis with Topic Models
- Finding Deceptive Opinion Spam by Any Stretch of the Imagination