status: deep-read
- LLaMA: Open and Efficient Foundation Language Models — 2023, deep-read
- Training language models to follow instructions with human feedback — 2022, deep-read
- LoRA: Low-Rank Adaptation of Large Language Models — 2022, deep-read
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models — 2022, deep-read
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows — 2021, deep-read
- How to Train State-Of-The-Art Models Using TorchVision’s Latest Primitives — 2021, deep-read
- Language Models are Few-Shot Learners — 2020, deep-read
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer — 2020, deep-read
- Language Models are Unsupervised Multitask Learners — 2019, deep-read
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding — 2019, deep-read
- Improving Language Understanding by Generative Pre-Training — 2018, deep-read
- Group Normalization — 2018, deep-read
- Deep contextualized word representations — 2018, deep-read
- Density estimation using Real NVP — 2017, deep-read
- Attention Is All You Need — 2017, deep-read
- Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks — 2016, deep-read
- Layer Normalization — 2016, deep-read
- Unsupervised Domain Adaptation by Backpropagation — 2015, deep-read
- Neural Machine Translation by Jointly Learning to Align and Translate — 2015, deep-read
- Deep Residual Learning for Image Recognition — 2015, deep-read
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift — 2015, deep-read
- Adam: A Method for Stochastic Optimization — 2015, deep-read
- Sequence to Sequence Learning with Neural Networks — 2014, deep-read
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting — 2014, deep-read
- Efficient Estimation of Word Representations in Vector Space — 2013, deep-read