tasks: convergence
- Cramming: Training a Language Model on a Single GPU in One Day — 2022, to-read
- Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks — 2016, deep-read
- Layer Normalization — 2016, deep-read
- Very Deep Convolutional Networks for Large-Scale Image Recognition — 2015, skimmed
- Deep Residual Learning for Image Recognition — 2015, deep-read
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift — 2015, deep-read
- Adam: A Method for Stochastic Optimization — 2015, deep-read