Learning Rate Schedules

These strategies adjust the learning rate, and/or other optimizer parameters, throughout the training process. Generally, large learning rates are used in the early stages of training where large steps can be taken to improve efficiency before slowly decaying to small rates to encourage fine-grained convergence.

Hero images generated with neural networks via midjourney.