Learning Rate Schedules
These strategies adjust the learning rate, and/or other optimizer parameters, throughout the training process. Generally, large learning rates are used in the early stages of training where large steps can be taken to improve efficiency before slowly decaying to small rates to encourage fine-grained convergence.