Overview

Learning rate scheduling is a crucial component of training good models.

Let's start with a simple CNN.
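Below is a minimal sketch of such a CNN, assuming 3-channel 32x32 inputs (CIFAR-10-sized images) and 10 output classes; the exact architecture is an illustrative assumption rather than a prescribed design.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """A small two-block convolutional network for 32x32 RGB inputs."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),            # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),            # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)
```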

Now we'll plot some example learning rate schedules. PyTorch's torch.optim.lr_scheduler module provides several schedulers that plug directly into an optimizer.
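One way to visualize a schedule is to step it for the planned number of epochs and record the learning rate at each step. The sketch below uses StepLR as an example; the starting learning rate, step size, and epoch count are illustrative assumptions, and the model is the SimpleCNN defined above.

```python
import matplotlib.pyplot as plt
import torch
from torch.optim.lr_scheduler import StepLR

model = SimpleCNN()  # defined above
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Multiply the learning rate by gamma every step_size epochs.
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

lrs = []
for epoch in range(100):
    lrs.append(scheduler.get_last_lr()[0])
    optimizer.step()      # placeholder for a full training epoch
    scheduler.step()      # advance the schedule once per epoch

plt.plot(lrs)
plt.xlabel("epoch")
plt.ylabel("learning rate")
plt.title("StepLR schedule")
plt.show()
```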

The cosine annealing scheduler is popular in many contexts for its smooth decay.
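A minimal sketch of cosine annealing with PyTorch's CosineAnnealingLR follows; the initial learning rate, T_max, and eta_min values are assumptions chosen for illustration.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = SimpleCNN()  # defined above
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Decay from 0.1 down to eta_min over T_max epochs along a half cosine curve.
scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-4)

for epoch in range(100):
    # ... one training epoch would go here ...
    optimizer.step()
    scheduler.step()
```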

Several low-cost ensemble learning algorithms make use of cyclic schedules; snapshot ensembles are the most popular of these, saving a copy of the model weights at the end of each learning rate cycle.
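One way to sketch a snapshot-ensemble-style loop is with CosineAnnealingWarmRestarts, collecting the weights at each learning rate minimum. The cycle length, number of cycles, and use of deep-copied state dicts as "snapshots" are illustrative assumptions, not the only way to implement the idea.

```python
import copy
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = SimpleCNN()  # defined above
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
cycle_len = 20
# Cosine anneal over cycle_len epochs, then restart at the initial learning rate.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=cycle_len)

snapshots = []
for epoch in range(5 * cycle_len):
    # ... one training epoch would go here ...
    optimizer.step()
    scheduler.step()
    if (epoch + 1) % cycle_len == 0:
        # Keep a snapshot of the weights at the end of each cycle,
        # to be averaged or ensembled at prediction time.
        snapshots.append(copy.deepcopy(model.state_dict()))
```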

The one-cycle schedule is an effective strategy that combines a warmup phase with a subsequent decay phase. It has been shown to produce super-convergence in some cases and is highly effective for tuning subnetworks.
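A sketch of the one-cycle policy with PyTorch's OneCycleLR follows. Unlike the epoch-level schedulers above, OneCycleLR is stepped after every batch; the max_lr, epoch count, and steps_per_epoch values here are illustrative assumptions.

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

model = SimpleCNN()  # defined above
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
steps_per_epoch = 391   # e.g. roughly CIFAR-10 with batch size 128
# Warm up to max_lr, then anneal back down over the remaining steps.
scheduler = OneCycleLR(optimizer, max_lr=0.1, epochs=30,
                       steps_per_epoch=steps_per_epoch)

for epoch in range(30):
    for batch in range(steps_per_epoch):
        # ... forward/backward pass would go here ...
        optimizer.step()
        scheduler.step()   # advance the schedule once per batch
```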