Overview

Pruning is an effective way to reduce the size of neural networks by removing parameters from large weight matrices.

We can visualize each of the filter weights for a whole layer with matshow.
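As a minimal sketch (assuming a small, hypothetical `Conv2d` layer created just for this purpose), each filter can be rendered as its own heatmap:

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# A small convolutional layer used purely for illustration.
torch.manual_seed(0)
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3)

# Plot each 3x3 filter in the layer as its own heatmap.
fig, axes = plt.subplots(1, conv.out_channels, figsize=(12, 2))
for i, ax in enumerate(axes):
    ax.matshow(conv.weight[i, 0].detach().numpy(), cmap="viridis")
    ax.set_title(f"filter {i}")
    ax.set_xticks([])
    ax.set_yticks([])
plt.show()
```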

We can then prune the weights with any of several PyTorch strategies. The granularity of pruning can be unstructured or structured: unstructured methods mask individual weights (connections), while structured methods mask entire neurons, channels, or layers. Removing entire rows, columns, or blocks from a layer's weight matrix can be effective at reducing computational cost, since the smaller dense weight matrix can be leveraged for hardware optimizations; this is harder to achieve with the sparse matrices produced by unstructured masking. However, unstructured masking tends to yield networks that generalize better as the number of masked parameters increases.

\begin{align} \hat{W}_{uns} &= \begin{bmatrix} w_{11} & w_{12} & 0 & w_{14} \\ 0 & w_{22} & 0 & 0 \\ w_{31} & 0 & 0 & w_{34} \\ 0 & 0 & w_{43} & w_{44} \end{bmatrix} \\[0.1in] \hat{W}_{str} &= \begin{bmatrix} w_{11} & w_{12} & w_{13} & w_{14} \\ 0 & 0 & 0 & 0 \\ w_{31} & w_{32} & w_{33} & w_{34} \\ 0 & 0 & 0 & 0 \end{bmatrix} \end{align}
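A rough sketch of how masks with these two patterns might be produced with `torch.nn.utils.prune`; the 4x4 `Linear` layers, the 50% amount, and the L1/L2 criteria below are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
layer_uns = nn.Linear(4, 4)
layer_str = nn.Linear(4, 4)

# Unstructured: mask the 50% of individual weights with the smallest magnitude.
prune.l1_unstructured(layer_uns, name="weight", amount=0.5)

# Structured: mask the 2 rows (output neurons) with the smallest L2 norm (dim=0).
prune.ln_structured(layer_str, name="weight", amount=2, n=2, dim=0)

print(layer_uns.weight_mask)  # scattered zeros, like the unstructured pattern above
print(layer_str.weight_mask)  # entire rows zeroed, like the structured pattern above
```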

Pruning in PyTorch works by adding extra computation to the given module. The original weight matrix is copied to a new parameter named weight_orig, and the pruned subnetwork is represented by a binary buffer weight_mask, where a value of 1 means the corresponding parameter is retained. The effective weight is the element-wise product of weight_orig and weight_mask; it is applied automatically and exposed as module.weight, recomputed before each forward pass by a forward pre-hook registered on the module.
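For instance, after applying a hypothetical unstructured pruning step, this reparametrization can be inspected directly (the layer and pruning amount below are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(4, 4)  # hypothetical layer for illustration
prune.l1_unstructured(layer, name="weight", amount=0.5)

# The original tensor now lives in the parameter weight_orig,
# and the binary mask lives in the buffer weight_mask.
print([n for n, _ in layer.named_parameters()])  # ['bias', 'weight_orig']
print([n for n, _ in layer.named_buffers()])     # ['weight_mask']

# module.weight is recomputed as weight_orig * weight_mask by a
# forward pre-hook before every forward pass.
print(torch.equal(layer.weight, layer.weight_orig * layer.weight_mask))  # True
print(layer._forward_pre_hooks)  # shows the registered pruning hook
```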

Now let's reset the pruned module and demonstrate how structured methods operate over different dimensions.
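A minimal sketch of what that might look like, assuming fresh `Linear` layers and pruning along the output dimension (dim=0) versus the input dimension (dim=1):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Fresh, unpruned layers so each example starts from a clean state.
rows = nn.Linear(4, 4)
cols = nn.Linear(4, 4)

# dim=0 prunes entire output neurons (rows of the weight matrix).
prune.ln_structured(rows, name="weight", amount=2, n=2, dim=0)

# dim=1 prunes entire input connections (columns of the weight matrix).
prune.ln_structured(cols, name="weight", amount=2, n=2, dim=1)

print(rows.weight_mask)  # zeroed rows
print(cols.weight_mask)  # zeroed columns
```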