Overview

Neural networks are composed of many connected neurons organized into hierarchical layers. Each connection between neurons has an associated weight $w_i$ that is adjusted during learning. A neuron's output is computed by taking the weighted sum of its inputs $x_i$, adding a bias term $b$, and passing the result through a non-linear activation function $\sigma$:

$$ y = \sigma (\sum_{i} {w_i x_i} + b) $$
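This computation maps directly onto a few tensor operations. Here is a minimal sketch using PyTorch, with illustrative input, weight, and bias values (these numbers are not from the text, and the logistic sigmoid is just one possible choice of $\sigma$):

```python
import torch

# Illustrative single neuron: weighted sum of inputs plus bias,
# passed through a non-linear activation (here the logistic sigmoid).
x = torch.tensor([0.5, -1.2, 3.0])   # inputs x_i
w = torch.tensor([0.8, 0.1, -0.4])   # weights w_i
b = torch.tensor(0.2)                # bias b

y = torch.sigmoid(torch.dot(w, x) + b)
print(y)  # a single scalar output in (0, 1)
```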

The activation function plays an important role in neural networks: injecting non-linearity is what allows a network to model complex relationships. Without it, any stack of linear layers collapses into a single linear transformation, as the sketch below illustrates.
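As a small illustrative check (not taken from the text, and the layer sizes are arbitrary), two linear layers with no activation between them are equivalent to one combined linear layer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked linear layers with no activation in between...
f1 = nn.Linear(4, 8)
f2 = nn.Linear(8, 2)

# ...reduce to a single affine map: W2 W1 x + (W2 b1 + b2).
W = f2.weight @ f1.weight
b = f2.weight @ f1.bias + f2.bias

x = torch.randn(3, 4)
stacked = f2(f1(x))
collapsed = x @ W.T + b
print(torch.allclose(stacked, collapsed, atol=1e-6))  # True
```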

Many activation functions have been used over the years. In early neural network research the most popular were sigmoidal functions such as the logistic function and the hyperbolic tangent.

We can plot these functions and their derivatives using torch's autograd, as in the sketch below.
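A minimal sketch of how that might look, assuming matplotlib is available for plotting (the input range and number of sample points are arbitrary choices):

```python
import torch
import matplotlib.pyplot as plt

# Evaluate the logistic sigmoid and tanh over a range of inputs, then use
# autograd to compute their derivatives at every point.
x = torch.linspace(-6, 6, 200, requires_grad=True)

for name, fn in [("sigmoid", torch.sigmoid), ("tanh", torch.tanh)]:
    y = fn(x)
    # Summing the outputs lets one backward pass return dy/dx for every x.
    grad, = torch.autograd.grad(y.sum(), x)
    plt.plot(x.detach().numpy(), y.detach().numpy(), label=name)
    plt.plot(x.detach().numpy(), grad.numpy(), linestyle="--", label=f"d {name}/dx")

plt.legend()
plt.show()
```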

Sigmoidal functions work well for shallow networks; however, they tend to suffer from the vanishing gradient problem in deeper networks. Because their derivatives are bounded well below 1 (at most 0.25 for the logistic function), repeated multiplication through the chain rule shrinks the gradients reaching the early layers toward zero.
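To make this concrete, here is a small illustrative sketch (the depth of 20 and the input value are arbitrary) that passes a value through a chain of logistic sigmoids and inspects the gradient that reaches the input:

```python
import torch

# Apply the logistic sigmoid repeatedly and watch how the gradient
# with respect to the input shrinks as the chain gets deeper.
x = torch.tensor(1.0, requires_grad=True)

y = x
for _ in range(20):          # a chain of 20 sigmoid "layers"
    y = torch.sigmoid(y)

y.backward()
# Each sigmoid contributes a factor of at most 0.25 to the chain rule,
# so after 20 layers the gradient is vanishingly small.
print(x.grad)  # roughly on the order of 1e-13
```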

Modern deep learning models mostly use rectifiers such as the ReLU, $\mathrm{ReLU}(x) = \max(0, x)$, which are unbounded for positive inputs and have a (mostly) constant derivative of 1 there, so gradients are not attenuated as they pass back through many layers.
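For comparison with the sigmoids above, a minimal sketch of the ReLU and its gradient, again using autograd (the sample points are arbitrary and chosen to avoid the kink at zero):

```python
import torch

# ReLU(x) = max(0, x): unbounded for positive inputs, with a constant
# derivative of 1 there, so gradients pass through unchanged.
x = torch.linspace(-3, 3, 6, requires_grad=True)
y = torch.relu(x)

grad, = torch.autograd.grad(y.sum(), x)
print(y.detach())   # 0 for negative inputs, identity for positive inputs
print(grad)         # 0 for negative inputs, 1 for positive inputs
```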