The tutorials are generated from Python 2 IPython Notebook files, which will be linked to at the end of each chapter so that you can adapt and run the examples yourself. The neural networks themselves are implemented using the Python NumPy library, which offers efficient implementations of linear algebra functions such as vector and matrix multiplications. Illustrative plots are generated using Matplotlib. If you want to run these examples yourself and don't have Python with the necessary libraries installed, I recommend downloading and installing Anaconda Python, a free Python distribution that contains all the libraries needed to run these tutorials, and which was used to create them.
The code input cells in this blog can be collapsed or expanded by clicking on the button in the top right of each cell.
A version of this tutorial is also available in Chinese thanks to Mingming Chen.
Linear regression
This first part will cover:
A very simple neural network
Concepts such as target function and cost function
Gradient descent optimization
All this will be illustrated with the help of the simplest neural network possible: a 1 input 1 output linear regression model that has the goal to predict the target value $t$ from the input value $x$. The network is defined as having an input $x$ that gets multiplied by a weight $w$ to produce the output $y = x \cdot w$, where $y$ needs to approximate the targets $t$ as well as possible. This network can be represented graphically as:
[Figure: diagram of the simple neural network, with input $x$, weight $w$, and output $y$]
In regular neural networks, we typically have multiple layers, non-linear activation functions, and a bias for each node. In this tutorial, we only have one layer with one weight parameter $w$, and no activation function or bias.
In this tutorial, we will approximate the targets $t$ with the output of the model $y$ by minimizing the squared error cost function (the squared Euclidean distance between $y$ and $t$). Minimizing this cost will be done with the gradient descent optimization algorithm, which is typically used when training neural networks.
The notebook starts out with importing the libraries we need:
In [1]:
# Python imports
import numpy  # Matrix and vector computation package
import matplotlib.pyplot as plt  # Plotting library
# Allow matplotlib to plot inside this notebook
%matplotlib inline
# Set the seed of the numpy random number generator so that the tutorial is reproducible
numpy.random.seed(seed=1)
Define the target function
In this example, the targets $t$ will be generated from a function $f$ plus additive Gaussian noise, so that the estimation won't be perfect. The function $f$ is defined as $f(x) = 2x$, and the noise is drawn from a standard normal distribution and scaled by $0.2$.
We will sample 20 input samples $x$ from a uniform distribution between 0 and 1, and then generate the target values $t$ by the process described above. The resulting inputs $x$ and targets $t$ are plotted against each other in the figure below, together with the original line $f(x)$.
In [2]:
# Define the vector of input samples as x, with 20 values sampled from a uniform
# distribution between 0 and 1
x = numpy.random.uniform(0, 1, 20)

# Generate the target values t from x with small gaussian noise so the estimation won't
# be perfect.
# Define a function f that represents the line that generates t without noise
def f(x): return x * 2

# Create the targets t with some gaussian noise
noise_variance = 0.2  # Standard deviation of the gaussian noise (randn is scaled by this value)
# Gaussian noise error for each sample in x
noise = numpy.random.randn(x.shape[0]) * noise_variance
# Create targets t
t = f(x) + noise
In [3]:
# Plot the target t versus the input x
plt.plot(x, t, 'o', label='t')
# Plot the initial line
plt.plot([0, 1], [f(0), f(1)], 'b-', label='f(x)')
plt.xlabel('x')
plt.ylabel('t')
plt.ylim([0,2])
plt.title(‘inputs (x) vs targets (t)’)
plt.grid()
plt.legend(loc=2)
plt.show()
Define the cost function
We will optimize the model $y = x \cdot w$ by tuning the parameter $w$ so that the squared error cost over all samples is minimized. The squared error cost is defined as $\xi = \sum_{i=1}^{N} \Vert t_i - y_i \Vert^2$, with $N$ the number of samples in the training set. The optimization goal is thus: $\underset{w}{\text{argmin}} \sum_{i=1}^{N} \Vert t_i - y_i \Vert^2$.
Notice that we take the sum of errors over all samples, which is known as batch training. We could also update the parameters based upon one sample at a time, which is known as online training.
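To make the difference concrete, here is a minimal sketch of one batch update versus one online pass over the samples. It reuses the x and t arrays from above and the squared-error gradient $2 x (y - t)$ that is derived in the gradient descent section below; the starting weight and learning rate are arbitrary illustration values:

# Sketch: one batch update vs. one online pass (illustrative only; uses the
# gradient 2 * x * (y - t) derived later in this tutorial)
mu = 0.1  # learning rate
w_batch = 0.1
w_online = 0.1

# Batch: sum the gradient over all samples, then update once
w_batch = w_batch - mu * (2 * x * (x * w_batch - t)).sum()

# Online: update immediately after each individual sample
for x_i, t_i in zip(x, t):
    w_online = w_online - mu * 2 * x_i * (x_i * w_online - t_i)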
This cost function for variable $w$ is plotted in the figure below. The value $w = 2$ is at the minimum of the cost function (the bottom of the parabola); this is the same value as the slope we chose for $f(x)$. Notice that this function is convex and that there is only one minimum: the global minimum. While every squared error cost function for linear regression is convex, this is not the case for other models and other cost functions.
The neural network model is implemented in the nn(x, w) function, and the cost function is implemented in the cost(y, t) function.
In [4]:
# Define the neural network function y = x * w
def nn(x, w): return x * w

# Define the cost function
def cost(y, t): return ((t - y)**2).sum()
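As a quick usage example, the cost of the true slope $w = 2$ can be evaluated directly; it is non-zero because of the noise on the targets:

# Evaluate the cost at the true slope w = 2; non-zero due to the target noise
print(cost(nn(x, 2.0), t))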
In [5]:
# Plot the cost vs the given weight w

# Define a vector of weights for which we want to plot the cost
ws = numpy.linspace(0, 4, num=100)  # weight values
cost_ws = numpy.vectorize(lambda w: cost(nn(x, w), t))(ws)  # cost for each weight in ws

# Plot
plt.plot(ws, cost_ws, 'r-')
plt.xlabel('w')
plt.ylabel('cost')
plt.title(‘cost vs. weight’)
plt.grid()
plt.show()
Optimizing the cost function
For a simple cost function like in this example, you can see by eye what the optimal weight should be. But the error surface can be quite complex or have a high dimensionality (each parameter adds a new dimension). This is why we use optimization techniques to find the minimum of the error function.
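As an aside, in this one-dimensional case you could even find the minimum by brute force, scanning a grid of weights and keeping the cheapest one; a small sketch reusing the ws and cost_ws arrays from the cell above. With thousands of parameters such a grid becomes astronomically large, which is why gradient-based optimization is used instead:

# Brute-force sketch: pick the grid weight with the lowest cost
# (reuses ws and cost_ws from the plotting cell above)
w_grid = ws[numpy.argmin(cost_ws)]
print('grid-search minimum near w = {:.2f}'.format(w_grid))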
Gradient descent
One optimization algorithm commonly used to train neural networks is the gradient descent algorithm. Gradient descent works by taking the derivative of the cost function $\xi$ with respect to the parameters at a specific position on this cost function, and updating the parameters in the direction of the negative gradient.
The parameter $w$ is iteratively updated by taking steps proportional to the negative of the gradient: $w(k+1) = w(k) - \Delta w(k)$, with $w(k)$ the value of $w$ at iteration $k$ during the gradient descent.
$\Delta w$ is defined as: $\Delta w = \mu \frac{\partial \xi}{\partial w}$, with $\mu$ the learning rate, which controls how large a step to take along the gradient.
For each sample $i$ this gradient can be split via the chain rule: $\frac{\partial \xi_i}{\partial w} = \frac{\partial y_i}{\partial w} \frac{\partial \xi_i}{\partial y_i}$
Where $\xi_i = (t_i - y_i)^2$ is the squared error cost for sample $i$, so the $\frac{\partial \xi_i}{\partial y_i}$ term can be written as: $\frac{\partial \xi_i}{\partial y_i} = \frac{\partial (t_i - y_i)^2}{\partial y_i} = -2(t_i - y_i) = 2(y_i - t_i)$
And since $y_i = x_i \cdot w$ we can write $\frac{\partial y_i}{\partial w}$ as: $\frac{\partial y_i}{\partial w} = \frac{\partial (x_i \cdot w)}{\partial w} = x_i$
So the full update function $\Delta w$ for sample $i$ becomes: $\Delta w = \mu \cdot \frac{\partial \xi_i}{\partial w} = \mu \cdot 2 x_i (y_i - t_i)$
In the batch processing, we just add up all the gradients for each sample: $\Delta w = \mu \cdot 2 \sum_{i=1}^{N} x_i (y_i - t_i)$
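As a sanity check on this derivation, the analytical gradient can be compared against a numerical finite-difference approximation of the cost function; a small sketch reusing nn and cost from above, where w_test and eps are arbitrary values chosen for illustration:

# Compare the analytical gradient with a central finite-difference estimate
w_test = 0.5   # arbitrary point at which to evaluate the gradient
eps = 1e-6     # small finite-difference step
grad_analytic = (2 * x * (nn(x, w_test) - t)).sum()
grad_numeric = (cost(nn(x, w_test + eps), t)
                - cost(nn(x, w_test - eps), t)) / (2 * eps)
print(grad_analytic, grad_numeric)  # the two values should agree closely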
To run the gradient descent algorithm, you typically pick the initial parameters at random and keep updating them with $\Delta w$ until convergence. The learning rate needs to be tuned separately as a hyperparameter for each neural network.
The gradient $\frac{\partial \xi}{\partial w}$ is implemented by the gradient(w, x, t) function, and $\Delta w$ is computed by the delta_w(w_k, x, t, learning_rate) function. The loop below performs 4 iterations of gradient descent while keeping track of the weight values and the corresponding cost.
In [6]:
# define the gradient function. Remember that y = nn(x, w) = x * w
def gradient(w, x, t):
    return 2 * x * (nn(x, w) - t)

# define the update function delta w
def delta_w(w_k, x, t, learning_rate):
    return learning_rate * gradient(w_k, x, t).sum()

# Set the initial weight parameter
w = 0.1
# Set the learning rate
learning_rate = 0.1

# Start performing the gradient descent updates, and print the weights and cost:
nb_of_iterations = 4  # number of gradient descent updates
w_cost = [(w, cost(nn(x, w), t))]  # List to store the weight, cost values
for i in range(nb_of_iterations):
    dw = delta_w(w, x, t, learning_rate)  # Get the delta w update
    w = w - dw  # Update the current weight parameter
    w_cost.append((w, cost(nn(x, w), t)))  # Add weight, cost to list

# Print the final w, and cost
for i in range(0, len(w_cost)):
    print('w({}): {:.4f} \t cost: {:.4f}'.format(i, w_cost[i][0], w_cost[i][1]))
w(0): 0.1000 cost: 13.6197
w(1): 1.5277 cost: 1.1239
w(2): 1.8505 cost: 0.4853
w(3): 1.9234 cost: 0.4527
w(4): 1.9399 cost: 0.4510
Notice in the previous outcome that the gradient descent algorithm quickly converges towards the target value around $2.0$. Let's try to plot these gradient descent updates of the parameter $w$, together with their costs, on the cost vs. weight plot.
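As a quick cross-check before plotting: for this particular model the optimal weight can also be computed in closed form, since setting $\frac{\partial \xi}{\partial w} = 0$ gives $w = \frac{\sum_i x_i t_i}{\sum_i x_i^2}$, the least-squares solution for a line through the origin. A small sketch to compare against the gradient descent result:

# Closed-form least-squares weight for a line through the origin
w_exact = (x * t).sum() / (x * x).sum()
print('closed-form w = {:.4f}'.format(w_exact))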
In [7]:
# Plot the first 2 gradient descent updates
plt.plot(ws, cost_ws, 'r-')  # Plot the error curve
# Plot the updates
for i in range(0, len(w_cost)-2):
    w1, c1 = w_cost[i]
    w2, c2 = w_cost[i+1]
    plt.plot(w1, c1, 'bo')
    plt.plot([w1, w2], [c1, c2], 'b-')
    plt.text(w1, c1+0.5, 'w({})'.format(i))
# Show figure
plt.xlabel('w')
plt.ylabel('cost')
plt.title(‘Gradient descent updates plotted on cost function’)
plt.grid()
plt.show()
The last figure shows the gradient descent updates of the weight parameter for the first 2 iterations. The blue dots represent the weight parameter values $w(k)$ at iteration $k$. Notice how the size of each update depends on the gradient at that position: the first update takes a much larger step than the later ones, because the cost function is much steeper at $w(0)$ than closer to the minimum.
The regression line fitted by gradient descent with 10 iterations is shown in the figure below. The fitted line (red) lies close to the original line (blue), which is what we tried to approximate via the noisy samples. Notice that both lines go through the point $(0, 0)$; this is because we didn't model a bias (intercept) term, so the fitted line is forced through the origin.
In [8]:
w = 0
# Start performing the gradient descent updates
nb_of_iterations = 10  # number of gradient descent updates
for i in range(nb_of_iterations):
    dw = delta_w(w, x, t, learning_rate)  # get the delta w update
    w = w - dw  # update the current weight parameter
In [9]:
# Plot the fitted line against the target line
# Plot the target t versus the input x
plt.plot(x, t, 'o', label='t')
# Plot the initial line
plt.plot([0, 1], [f(0), f(1)], 'b-', label='f(x)')
# Plot the fitted line
plt.plot([0, 1], [0*w, 1*w], 'r-', label='fitted line')
plt.xlabel(‘input x’)
plt.ylabel(‘target t’)
plt.ylim([0,2])
plt.title(‘input vs. target’)
plt.grid()
plt.legend(loc=2)
plt.show()
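If a non-zero intercept were needed, a hypothetical extension (not part of this tutorial's model) would add a bias parameter $b$ so that $y = x \cdot w + b$, updated with its own gradient $2 \sum_i (y_i - t_i)$. A minimal sketch follows; note the smaller learning rate, since the bias gradient sums over all 20 samples and $\mu = 0.1$ would make the updates diverge:

# Hypothetical model with a bias term: y = x * w + b
w_b, b = 0.1, 0.0
mu = 0.02  # smaller learning rate to keep the bias update stable
for i in range(100):
    y = x * w_b + b
    w_b = w_b - mu * (2 * x * (y - t)).sum()  # gradient w.r.t. the weight
    b = b - mu * (2 * (y - t)).sum()          # gradient w.r.t. the bias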
This post at peterroelants.github.io is generated from an IPython notebook file. Link to the full IPython notebook file