Building a Two-Layer Neural Network From Scratch Using Python
An in-depth tutorial on setting up a neural network
Hello AI fans! I am so excited to share with you how to build a neural network with a hidden layer! Follow along and let’s get started!
Importing Libraries
The only library we need for this tutorial is NumPy.
import numpy as np
Activation Function
In the hidden layer we will use the tanh activation function, and in the output layer we will use the sigmoid function. It is easy to find information on both functions and their graphs, so I don’t want to bore you with explanations; I will just implement the sigmoid, since NumPy already provides np.tanh for us.

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
Setting Parameters
What are parameters and hyperparameters? Parameters are the weights and biases. Hyperparameters affect the parameters and are set before learning begins. Setting hyperparameters perfectly right at first is not a piece of cake; you’ll need to tinker and tweak your values. The learning rate, the number of iterations, and the regularization rate, among others, can all be considered hyperparameters.
Wondering how to set the matrix sizes? For any layer L, the weight matrix W[L] has shape (number of neurons in layer L, number of neurons in layer L−1), and the bias vector b[L] has shape (number of neurons in layer L, 1).

What does all that mean? For example:
(layer 0, so L = 0) number of neurons in the input layer = 3
(layer 1, so L = 1) number of neurons in the hidden layer = 5
(layer 2, so L = 2) number of neurons in the output layer = 1
So W1 has shape (5, 3), b1 is (5, 1), W2 is (1, 5), and b2 is (1, 1).

I hope this all makes sense! Let’s set the parameters:
We define W1, b1, W2, and b2. It doesn’t hurt to set your biases to zero at first. However, be very careful when initializing weights: never set them all to zero. Why exactly? Because in Z = Wx + b, Z would start out at zero, and worse, every neuron in a layer would compute the same value and receive the same gradient update, so a multi-layer neural network would behave as if each layer had a single neuron. So how do we initialize the weights? I use He initialization.

# He initialization, e.g. for the output-layer weights
np.random.randn(output_size, hidden_size) * np.sqrt(2 / hidden_size)
You don’t have to use He initialization; you can also use this:
np.random.randn(output_size, hidden_size) * 0.01
I’d recommend never setting weights to zero or a big number when initializing parameters.
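Putting this together, here is a minimal sketch of a parameter-initialization function. It uses He initialization for the weights and zeros for the biases; the name setParameters and its arguments are just one possible choice, not a fixed API.

# Parameter initialization (a sketch): He init for weights, zeros for biases.
def setParameters(X, Y, hidden_size):
    input_size = X.shape[0]     # neurons in the input layer
    output_size = Y.shape[0]    # neurons in the output layer
    W1 = np.random.randn(hidden_size, input_size) * np.sqrt(2 / input_size)
    b1 = np.zeros((hidden_size, 1))
    W2 = np.random.randn(output_size, hidden_size) * np.sqrt(2 / hidden_size)
    b2 = np.zeros((output_size, 1))
    return {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}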
Forward Propagation

The diagram above should give you a good idea of what forward propagation is. Here is one way to implement it in Python:
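This sketch assumes tanh in the hidden layer and sigmoid at the output, with function and variable names of my own choosing:

def forwardPropagation(X, params):
    Z1 = np.dot(params['W1'], X) + params['b1']    # linear step, hidden layer
    A1 = np.tanh(Z1)                               # tanh activation
    Z2 = np.dot(params['W2'], A1) + params['b2']   # linear step, output layer
    y = sigmoid(Z2)                                # sigmoid activation
    return y, {'Z1': Z1, 'Z2': Z2, 'A1': A1, 'y': y}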
Why are we storing {'Z1': Z1, 'Z2': Z2, 'A1': A1, 'y': y}? Because we will use these values when back-propagating.
Cost Function
We just looked at forward propagation and obtained a prediction (y). Now we need to measure how far that prediction is from the actual output, and we do that with a cost function. The graph below, which plots the cost against a parameter value, explains the idea:


We update our parameters and search for the values that give us the minimum possible cost. I’m not going to delve into derivatives here, but note that on the graph above, if you are on the right side of the parabola, the derivative (slope) is positive, so the parameter decreases and moves left, approaching the value that returns the minimum cost. On the left side, the slope is negative, so the parameter increases towards the value we want. Let’s look at the cost function we will use, the cross-entropy cost:
J = −(1/m) · Σ [ Y · log(y) + (1 − Y) · log(1 − y) ]

Python code for the cost function:
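Here is a minimal sketch, assuming the cross-entropy cost above (the function name and arguments are my own):

# Cross-entropy cost (a sketch). y: predictions of shape (1, m); Y: labels of shape (1, m).
def cost(y, Y):
    m = Y.shape[1]    # number of examples
    return float(-np.sum(Y * np.log(y) + (1 - Y) * np.log(1 - y)) / m)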
Backpropagation
We’ve found the cost; now let’s go back and find the derivatives of the cost with respect to our weights and biases. In a future piece, I plan to show you how to derive them step by step.
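For now, here is a sketch of what backpropagation could look like for our network, assuming the cross-entropy cost and the tanh/sigmoid activations used above (the names are my own):

# Backpropagation for the two-layer network (a sketch).
def backPropagation(X, Y, params, cache):
    m = X.shape[1]                                   # number of examples
    dZ2 = cache['y'] - Y                             # output error (sigmoid + cross-entropy)
    dW2 = np.dot(dZ2, cache['A1'].T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(params['W2'].T, dZ2) * (1 - np.power(cache['A1'], 2))   # tanh derivative
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {'dW1': dW1, 'db1': db1, 'dW2': dW2, 'db2': db2}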
What are params and cache in def backPropagation(X, Y, params, cache)? When we run forward propagation, we store values to use during backpropagation: cache holds those stored values, and params are the parameters (weights and biases).
Updating Parameters
Now that we have our derivatives, we can use the gradient-descent update rule below:
W = W − α · dW
b = b − α · db
In those equations, alpha (α) is the learning rate hyperparameter. We need to set it to some value before learning begins. The term to the right of the learning rate is the derivative. We know alpha and the derivatives, so let’s update our parameters:
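A minimal sketch of the update step, using the gradient names from the backpropagation sketch above (the function name is my own):

# Gradient-descent parameter update (a sketch).
def updateParameters(params, grads, learning_rate):
    W1 = params['W1'] - learning_rate * grads['dW1']
    b1 = params['b1'] - learning_rate * grads['db1']
    W2 = params['W2'] - learning_rate * grads['dW2']
    b2 = params['b2'] - learning_rate * grads['db2']
    return {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}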
All About Loops
We need to run many iterations to find the parameters that return the minimum cost. Let’s loop it!
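Here is one way the training loop could be put together from the sketches above; the signature matches the call fit(X, Y, 0.3, 5, 5000) used later, and the names are my own:

# Training loop (a sketch): forward pass, cost, backward pass, update, repeated.
def fit(X, Y, learning_rate, hidden_size, number_of_iterations=5000):
    params = setParameters(X, Y, hidden_size)
    cost_ = []
    for j in range(number_of_iterations):
        y, cache = forwardPropagation(X, params)
        cost_.append(cost(y, Y))                       # record the cost at every iteration
        grads = backPropagation(X, Y, params, cache)
        params = updateParameters(params, grads, learning_rate)
    return params, cost_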
hidden_size means the number of neurons in the hidden layer. It is a hyperparameter, because you set it before learning begins! And what does return params, cost_ tell us? params are the best parameters we found, and cost_ is simply the cost we recorded at every iteration.
Let’s Try Our Code!
Use sklearn to create a dataset.
import sklearn.datasets
X, Y = sklearn.datasets.make_moons(n_samples=500, noise=.2)
X, Y = X.T, Y.reshape(1, Y.shape[0])
X is the input and Y is the actual output.
params, cost_ = fit(X, Y, 0.3, 5, 5000)
I set the learning rate to 0.3, the number of neurons in the hidden layer to 5 and the number of iterations to 5000.
Feel free to try with different values.
Let’s draw a graph showing how the cost changed with every iteration:
import matplotlib.pyplot as plt
plt.plot(cost_)
plt.show()

Bingo! We did it!
first_cost = 0.7383781203733911
last_cost = 0.06791109327547613
Full code: put the snippets above together (the imports, sigmoid, setParameters, forwardPropagation, cost, backPropagation, updateParameters, and fit) and you have the complete program.
Thank you for reading! I hope this tutorial was helpful!