The math behind neural networks visually explained
Artificial neural networks are among the most powerful, and at the same time the most complicated, machine learning models. They are particularly useful for complex tasks where traditional machine learning algorithms fail. The main advantage of neural networks is their ability to learn intricate patterns and relationships in data, even when the data is high-dimensional or unstructured.
Many articles discuss the math behind neural networks, covering topics like activation functions, forward propagation and backpropagation, gradient descent, and optimization methods in detail. In this article, we take a different approach and present a visual understanding of a neural network, layer by layer. We first focus on the visual explanation of single-layer neural networks in both classification and regression problems and their similarities to other machine learning models. Then we discuss the importance of hidden layers and non-linear activation functions. All the visualizations are created using Python.
All the images in this article were created by the author.
Neural networks for classification
We start with classification problems. The simplest type of classification problem is a binary classification in which the target has only two categories or labels. If the target has more than two labels, then we have a multi-class classification problem.
Single-layer networks: perceptron
A single-layer neural network is the simplest form of an artificial neural network. Here we only have an input layer, which receives the input data, and an output layer, which produces the output of the network. The input layer isn’t considered a true layer since it merely passes the input data along. That’s why this architecture is called a single-layer network. The perceptron, the first neural network ever created, is the simplest example of a single-layer neural network.
The perceptron was created in 1957 by Frank Rosenblatt. He believed that the perceptron could simulate the principles of the brain, with the ability to learn and make decisions. The original perceptron was designed to solve a binary classification problem.
Figure 1 shows the architecture of a perceptron. The input data has n features denoted by x₁ to xₙ. The target y has only two labels (y=0 and y=1).

The input layer receives the features and passes them to the output layer. The neuron in the output layer calculates the weighted sum of the input features. Each input feature, xᵢ, is associated with the weight wᵢ. The neuron multiplies each input by its corresponding weight and sums up the results. A bias term, w₀, is also added to this sum. If we denote the sum by z, we have:
z = w₁x₁ + w₂x₂ + … + wₙxₙ + w₀
The activation function is a step function defined as:
step(z) = 1 if z ≥ 0
step(z) = 0 if z < 0
This activation function is plotted in Figure 2.

The output of the perceptron, denoted by ŷ, is calculated as follows:
ŷ = step(z) = step(w₁x₁ + w₂x₂ + … + wₙxₙ + w₀)
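To make these formulas concrete, the snippet below is a minimal sketch (not one of the article’s listings) of the perceptron’s forward pass for a single example; the weight and input values are arbitrary placeholders.
# A minimal sketch of the perceptron forward pass (placeholder values)
import numpy as np
w0 = 0.5                      # bias term w₀
w = np.array([1.0, -2.0])     # weights w₁, w₂ (arbitrary values)
x = np.array([0.3, 0.8])      # one input example with two features
z = np.dot(w, x) + w0         # weighted sum of the inputs plus the bias
y_hat = 1 if z >= 0 else 0    # step activation: 1 if z ≥ 0, otherwise 0
print(z, y_hat)               # z ≈ -0.8, so ŷ = 0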
To visualize how a perceptron works, we use a simple training dataset with only two features, x₁ and x₂. This dataset is created in Listing 1. The features are generated randomly, and the target y has only two labels (y=0 and y=1). We also import all the Python libraries needed in this article at the beginning of this listing. The dataset is plotted in Figure 3.
# Listing 1
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import random
import tensorflow as tf
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import backend
np.random.seed(3)
n = 30
X1 = np.random.randn(n,2)
y1 = np.random.choice((0, 1),size=n)
X1[y1>0,0] -= 4
X1[y1>0,1] += 4
scaler = StandardScaler()
X1 = scaler.fit_transform(X1)
plt.figure(figsize=(5, 5))
marker_colors = ['red', 'blue']
target_labels = np.unique(y1)
n = len(target_labels)
for i, label in enumerate(target_labels):
    plt.scatter(X1[y1==label, 0], X1[y1==label, 1], label="y="+str(label),
                edgecolor="white", color=marker_colors[i])
plt.xlabel('$x_1$', fontsize=16)
plt.ylabel('$x_2$', fontsize=16)
plt.legend(loc='best', fontsize=11)
ax = plt.gca()
ax.set_aspect('equal')
plt.xlim([-2.3, 1.8])
plt.ylim([-1.9, 2.2])
plt.show()

This article does not go into detail about the neural network training process. Instead, we focus on the behaviour of an already trained neural network. In Listing 2, we define and train a perceptron using the previous dataset.
# Listing 2
class Perceptron(object):
    def __init__(self, eta=0.01, epochs=50):
        self.eta = eta            # learning rate
        self.epochs = epochs      # number of passes over the training set

    def fit(self, X, y):
        # w[0] is the bias term; w[1:] are the feature weights
        self.w = np.zeros(1 + X.shape[1])
        for epoch in range(self.epochs):
            for xi, target in zip(X, y):
                # perceptron learning rule: update the weights by the prediction error
                error = target - self.predict(xi)
                self.w[1:] += self.eta * error * xi
                self.w[0] += self.eta * error
        return self

    def net_input(self, X):
        # weighted sum of the inputs plus the bias (z)
        return np.dot(X, self.w[1:]) + self.w[0]

    def predict(self, X):
        # step activation: 1 if z >= 0, otherwise 0
        return np.where(self.net_input(X) >= 0.0, 1, 0)

perc = Perceptron(epochs=150, eta=0.05)
perc.fit(X1, y1)
Now we want to see how this model classifies our training dataset. Hence, we define a function that plots the decision boundary of a trained neural network. This function, defined in Listing 3, creates a mesh grid over the 2D feature space and then uses a trained model to predict the target of every point on that grid. Points with different predicted labels are colored differently, so the decision boundary of the model can be visualized.
# Listing 3
def plot_boundary(X, y, clf, lims, alpha=1):
    gx1, gx2 = np.meshgrid(np.arange(lims[0], lims[1],
                                     (lims[1]-lims[0])/500.0),
                           np.arange(lims[2], lims[3],
                                     (lims[3]-lims[2])/500.0))
    backgd_colors = ['lightsalmon', 'aqua', 'lightgreen', 'yellow']
    marker_colors = ['red', 'blue', 'green', 'orange']
    gx1l = gx1.flatten()
    gx2l = gx2.flatten()
    gx = np.vstack((gx1l, gx2l)).T
    gyhat = clf.predict(gx)
    if len(gyhat.shape)==1:
        gyhat = gyhat.reshape(len(gyhat), 1)
    if gyhat.shape[1] > 1:
        gyhat = gyhat.argmax(axis=1)
    gyhat = gyhat.reshape(gx1.shape)
    target_labels = np.unique(y)
    n = len(target_labels)
    plt.pcolormesh(gx1, gx2, gyhat, cmap=ListedColormap(backgd_colors[:n]))
    for i, label in enumerate(target_labels):
        plt.scatter(X[y==label, 0], X[y==label, 1],
                    label="y="+str(label),
                    alpha=alpha, edgecolor="white",
                    color=marker_colors[i])
Now, we use this function to plot the decision boundary of the perceptron for the training dataset. The result is shown in Figure 4.
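The call below is a minimal sketch of how plot_boundary() can be used for this purpose; the axis limits passed in lims are an assumption chosen to match the plotting range used for Figure 3.
# Sketch: plotting the perceptron's decision boundary (assumed axis limits)
plt.figure(figsize=(5, 5))
lims = [-2.3, 1.8, -1.9, 2.2]      # assumed limits, matching Figure 3
plot_boundary(X1, y1, perc, lims)
plt.xlabel('$x_1$', fontsize=16)
plt.ylabel('$x_2$', fontsize=16)
plt.legend(loc='best', fontsize=11)
ax = plt.gca()
ax.set_aspect('equal')
plt.xlim(lims[:2])
plt.ylim(lims[2:])
plt.show()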