The basics of Neural Networks — Part I
Have you ever heard about Neural Networks and thought that this particular subject is way too complicated and that there’s no way you can understand what they represent? If this applies to you, you’ve come to the right place.
In this first article of the series, I will explain the main idea behind how Neural Networks work, without going into much mathematical detail. To aid understanding, I will also present a case study from an interesting project that I’ve been working on.
What is a Neural Network?
Neural Networks (NN) are computing systems vaguely inspired by the biological neural networks that constitute animal brains. A NN is based on a collection of connected nodes (also called neurons), where each connection, like a synapse in a biological brain, can transmit a signal, represented here by a real number, to other neurons.
In order to build a good Neural Network, we must provide it with a huge amount of data. The idea is to use this data to “teach” the NN patterns that it can then use to make predictions for a specific task.
Imagine a two-year-old child that still doesn’t know what a cat is. To teach this child what a cat looks like, we could show them many different photos and tell them which ones show a cat and which ones don’t. This way, depending on how many images we show, the child will eventually be able to recognize a cat during a daily walk in the neighborhood.
The same idea applies to a Neural Network. We must feed it many different images, each with a label informing whether or not there is a cat in the image. After “looking at” many pictures, the NN will learn specific patterns that allow it to recognize whether there’s a cat in a new image.
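In code, such a labeled dataset is just a collection of (example, label) pairs. The sketch below uses made-up file names in place of real pixel data, purely to illustrate the structure:

```python
# A labeled dataset is a collection of (example, label) pairs.
# The file names below are hypothetical placeholders for real images.
dataset = [
    ("photo_001.jpg", "cat"),
    ("photo_002.jpg", "not_cat"),
    ("photo_003.jpg", "cat"),
]

# During training, the network sees both the image and its label.
for image, label in dataset:
    print(image, "->", label)
```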
Application Example: License Plate Recognition
One interesting application of neural networks is license plate recognition. The idea is to apply a sequence of neural networks to images containing any type of vehicle in order to detect and recognize the plate of every car in the image. An example of this application is shown in the Figure below, where the neural network system was able to correctly detect all the characters of a Brazilian car plate.
In this application, it is necessary to use a neural network that recognizes each character (letters and numbers) of the plate. For the recognition of numbers, for example, the neural network must be fed many different images of digits (0–9) so that it learns to recognize them.
Neural Network Architecture
A neural network can be represented as shown in the Figure below. Each circular node represents a neuron and an arrow represents a connection from the output of one neuron to the input of another. Each connection between neurons has a weight (real number), which determines the strength of one node’s influence on another.
A neural network has different layers, each defined as a collection of nodes operating together at a specific depth within the network. There are three types of layers:
Input layer: contains the raw data. Each node of this layer can represent, for example, one pixel value of the image in which we want to recognize an object.
Hidden layer: responsible for processing the input data. The neurons of the hidden layers sit between the input and the output. Each hidden layer may be responsible for detecting one feature of the object.
Output layer: shows the final value to the user. Usually, it is the probability that a specific object exists in the image.
The sizes of the input and output layers depend on the application. Consider the number recognition application described previously. If each digit image is 28x28 pixels, the corresponding input layer must have 28 * 28 = 784 nodes. And since there are 10 possible values for the image (the digits 0 to 9), the output layer should have 10 nodes.
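These layer sizes can be sketched with a quick forward pass. The article doesn’t specify a hidden-layer size, so the 16 hidden neurons below, as well as the random weights, are arbitrary assumptions made only to show how the shapes fit together:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a 28x28 digit image, flattened into 784 input values
image = rng.random((28, 28))
input_layer = image.reshape(784)

# Hypothetical sizes: 784 inputs -> 16 hidden neurons -> 10 outputs
W1 = rng.standard_normal((16, 784))   # hidden-layer weights
b1 = rng.standard_normal(16)          # hidden-layer biases
W2 = rng.standard_normal((10, 16))    # output-layer weights
b2 = rng.standard_normal(10)          # output-layer biases

sigmoid = lambda z: 1 / (1 + np.exp(-z))

hidden = sigmoid(W1 @ input_layer + b1)   # hidden-layer activations
output = sigmoid(W2 @ hidden + b2)        # one value per digit (0-9)

print(output.shape)        # (10,)
print(np.argmax(output))   # index of the most "probable" digit
```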
The GIF below shows an example of this configuration, where an image of the number two is used as input. In this case, the probability that the image is the number two is higher than for all the other numbers, so the neural network returns that this image represents the number 2.
Currently, there is no definitive rule for choosing the architecture of a neural network (number of nodes, number of hidden layers). Usually, the developer tries different configurations, and the one with the best accuracy is selected as the final model.
Single Neuron
The operations performed by one single neuron are straightforward and can be summarized in the Figure below.
First, it sums the products of all the inputs connected to this neuron and their weights (w1, w2, …, wn), such that the first input x1 is multiplied by the first weight w1, the second input x2 is multiplied by the second weight w2, and so on. Each connection between neurons has its own weight, and these are the only values that will be modified during the learning process.
Then, a bias value b may be added to the calculated total; this is a real number associated with each neuron individually. After these summations, the neuron finally applies a function f, called the “activation function”, to the obtained value.
This activation function is non-linear, and its purpose is to introduce non-linearity into the output of a neuron. This is important because most real-world data is non-linear, and we want neurons to be able to learn these non-linear representations.
There are different types of activation functions that may be used. One of the most common is the sigmoid function, shown in the Figure to the left. This particular function limits the output value of a neuron to between 0 and 1.
Therefore, a neuron simply takes all the input values, multiplies each by its respective weight, sums the results, adds a bias value, and applies an activation function. This final value is then used as input for the next layer.
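Putting those steps together, a single neuron can be sketched in a few lines. The input values, weights, and bias below are made-up numbers, used only to illustrate the computation:

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + math.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, plus the bias,
    # passed through the activation function
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(total)

# Example with made-up values: 0.5*0.4 + 0.1*(-0.6) + 0.9*0.2 + 0.1 = 0.42
out = neuron([0.5, 0.1, 0.9], [0.4, -0.6, 0.2], bias=0.1)
print(round(out, 4))   # sigmoid(0.42) ≈ 0.6035
```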
Training process
When we start the training process of a neural network, all the weights and biases are randomly selected. As previously said, we continuously feed the neural network with input images of what we want it to predict, along with their respective labels. So, taking the example of number recognition, we would not only feed it images of different numbers, but we would also have to inform it which number each image contains.
On the first try, the neural network will probably not get the class of the image right, since all the weights were randomly selected; that’s why every training example comes with a label, so that the NN can figure out what class it should have guessed. If the NN labels the image correctly, the current parameters (weights and biases) are kept the same. On the other hand, if the output does not match the label, the parameters are updated to try to get the correct result in the next iteration. This process is repeated many times with different input images so that the NN can learn the features of each class.
To determine which weights and biases will change, and by how much, a process called backward propagation is performed. It basically consists of going back through the neural network and inspecting every connection to check how the output would change in response to a change in each weight. In this article, we will not go into the mathematical details of backward propagation. However, if you are interested in a deeper study of this process, you can read the article by Prakash Jay, which explains in more mathematical detail how backward propagation works.
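The update loop described above can be sketched on a toy problem. The article doesn’t prescribe a loss function or learning rate, so the sketch below assumes a squared-error loss and a learning rate of 0.5, and trains a single sigmoid neuron to learn the logical OR of two inputs; a real network backpropagates such gradients through many layers:

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Toy labeled dataset: the logical OR of two inputs
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

random.seed(0)
w = [random.uniform(-1, 1), random.uniform(-1, 1)]  # random initial weights
b = random.uniform(-1, 1)                           # random initial bias
lr = 0.5                                            # assumed learning rate

for epoch in range(5000):
    for x, y in data:
        a = sigmoid(w[0] * x[0] + w[1] * x[1] + b)  # forward pass
        # Backward step: gradient of the squared error (y - a)**2
        # with respect to the neuron's pre-activation value
        grad = -2 * (y - a) * a * (1 - a)
        w[0] -= lr * grad * x[0]                    # update each weight...
        w[1] -= lr * grad * x[1]
        b -= lr * grad                              # ...and the bias

for x, y in data:
    prediction = round(sigmoid(w[0] * x[0] + w[1] * x[1] + b))
    print(x, "->", prediction)
```

After training, the rounded predictions match the OR labels, because the repeated small corrections have pushed the weights and bias toward values that separate the two classes.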
Conclusion
In this article, we provided a simple explanation of neural networks, focusing on their definition, their architecture, how a single neuron works, and the training process.
We also presented an interesting application that uses a sequence of neural networks: license plate recognition. This application detects and recognizes the plate of every car in an image and can be used for several purposes, such as parking automation and security, access control, motorway road tolling, and so on.
Thank you, and I hope to see you in the next article!