Neural networks are a class of algorithms inspired by the structure and function of the human brain: they attempt to mimic the way neurons (and their synapses) communicate and process information. Artificial neural networks have been used extensively in the field of artificial intelligence, and today they are among the most common machine learning algorithms used by data scientists and AI engineers for applications and tasks that demand high accuracy and speed, tasks that would otherwise be impractical on traditional computer architectures.
Let’s dive into more details…
Neural Networks and the Human Brain
Neural networks were popular from the 1980s until the late 1990s, and they have resurged recently thanks to the increased computational power of computers, which can now support large-scale processing fast enough. As a result, artificial neural networks are today the foundation of the most advanced AI applications and techniques.
As mentioned, neural networks were developed to simulate networks of neurons in the brain. But what does a single neuron in our brain look like?
We know the human brain is full of neuron cells. Each has a body with a nucleus and a number of input wires, called dendrites, that receive signals from other neurons. The neuron also has an output wire, called an axon, which it uses to send signals (messages) to other neurons.
Therefore, if we look at the neuron from a computational perspective, it is a node that receives data through its input wires, applies some processing to that data, and delivers the output via its axon to other nodes in the network (the brain).
Neurons communicate with each other by sending small pulses of electricity, or spikes, through their axons. The axon (output wire) of one neuron connects to the dendrites (input wires) of another neuron, which accepts the incoming message and processes it.
That neuron then sends the processed message through its own axon to further neurons, and the process propagates throughout the network. This is the mechanism by which human thought happens.
This is also how our senses and muscles work: when we move a muscle, neurons send electrical signals to it, and that causes the muscle to contract. In an artificial neural network implemented on a computer, we adopt a very simple model of what a neuron does: we model it as a logistic unit, as it is called in the language of the algorithm.
Therefore, just as a biological neuron performs actions or calculations in the brain, in the mathematical world an artificial neuron represents a computational unit with a sigmoid (logistic) activation function, to use the neural network terminology.
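A minimal sketch can make this concrete: the code below models a single logistic unit as described above, i.e. a weighted sum of the inputs passed through the sigmoid function. The function names and the sample weights are illustrative, not from the original article.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """A single artificial neuron (logistic unit):
    weighted sum of inputs plus bias, then sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# With zero weights and bias, the weighted sum is 0 and sigmoid(0) = 0.5.
output = neuron([1.0, 2.0], [0.0, 0.0], 0.0)
print(output)  # 0.5
```

The output is always between 0 and 1, which is what lets a unit be read as a yes/no classifier later in the article.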
Mathematical Modeling of Artificial Neural Networks
As described above, a neural network is simply a group of neurons connected together. Here is a representation of a simple neural network with three layers.
In a neural network, the first layer is called the input layer, and the final layer is called the output layer because it contains the neuron that outputs the final value computed by the hypothesis function (called h). Layer 2, in between, is called the hidden layer. It is named hidden because, in supervised learning, the inputs (x) and the outputs (y) are observed during training while the layer in between is not: it is neither x nor y, so we call the layers in the middle hidden. In general, any layer in a neural network that is neither the input layer nor the output layer is called a hidden layer.
Starting from the layers and units (neurons), it is possible to derive a mathematical representation of the neural network (the full derivation is beyond the scope of this article). What is worth understanding is that the picture above describes an artificial neural network defining a function h that maps the input values x to a predicted output y. This hypothesis is parameterized by a set of parameters, denoted with a capital theta (written as a subscript of h), so that as we vary theta we get different hypotheses, i.e. different functions.
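To illustrate what "h parameterized by theta" means in practice, here is a minimal sketch of a forward pass: each layer applies its weights and biases (collectively, theta) and a sigmoid, and changing theta changes the function h. The layer sizes and weight values below are toy assumptions, not taken from the article's figure.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_matrix, biases):
    """Compute one layer: each row of the weight matrix feeds one neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weight_matrix, biases)]

def h(x, theta):
    """Hypothesis h_theta(x): propagate x through each layer's
    (weights, biases) pair in theta."""
    activation = x
    for weights, biases in theta:
        activation = layer(activation, weights, biases)
    return activation

# Toy parameters (illustrative only): 3 inputs -> 2 hidden units -> 1 output.
theta = [
    ([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], [0.0, 0.0]),  # hidden layer
    ([[0.7, 0.8]], [0.0]),                              # output layer
]
y = h([1.0, 0.5, -1.0], theta)  # a single value in (0, 1)
```

Varying the numbers in `theta` yields a different hypothesis, which is exactly what training a network adjusts.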
Uses of Neural Networks
One way to use neural networks is multiclass classification, where the data we want to process via machine learning (e.g. images to be recognized automatically, or words to be read automatically) belongs to more than one possible category (e.g. for images, categories of color or size; for words, categories of letters) that we are trying to distinguish. Handwritten digit recognition is an example of a multiclass classification problem: the algorithm must choose among ten possible categories, the digits 0 through 9.
In a neural network, multiclass classification is done essentially as an extension of the one-versus-all method.
Let’s take computer vision as an example, where the goal is to recognize four categories of objects: given an image, we want to decide whether it shows a pedestrian, a car, a motorcycle, or a truck. In this case, the neural network must have four output units, i.e. it outputs a vector of four numbers. Specifically, the first output unit classifies whether the image is a pedestrian or not; the second classifies whether it is a car (yes or no); the third whether it is a motorcycle; and the fourth whether it is a truck. Thus, when the image shows a pedestrian, the network outputs the vector [1, 0, 0, 0]; when it is a car, [0, 1, 0, 0]; when a motorcycle, [0, 0, 1, 0]; and so on.
This is the “one versus all” method: there are four logistic regression classifiers, each of which tries to recognize one of the four classes we want to distinguish.
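The output encoding described above can be sketched in a few lines: the target vector has a 1 for the true class and 0 elsewhere, and at prediction time we pick the class whose output unit fires the strongest. The helper names below are hypothetical, chosen for illustration.

```python
def one_hot(label, classes):
    """One-vs-all target vector: 1 for the true class, 0 elsewhere."""
    return [1 if c == label else 0 for c in classes]

def predict_class(outputs, classes):
    """Pick the class whose output unit produced the largest value."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return classes[best]

classes = ["pedestrian", "car", "motorcycle", "truck"]

print(one_hot("car", classes))                           # [0, 1, 0, 0]
print(predict_class([0.1, 0.7, 0.15, 0.05], classes))    # car
```

In practice the network's four sigmoid outputs are real numbers in (0, 1) rather than exact 0s and 1s, which is why prediction takes the maximum instead of matching the vector exactly.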
Finally, here is the representation of the neural network with four output units and the vector-valued function h(x), whose output changes with the input image.