Neural Network Part 1
In the previous post, we developed a simple neural network. Along the way you may have run into neural network jargon such as activation function and loss function, and wondered what on earth these terms mean. That is what I will try to explain in this and the next few articles.
A neuron is the fundamental unit of any neural system. A system of many connected neurons is commonly known as a neural network (we will come back to this later). A neuron can be viewed as a basic entity that produces an output from its inputs using a simple function built into it. A single neuron performs only a very simple calculation, but many neurons combined together form very complex systems such as the human brain.
Artificial neurons are inspired by the neurons of the human brain. The human brain consists of a very large number of nerve cells (roughly 86 billion) that process information. Each cell works like a simple microprocessor. The massive number of connections between these neurons, and the fact that they work in parallel, is what makes the brain so powerful. A neuron consists of a cell body that processes signals, dendrites that receive incoming information, and an axon whose terminals pass outgoing information on to connected neurons.
Information is transported between neurons in the form of electrical signals along the dendrites. The incoming signals that reach a neuron's dendrites are added up. If the total stimulation exceeds a certain threshold, a signal is sent along the neuron's axon to the terminals at its end, where it is passed on to other neurons. In this case, the neuron is said to be activated. If the incoming stimulation is too low, the information is not transported any further, and the neuron is said to be inhibited. The connections between neurons are adaptive, which means that the connection structure changes dynamically. It is commonly acknowledged that the learning ability of the human brain is based on this adaptation.
Now, enough of the biology. The important question is how this works in a digital environment. In general, an artificial neuron takes inputs and performs some computation on them. If the result is above a threshold, the neuron fires, meaning it outputs a positive value. The most basic artificial neuron is the perceptron.
A perceptron takes a number of binary inputs and returns a single binary output.
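For example, the given image shows a perceptron unit with three binary inputs x1, x2, x3, their respective weights w1, w2, w3, and a single output, which can be either 0 or 1. This output is decided by comparing the linear combination of the inputs and their weights against a threshold. Hence,

$$\text{output} = \begin{cases} 0 & \text{if } \sum_{j=1}^{3} w_j x_j \le \text{threshold} \\ 1 & \text{if } \sum_{j=1}^{3} w_j x_j > \text{threshold} \end{cases}$$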
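The threshold rule can be written as a few lines of Python. This is a minimal sketch: the function name `perceptron_output` and the example weights (1, 1, 2) with threshold 1 are hypothetical and only for illustration. It outputs 1 when the weighted sum is strictly greater than the threshold.

```python
def perceptron_output(x, w, threshold):
    """Binary perceptron: 1 if w1*x1 + w2*x2 + w3*x3 > threshold, else 0."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum > threshold else 0

# Hypothetical weights and threshold, chosen only for illustration.
w = (1, 1, 2)
print(perceptron_output((1, 0, 0), w, threshold=1))  # 1 > 1 is False -> 0
print(perceptron_output((0, 0, 1), w, threshold=1))  # 2 > 1 -> 1
print(perceptron_output((1, 1, 0), w, threshold=1))  # 2 > 1 -> 1
```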
Let's take a step back and look at what a perceptron is actually doing. Suppose you have the points shown in the graph and need to classify them into two classes.
The best way to do this is the blue line, which separates these points cleanly. As we know, the equation of such a line is $a_1x_1 + a_2x_2 + c = 0$. The coefficients $a_1$ and $a_2$ decide how important $x_1$ and $x_2$ are for a point to lie on the line. The intercept $c$ is a boost given to a point to lie on the line. Let's call these importances weights ($w_1$, $w_2$) and the boost the bias ($b$), which indicates how biased the line is towards a point lying on it. The equation then becomes $w_1x_1 + w_2x_2 + b = 0$. Vectorising this equation, we get $W^\top X + b = 0$, where $W = [w_1\ w_2]^\top$ and $X = [x_1\ x_2]^\top$. So for any data point $x = (x_1, x_2)$, we calculate $W^\top X + b$. If $W^\top X + b > 0$, we mark the output ($\hat{y}$) as 1, else 0. This is exactly what a perceptron does.
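Here is a minimal Python sketch of this decision rule. It assumes a hypothetical line $x_1 + x_2 - 1 = 0$ (that is, $W = [1\ 1]^\top$ and $b = -1$); the line is not taken from the graph. Each point is labelled 1 when $W^\top X + b > 0$ and 0 otherwise.

```python
def classify(point, W, b):
    """Return y_hat = 1 if W^T X + b > 0 (positive side of the line), else 0."""
    x1, x2 = point
    w1, w2 = W
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

# Hypothetical line x1 + x2 - 1 = 0, i.e. W = [1, 1]^T and b = -1.
W, b = (1, 1), -1
points = [(2, 2), (0, 0), (1, 0.5), (0.25, 0.25)]
print([classify(p, W, b) for p in points])  # [1, 0, 1, 0]
```

Points with $x_1 + x_2 > 1$ land on the positive side and get label 1. All other points, including any lying exactly on the line, get label 0.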
Since W and the bias are initially chosen at random, a number of data points may be misclassified. For such points, $y \ne \hat{y}$. We need to move the line so that these points end up in the right class. We can do that by taking each misclassified point in turn and moving the line just enough to accommodate it. While doing so, we want to avoid sudden changes to the line, as they may cause even more points to be misclassified. This is achieved by multiplying the point's feature values by a small constant called the learning rate $\alpha$. If the point should be on the positive side ($y = 1$), we add the result to the weights, and if it should be on the negative side ($y = 0$), we subtract it. The bias is adjusted by $\alpha$ in the same direction. Compactly, $W \leftarrow W + \alpha (y - \hat{y}) X$ and $b \leftarrow b + \alpha (y - \hat{y})$. With the new weights, we count the misclassified points again and repeat the process. We keep going until either the number of misclassified points falls below the desired number or the maximum number of iterations (epochs) is reached.
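The training procedure can be sketched in plain Python. This sketch assumes a few things not fixed by the text: initial weights and bias drawn uniformly from [-1, 1], a learning rate of 0.1, at most 1000 epochs, and the hypothetical helper names `predict` and `train_perceptron`. Training stops early once an epoch finishes with no misclassified points.

```python
import random

def predict(w, b, x):
    """Return 1 if w . x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(points, labels, learning_rate=0.1, epochs=1000, seed=0):
    rng = random.Random(seed)
    # Start from random weights and bias, as described in the text.
    w = [rng.uniform(-1, 1) for _ in range(len(points[0]))]
    b = rng.uniform(-1, 1)
    for _ in range(epochs):
        errors = 0
        for x, y in zip(points, labels):
            y_hat = predict(w, b, x)
            if y_hat != y:
                errors += 1
                # (y - y_hat) is +1 if the point belongs on the positive side
                # and -1 if it belongs on the negative side.
                step = learning_rate * (y - y_hat)
                w = [wi + step * xi for wi, xi in zip(w, x)]
                b += step
        if errors == 0:  # no updates this epoch: every point is classified correctly
            break
    return w, b

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
w, b = train_perceptron(points, labels)
print([predict(w, b, x) for x in points])  # [0, 0, 0, 1]
```

The learning rate keeps each correction small, so a single point cannot swing the line drastically. For linearly separable data such as this, the perceptron convergence theorem guarantees that the loop stops after a finite number of updates.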
A perceptron can be considered a decision maker: if the weighted sum goes over the threshold, the perceptron outputs 1, else 0. By varying the threshold, we can change the model's output and make the decision depend more on a particular input. For example, take weights (W1, W2, W3) = (1, 2, 4) and binary inputs (X1, X2, X3). Setting the threshold at 3 makes the perceptron fire exactly when X3 is 1, irrespective of the other inputs: X3 alone contributes 4 > 3, while X1 and X2 together contribute only 3. Setting the threshold at 5 means the output is 1 only when both X2 and X3 are 1, irrespective of X1: X2 and X3 together give 6 > 5, while X1 and X3 give only 5.
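One way to check which inputs a threshold makes the decision depend on is to enumerate all eight binary inputs. The helper names below are hypothetical, and the sketch assumes the perceptron fires when the weighted sum is strictly greater than the threshold. Under that rule, with weights (1, 2, 4), threshold 3 fires exactly when X3 = 1 and threshold 5 fires exactly when X2 = X3 = 1.

```python
from itertools import product

def fires(inputs, weights, threshold):
    """1 if the weighted sum is strictly greater than the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

def firing_inputs(weights, threshold):
    """List every binary input tuple (X1, X2, X3, ...) that makes the perceptron output 1."""
    return [x for x in product((0, 1), repeat=len(weights))
            if fires(x, weights, threshold)]

weights = (1, 2, 4)
print(firing_inputs(weights, 3))  # [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
print(firing_inputs(weights, 5))  # [(0, 1, 1), (1, 1, 1)]
```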
Logic gates are the basic decision makers of the digital world, and the computation they perform can be carried out with perceptrons. This is the simplest example of a perceptron at work, and it will help us understand how perceptrons can be used. Here we will realize the AND gate's computation using a perceptron.
For the AND gate, the truth table is:

X1  X2  Output
0   0   0
0   1   0
1   0   0
1   1   1
Here, we have two inputs, X1 and X2, and a single output. As discussed above,

$$\text{Output} = \begin{cases} 1 & \text{if } W_1X_1 + W_2X_2 + b > 0 \\ 0 & \text{otherwise} \end{cases}$$

where W1 and W2 are the weights and b is the bias. So, to realize the AND gate, we need to find W1, W2 and b. One such tuple is (2, 2, -3).
For the output to be 1, both X1 and X2 must be 1. Plugging (2, 2, -3) into the AND perceptron's equation gives 2 + 2 - 3 = 1 > 0 for the input (1, 1). The inputs (1, 0) and (0, 1) give 2 - 3 = -1, and (0, 0) gives -3. The output is therefore 1 only when both inputs are 1, which realizes the AND gate. You can find a Python implementation of the AND gate using a perceptron here. You can test that gate by changing the w1, w2 and b values in class AND.
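For reference, here is a separate minimal sketch of such a gate. It is not the implementation linked above; the class name `AndPerceptron` is hypothetical. It uses the tuple (w1, w2, b) = (2, 2, -3) and outputs 1 when w1*x1 + w2*x2 + b > 0.

```python
class AndPerceptron:
    """Perceptron gate: output 1 if w1*x1 + w2*x2 + b > 0, else 0."""

    def __init__(self, w1=2, w2=2, b=-3):
        # Defaults are the (2, 2, -3) tuple that realizes AND.
        self.w1, self.w2, self.b = w1, w2, b

    def __call__(self, x1, x2):
        return 1 if self.w1 * x1 + self.w2 * x2 + self.b > 0 else 0

gate = AndPerceptron()
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, gate(x1, x2))
# 0 0 0
# 0 1 0
# 1 0 0
# 1 1 1
```

Changing the parameters changes the gate. For example, `AndPerceptron(1, 1, -0.5)` outputs 1 whenever at least one input is 1, which is an OR gate.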
In the upcoming post, we will discuss the perceptron further, including its advantages and disadvantages.