Adaptive Resonance Theory

This network was developed by Stephen Grossberg and Gail Carpenter in 1987. It is based on competition and uses an unsupervised learning model. Adaptive Resonance Theory (ART) networks, as the name suggests, are always open to new learning (adaptive) without losing the old patterns (resonance). Basically, an ART network is a vector classifier that accepts an input vector and classifies it into one of the categories, depending on which of the stored patterns it resembles the most.

Operating Principle

The main operation of ART classification can be divided into the following phases −

Recognition phase − The input vector is compared with the classification presented at every node in the output layer. The output of a neuron becomes “1” if it best matches the classification applied; otherwise it becomes “0”.

Comparison phase − In this phase, the input vector is compared with the comparison layer vector. A reset occurs if the degree of similarity is less than the vigilance parameter.

Search phase − In this phase, the network checks for a reset as well as for the match found in the above phases. If there is no reset and the match is good enough, the classification is over. Otherwise, the process is repeated and another stored pattern is tried to find the correct match.

ART1

It is a type of ART designed to cluster binary vectors. Its operation is easiest to understand from its architecture.

Architecture of ART1

It consists of the following two units −

Computational Unit − It is made up of the following −

Input unit (F1 layer) − It further has the following two portions −

F1(a) layer (Input portion) − In ART1, no processing takes place in this portion; it only holds the input vector. It is connected to the F1(b) layer (interface portion).

F1(b) layer (Interface portion) − This portion combines the signal from the input portion with that of the F2 layer.
The F1(b) layer is connected to the F2 layer through bottom-up weights $b_{ij}$, and the F2 layer is connected to the F1(b) layer through top-down weights $t_{ji}$.

Cluster Unit (F2 layer) − This is a competitive layer. The unit having the largest net input is selected to learn the input pattern. The activations of all other cluster units are set to 0.

Reset Mechanism − This mechanism works on the similarity between the top-down weight and the input vector. If the degree of this similarity is less than the vigilance parameter, then the cluster is not allowed to learn the pattern and a reset occurs.

Supplement Unit − The issue with the reset mechanism is that the F2 layer must be inhibited under certain conditions and must also be available when learning happens. That is why two supplemental units, G1 and G2, are added along with the reset unit R. They are called gain control units. These units receive signals from and send signals to the other units in the network. ‘+’ indicates an excitatory signal, while ‘−’ indicates an inhibitory signal.

Parameters Used

The following parameters are used −

n − Number of components in the input vector
m − Maximum number of clusters that can be formed
$b_{ij}$ − Weight from F1(b) to F2 layer, i.e. bottom-up weights
$t_{ji}$ − Weight from F2 to F1(b) layer, i.e. top-down weights
ρ − Vigilance parameter
||x|| − Norm of vector x

Algorithm

Step 1 − Initialize the learning rate, the vigilance parameter, and the weights as follows −
$$\alpha > 1 \quad \text{and} \quad 0 < \rho \leq 1$$
$$0 < b_{ij}(0) < \frac{\alpha}{\alpha - 1 + n} \quad \text{and} \quad t_{ji}(0) = 1$$
Step 2 − Repeat steps 3-12 until the stopping condition is true.
Step 3 − Perform steps 4-11 for every training input.
Step 4 − Set the activations of all F2 and F1(a) units as follows: F2 = 0 and F1(a) = input vector.
Step 5 − Send the input signal from the F1(a) layer to the F1(b) layer: $$s_{i} = x_{i}$$
Step 6 − For every F2 node that is not inhibited (i.e. $y_{j} \neq -1$), compute $y_{j} = \sum_i b_{ij} x_{i}$.
Step 7 − Perform steps 8-10 while reset is true.
Step 8 − Find J such that $y_{J} \geq y_{j}$ for all nodes j.
Step 9 − Recalculate the activation on F1(b) as follows: $$x_{i} = s_{i} t_{Ji}$$
Step 10 − After calculating the norms of vector x and vector s, check the reset condition as follows −
If ||x|| / ||s|| < ρ, then inhibit node J and go to step 7.
Else, if ||x|| / ||s|| ≥ ρ, then proceed further.
Step 11 − Update the weights for node J as follows −
$$b_{iJ}(new) = \frac{\alpha x_{i}}{\alpha - 1 + \|x\|}$$
$$t_{Ji}(new) = x_{i}$$
Step 12 − Check the stopping condition for the algorithm, which may be any of the following −
No change in the weights.
No reset is performed for any unit.
Maximum number of epochs reached.
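The ART1 steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `art1_train`, the defaults `rho=0.7` and `alpha=2.0`, the initial bottom-up weight (half of the bound α/(α − 1 + n)), the epoch limit, and the "cluster assignments stopped changing" stopping test are all choices made for this example. The norm of a binary vector is taken to be its number of 1s.

```python
import numpy as np


def art1_train(patterns, max_clusters, rho=0.7, alpha=2.0, max_epochs=10):
    """Cluster binary row vectors with a simplified ART1 loop (illustrative)."""
    patterns = np.asarray(patterns, dtype=float)
    n = patterns.shape[1]
    # Step 1: equal bottom-up weights (assumed: half of alpha / (alpha - 1 + n))
    # and top-down weights of 1.
    b = np.full((n, max_clusters), 0.5 * alpha / (alpha - 1.0 + n))
    t = np.ones((max_clusters, n))
    labels = np.full(len(patterns), -1)
    for _ in range(max_epochs):
        previous = labels.copy()
        for p, s in enumerate(patterns):
            norm_s = s.sum()              # norm = number of 1s in the input
            labels[p] = -1
            if norm_s == 0:
                continue                  # skip all-zero inputs
            y = s @ b                     # net input to each F2 unit
            inhibited = np.zeros(max_clusters, dtype=bool)
            while not inhibited.all():
                # Winner among the units that are not inhibited.
                J = int(np.argmax(np.where(inhibited, -np.inf, y)))
                x = s * t[J]              # F1(b) activation x_i = s_i * t_Ji
                if x.sum() / norm_s < rho:
                    inhibited[J] = True   # reset: try the next best unit
                    continue
                b[:, J] = alpha * x / (alpha - 1.0 + x.sum())
                t[J] = x
                labels[p] = J
                break
        if np.array_equal(labels, previous):
            break                         # assumed stopping test
    return labels, b, t


labels, b, t = art1_train([[1, 1, 0, 0], [0, 0, 1, 1],
                           [1, 1, 0, 0], [0, 0, 1, 1]], max_clusters=3)
print(labels)  # cluster index per pattern; -1 means no cluster accepted it
```

Raising `rho` towards 1 makes the vigilance test stricter, so more clusters tend to be used; lowering it lets dissimilar patterns share a cluster.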
Artificial Neural Network – Hopfield Networks

The Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It consists of a single layer containing one or more fully connected recurrent neurons. The Hopfield network is commonly used for auto-association and optimization tasks.

Discrete Hopfield Network

A discrete Hopfield network operates in a discrete fashion; in other words, its input and output patterns are discrete vectors, which can be either binary (0, 1) or bipolar (+1, −1) in nature. The network has symmetric weights with no self-connections, i.e. $w_{ij} = w_{ji}$ and $w_{ii} = 0$.

Architecture

Following are some important points to keep in mind about the discrete Hopfield network −

This model consists of neurons with one inverting and one non-inverting output.
The output of each neuron should be the input of the other neurons but not the input of itself.
Weight/connection strength is represented by $w_{ij}$.
Connections can be excitatory as well as inhibitory. A connection is excitatory if the output of the neuron is the same as the input, otherwise inhibitory.
Weights should be symmetrical, i.e. $w_{ij} = w_{ji}$.

The outputs from Y1 going to Y2, Yi and Yn have the weights $w_{12}$, $w_{1i}$ and $w_{1n}$ respectively. Similarly, the other arcs have weights on them.

Training Algorithm

During training of the discrete Hopfield network, the weights are updated. The input vectors may be binary or bipolar.
Hence, in both cases, the weights can be computed with the following relations.

Case 1 − Binary input patterns

For a set of binary patterns s(p), p = 1 to P, where $s(p) = (s_1(p), s_2(p), \dots, s_i(p), \dots, s_n(p))$, the weight matrix is given by
$$w_{ij} = \sum_{p=1}^P [2s_{i}(p) - 1][2s_{j}(p) - 1] \quad \text{for } i \neq j$$

Case 2 − Bipolar input patterns

For a set of bipolar patterns s(p), p = 1 to P, where $s(p) = (s_1(p), s_2(p), \dots, s_i(p), \dots, s_n(p))$, the weight matrix is given by
$$w_{ij} = \sum_{p=1}^P [s_{i}(p)][s_{j}(p)] \quad \text{for } i \neq j$$

Testing Algorithm

Step 1 − Initialize the weights, which are obtained from the training algorithm by using the Hebbian principle.
Step 2 − Perform steps 3-9 while the activations of the network have not converged.
Step 3 − For each input vector X, perform steps 4-8.
Step 4 − Make the initial activation of the network equal to the external input vector X as follows − $$y_{i} = x_{i} \quad \text{for } i = 1 \text{ to } n$$
Step 5 − For each unit $Y_i$, perform steps 6-8.
Step 6 − Calculate the net input of the network as follows − $$y_{in_i} = x_{i} + \sum_{j} y_{j} w_{ji}$$
Step 7 − Apply the activation as follows over the net input to calculate the output − $$y_{i} = \begin{cases} 1 & \text{if } y_{in_i} > \theta_{i} \\ y_{i} & \text{if } y_{in_i} = \theta_{i} \\ 0 & \text{if } y_{in_i} < \theta_{i} \end{cases}$$ Here $\theta_{i}$ is the threshold.
Step 8 − Broadcast this output $y_i$ to all other units.
Step 9 − Test the network for convergence.

Energy Function Evaluation

An energy function is defined as a bounded and non-increasing function of the state of the system. The energy function $E_f$, also called the Lyapunov function, determines the stability of the discrete Hopfield network and is characterized as follows −
$$E_{f} = -\frac{1}{2}\sum_{i=1}^n \sum_{j=1}^n y_{i} y_{j} w_{ij} - \sum_{i=1}^n x_{i} y_{i} + \sum_{i=1}^n \theta_{i} y_{i}$$

Condition − In a stable network, whenever the state of a node changes, the above energy function will decrease.
Suppose node i changes its state from $y_i^{(k)}$ to $y_i^{(k+1)}$. Then the energy change $\Delta E_{f}$ is given by the following relation
$$\Delta E_{f} = E_{f}(y_i^{(k+1)}) - E_{f}(y_i^{(k)})$$
$$= -\left(\sum_{j=1}^n w_{ij} y_j^{(k)} + x_{i} - \theta_{i}\right)(y_i^{(k+1)} - y_i^{(k)})$$
$$= -(net_{i}) \Delta y_{i}$$
Here $\Delta y_{i} = y_i^{(k+1)} - y_i^{(k)}$.

The change in energy depends on the fact that only one unit can update its activation at a time.

Continuous Hopfield Network

In comparison with the discrete Hopfield network, the continuous network has time as a continuous variable. It is also used in auto-association and optimization problems such as the travelling salesman problem.

Model − The model or architecture can be built by adding electrical components such as amplifiers, which map the input voltage to the output voltage through a sigmoid activation function.

Energy Function Evaluation

$$E_f = -\frac{1}{2}\sum_{i=1}^n \sum_{\substack{j = 1\\ j \ne i}}^n y_i y_j w_{ij} - \sum_{i=1}^n x_i y_i + \frac{1}{\lambda} \sum_{i=1}^n \sum_{\substack{j = 1\\ j \ne i}}^n w_{ij} g_{ri} \int_{0}^{y_i} a^{-1}(y)\, dy$$

Here λ is the gain parameter and $g_{ri}$ is the input conductance.
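As a small illustration of the discrete Hopfield network described above, the following Python sketch stores one binary pattern, recalls it from a corrupted input, and evaluates the energy function before and after. The function names, the zero thresholds, the fixed left-to-right update order, and the sweep limit are assumptions made for this example rather than part of the original description.

```python
import numpy as np


def hopfield_weights(binary_patterns):
    """w_ij = sum_p (2 s_i - 1)(2 s_j - 1) for i != j, with w_ii = 0."""
    s = 2.0 * np.asarray(binary_patterns, dtype=float) - 1.0
    w = s.T @ s
    np.fill_diagonal(w, 0.0)   # no self-connections
    return w


def energy(w, x, y, theta):
    """E = -1/2 sum_ij y_i y_j w_ij - sum_i x_i y_i + sum_i theta_i y_i."""
    x, y, theta = (np.asarray(v, dtype=float) for v in (x, y, theta))
    return float(-0.5 * y @ w @ y - x @ y + theta @ y)


def recall(w, x, theta=None, max_sweeps=20):
    """Update one unit at a time until a full sweep changes nothing."""
    x = np.asarray(x, dtype=float)
    theta = np.zeros(len(x)) if theta is None else np.asarray(theta, dtype=float)
    y = x.copy()                            # initial activation = input
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(y)):
            net = x[i] + y @ w[:, i]        # x_i + sum_j y_j w_ji
            if net > theta[i]:
                new = 1.0
            elif net < theta[i]:
                new = 0.0
            else:
                new = y[i]                  # unchanged when net equals threshold
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:
            break                           # converged
    return y


w = hopfield_weights([[1, 1, 1, 0]])
x = [0, 0, 1, 0]                            # stored pattern with two bits wrong
y = recall(w, x)
print(y, energy(w, x, x, np.zeros(4)), energy(w, x, y, np.zeros(4)))
```

In this run the energy of the final state is lower than that of the starting state, which matches the condition stated for a stable network.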
Associative Memory Network

These kinds of neural networks work on the basis of pattern association: they can store different patterns and, when producing an output, return one of the stored patterns by matching it with the given input pattern. These types of memories are also called Content-Addressable Memory (CAM). Associative memory makes a parallel search with the stored patterns as data files.

Following are the two types of associative memories we can observe −

Auto Associative Memory
Hetero Associative memory

Auto Associative Memory

This is a single layer neural network in which the input training vector and the output target vectors are the same. The weights are determined so that the network stores a set of patterns.

Architecture

As shown in the following figure, the architecture of the Auto Associative memory network has ‘n’ number of input training vectors and similar ‘n’ number of output target vectors.

Training Algorithm

For training, this network uses the Hebb or Delta learning rule.

Step 1 − Initialize all the weights to zero as $w_{ij} = 0$ (i = 1 to n, j = 1 to n)
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Activate each input unit as follows − $$x_{i} = s_{i} \quad (i = 1 \text{ to } n)$$
Step 4 − Activate each output unit as follows − $$y_{j} = s_{j} \quad (j = 1 \text{ to } n)$$
Step 5 − Adjust the weights as follows − $$w_{ij}(new) = w_{ij}(old) + x_{i} y_{j}$$

Testing Algorithm

Step 1 − Set the weights obtained during training with Hebb’s rule.
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Set the activation of the input units equal to that of the input vector.
Step 4 − Calculate the net input to each output unit j = 1 to n $$y_{in_j} = \sum_{i=1}^n x_{i} w_{ij}$$
Step 5 − Apply the following activation function to calculate the output $$y_{j} = f(y_{in_j}) = \begin{cases} +1 & \text{if } y_{in_j} > 0 \\ -1 & \text{if } y_{in_j} \leq 0 \end{cases}$$

Hetero Associative memory

Similar to the Auto Associative Memory network, this is also a single layer neural network. However, in this network the input training vector and the output target vectors are not the same. The weights are determined so that the network stores a set of patterns. The hetero associative network is static in nature; hence, there are no non-linear and delay operations.

Architecture

As shown in the following figure, the architecture of the Hetero Associative Memory network has ‘n’ number of input training vectors and ‘m’ number of output target vectors.

Training Algorithm

For training, this network uses the Hebb or Delta learning rule.

Step 1 − Initialize all the weights to zero as $w_{ij} = 0$ (i = 1 to n, j = 1 to m)
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Activate each input unit as follows − $$x_{i} = s_{i} \quad (i = 1 \text{ to } n)$$
Step 4 − Activate each output unit as follows − $$y_{j} = s_{j} \quad (j = 1 \text{ to } m)$$
Step 5 − Adjust the weights as follows − $$w_{ij}(new) = w_{ij}(old) + x_{i} y_{j}$$

Testing Algorithm

Step 1 − Set the weights obtained during training with Hebb’s rule.
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Set the activation of the input units equal to that of the input vector.
Step 4 − Calculate the net input to each output unit j = 1 to m $$y_{in_j} = \sum_{i=1}^n x_{i} w_{ij}$$
Step 5 − Apply the following activation function to calculate the output $$y_{j} = f(y_{in_j}) = \begin{cases} +1 & \text{if } y_{in_j} > 0 \\ 0 & \text{if } y_{in_j} = 0 \\ -1 & \text{if } y_{in_j} < 0 \end{cases}$$
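The Hebb-rule training and the recall steps for a hetero associative memory can be sketched in Python as below. The function names and the two bipolar training pairs are made up for illustration; the input vectors were chosen to be orthogonal so that recall is exact, which will not hold for arbitrary patterns.

```python
import numpy as np


def hebb_train(inputs, targets):
    """Start from zero weights and add x_i * y_j for every training pair."""
    s = np.asarray(inputs, dtype=float)
    t = np.asarray(targets, dtype=float)
    w = np.zeros((s.shape[1], t.shape[1]))
    for x, y in zip(s, t):
        w += np.outer(x, y)        # w_ij(new) = w_ij(old) + x_i * y_j
    return w


def recall(w, x):
    """Net input y_in_j = sum_i x_i w_ij, then the +1 / 0 / -1 activation."""
    y_in = np.asarray(x, dtype=float) @ w
    return np.sign(y_in)           # +1 if > 0, 0 if = 0, -1 if < 0


# Hypothetical bipolar training pairs (n = 4 inputs, m = 2 outputs).
inputs = [[1, -1, 1, -1], [1, 1, -1, -1]]
targets = [[1, -1], [-1, 1]]
w = hebb_train(inputs, targets)
print(recall(w, inputs[0]), recall(w, inputs[1]))
```

Passing the same array as both `inputs` and `targets` gives the auto associative case, although the auto associative activation described above maps a zero net input to −1 rather than 0.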
Brain-State-in-a-Box Network

The Brain-State-in-a-Box (BSB) neural network is a nonlinear auto-associative neural network and can be extended to hetero-association with two or more layers. It is also similar to the Hopfield network. It was proposed by J.A. Anderson, J.W. Silverstein, S.A. Ritz and R.S. Jones in 1977.

Some important points to remember about the BSB network −

It is a fully connected network with the maximum number of nodes depending upon the dimensionality n of the input space.
All the neurons are updated simultaneously.
Neurons take values between −1 and +1.

Mathematical Formulations

The node function used in the BSB network is a ramp function, which can be defined as follows −
$$f(net) = \min(1, \max(-1, net))$$
This ramp function is bounded and continuous.

Each node changes its state according to the following relation −
$$x_{i}(t + 1) = f\left(\sum_{j=1}^n w_{ij} x_{j}(t)\right)$$
Here, $x_i(t)$ is the state of the ith node at time t.

Weights from the ith node to the jth node can be computed with the following relation −
$$w_{ij} = \frac{1}{P}\sum_{p=1}^P (v_{p,i}\, v_{p,j})$$
Here, P is the number of training patterns, which are bipolar.
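A minimal Python sketch of the BSB relations above is given here. It uses the weight formula and the ramp function exactly as written, with all nodes updated simultaneously; the function names, the single stored pattern, the starting state, and the iteration limit are assumptions for illustration.

```python
import numpy as np


def bsb_weights(patterns):
    """w_ij = (1/P) * sum_p v_pi * v_pj for bipolar training patterns."""
    v = np.asarray(patterns, dtype=float)
    return v.T @ v / len(v)


def ramp(net):
    """f(net) = min(1, max(-1, net))."""
    return np.clip(net, -1.0, 1.0)


def bsb_run(w, x0, steps=10):
    """Apply x(t+1) = f(W x(t)) to all nodes at once until the state is fixed."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        nxt = ramp(w @ x)
        if np.array_equal(nxt, x):
            break                  # state no longer changes
        x = nxt
    return x


w = bsb_weights([[1, 1, -1, -1]])
print(bsb_run(w, [0.5, 0.2, -0.1, -0.3]))  # state is driven to a corner of the box
```

The starting state lies inside the box [−1, +1]^n and is pushed to the stored corner, which is where the name of the network comes from.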
Artificial Neural Network – Basic Concepts

Neural networks are parallel computing devices, which are basically an attempt to make a computer model of the brain. The main objective is to develop a system that performs various computational tasks faster than traditional systems. These tasks include pattern recognition and classification, approximation, optimization, and data clustering.

What is Artificial Neural Network?

An Artificial Neural Network (ANN) is an efficient computing system whose central theme is borrowed from the analogy of biological neural networks. ANNs are also called “artificial neural systems,” “parallel distributed processing systems,” or “connectionist systems.” An ANN consists of a large collection of units that are interconnected in some pattern to allow communication between the units. These units, also referred to as nodes or neurons, are simple processors which operate in parallel.

Every neuron is connected with other neurons through connection links. Each connection link is associated with a weight that carries information about the input signal. This is the most useful information for neurons to solve a particular problem because the weight usually excites or inhibits the signal that is being communicated. Each neuron has an internal state, which is called an activation signal. Output signals, which are produced by combining the input signals and the activation rule, may be sent to other units.

A Brief History of ANN

The history of ANN can be divided into the following three eras −

ANN during 1940s to 1960s

Some key developments of this era are as follows −

1943 − The concept of the neural network is generally considered to have started with the work of physiologist Warren McCulloch and mathematician Walter Pitts, who in 1943 modeled a simple neural network using electrical circuits in order to describe how neurons in the brain might work.
1949 − Donald Hebb’s book, The Organization of Behavior, proposed that repeated activation of one neuron by another strengthens the connection between them each time it is used.
1956 − An associative memory network was introduced by Taylor.
1958 − A learning method for the McCulloch and Pitts neuron model, named Perceptron, was invented by Rosenblatt.
1960 − Bernard Widrow and Marcian Hoff developed models called “ADALINE” and “MADALINE.”

ANN during 1960s to 1980s

Some key developments of this era are as follows −

1961 − Rosenblatt proposed the “backpropagation” scheme for multilayer networks, although his attempt was unsuccessful.
1964 − Taylor constructed a winner-take-all circuit with inhibitions among output units.
1969 − Minsky and Papert published their analysis of the perceptron and its limitations.
1971 − Kohonen developed associative memories.
1976 − Stephen Grossberg and Gail Carpenter developed Adaptive Resonance Theory.

ANN from 1980s till Present

Some key developments of this era are as follows −

1982 − The major development was Hopfield’s energy approach.
1985 − The Boltzmann machine was developed by Ackley, Hinton, and Sejnowski.
1986 − Rumelhart, Hinton, and Williams introduced the Generalised Delta Rule.
1988 − Kosko developed Bidirectional Associative Memory (BAM) and also gave the concept of Fuzzy Logic in ANN.

The historical review shows that significant progress has been made in this field. Neural network based chips are emerging and applications to complex problems are being developed. Surely, today is a period of transition for neural network technology.

Biological Neuron

A nerve cell (neuron) is a special biological cell that processes information. According to one estimate, there is a huge number of neurons, approximately $10^{11}$, with numerous interconnections, approximately $10^{15}$.
Schematic Diagram

Working of a Biological Neuron

As shown in the above diagram, a typical neuron consists of the following four parts, with the help of which we can explain its working −

Dendrites − They are tree-like branches, responsible for receiving information from the other neurons the neuron is connected to. In another sense, we can say that they are like the ears of the neuron.
Soma − It is the cell body of the neuron and is responsible for processing the information received from the dendrites.
Axon − It is just like a cable through which the neuron sends information.
Synapses − They are the connections between the axon and the dendrites of other neurons.

ANN versus BNN

Before taking a look at the differences between Artificial Neural Network (ANN) and Biological Neural Network (BNN), let us take a look at the similarities based on the terminology between these two.

| Biological Neural Network (BNN) | Artificial Neural Network (ANN) |
|---|---|
| Soma | Node |
| Dendrites | Input |
| Synapse | Weights or Interconnections |
| Axon | Output |

The following table shows the comparison between ANN and BNN based on some criteria mentioned.

| Criteria | BNN | ANN |
|---|---|---|
| Processing | Massively parallel, slow but superior to ANN | Massively parallel, fast but inferior to BNN |
| Size | $10^{11}$ neurons and $10^{15}$ interconnections | $10^{2}$ to $10^{4}$ nodes (mainly depends on the type of application and network designer) |
| Learning | They can tolerate ambiguity | Very precise, structured and formatted data is required to tolerate ambiguity |
| Fault tolerance | Performance degrades with even partial damage | It is capable of robust performance, hence has the potential to be fault tolerant |
| Storage capacity | Stores the information in the synapse | Stores the information in continuous memory locations |

Model of Artificial Neural Network

The following diagram represents the general model of ANN followed by its processing.
For the above general model of an artificial neural network, the net input can be calculated as follows −
$$y_{in} = x_{1} w_{1} + x_{2} w_{2} + x_{3} w_{3} + \dots + x_{m} w_{m}$$
i.e., net input $y_{in} = \sum_{i=1}^m x_{i} w_{i}$

The output can be calculated by applying the activation function over the net input.
$$Y = F(y_{in})$$
Output = function (net input calculated)
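The net input and output computation above amounts to a weighted sum followed by an activation function. A minimal Python sketch, assuming a hypothetical step activation with threshold 0.5 and made-up inputs and weights:

```python
def net_input(x, w):
    """y_in = x_1*w_1 + x_2*w_2 + ... + x_m*w_m."""
    return sum(xi * wi for xi, wi in zip(x, w))


def neuron_output(x, w, activation):
    """Y = F(y_in): apply the activation function to the net input."""
    return activation(net_input(x, w))


def step(y_in, threshold=0.5):
    """Hypothetical activation: 1 if the net input exceeds the threshold, else 0."""
    return 1 if y_in > threshold else 0


x = [1, 0, 1]
w = [0.5, -1.0, 0.25]
print(net_input(x, w), neuron_output(x, w, step))
```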
Boltzmann Machine

These are stochastic learning processes having a recurrent structure and are the basis of the early optimization techniques used in ANN. The Boltzmann Machine was invented by Geoffrey Hinton and Terry Sejnowski in 1985. More clarity can be observed in the words of Hinton on the Boltzmann Machine.

“A surprising feature of this network is that it uses only locally available information. The change of weight depends only on the behavior of the two units it connects, even though the change optimizes a global measure” – Ackley, Hinton 1985.

Some important points about the Boltzmann Machine −

They use a recurrent structure.
They consist of stochastic neurons, which have one of the two possible states, either 1 or 0.
Some of the neurons are adaptive (free state) and some are clamped (frozen state).
If we apply simulated annealing to the discrete Hopfield network, then it becomes a Boltzmann Machine.

Objective of Boltzmann Machine

The main purpose of the Boltzmann Machine is to optimize the solution of a problem. It is the work of the Boltzmann Machine to optimize the weights and quantities related to that particular problem.

Architecture

The following diagram shows the architecture of the Boltzmann machine. It is clear from the diagram that it is a two-dimensional array of units. Here, the weights on interconnections between units are −p where p > 0. The weights of self-connections are given by b where b > 0.

Training Algorithm

Since the Boltzmann machines considered here have fixed weights, there is no training algorithm, as we do not need to update the weights in the network. However, to test the network we have to set the weights as well as find the consensus function (CF).

The Boltzmann machine has a set of units $U_i$ and $U_j$ with bi-directional connections between them.

We consider a fixed weight, say $w_{ij}$.
$w_{ij} \neq 0$ if $U_i$ and $U_j$ are connected.
There is also symmetry in the weighted interconnections, i.e. $w_{ij} = w_{ji}$.
$w_{ii}$ also exists, i.e. there is a self-connection on each unit.
For any unit $U_i$, its state $u_i$ is either 1 or 0.

The main objective of the Boltzmann Machine is to maximize the Consensus Function (CF), which is given by the following relation
$$CF = \sum_{i} \sum_{j \leq i} w_{ij} u_{i} u_{j}$$
When the state of unit $U_i$ changes from 1 to 0 or from 0 to 1, the change in consensus is given by the following relation −
$$\Delta CF = (1 - 2u_{i})\left(w_{ii} + \sum_{j \neq i} u_{j} w_{ij}\right)$$
Here $u_i$ is the current state of $U_i$.

The coefficient $(1 - 2u_i)$ takes the following values −
$$(1 - 2u_{i}) = \begin{cases} +1, & U_{i} \text{ is currently off} \\ -1, & U_{i} \text{ is currently on} \end{cases}$$
Generally, unit $U_i$ does not change its state, but if it does, the information needed resides local to the unit. With that change, there would also be an increase in the consensus of the network.

The probability that the network accepts the change in the state of the unit is given by the following relation −
$$AF(i,T) = \frac{1}{1 + \exp\left[-\frac{\Delta CF(i)}{T}\right]}$$
Here, T is the controlling parameter. It will decrease as CF reaches the maximum value.

Testing Algorithm

Step 1 − Initialize the following to start the procedure −
Weights representing the constraints of the problem
Control parameter T
Step 2 − Continue steps 3-8 while the stopping condition is not true.
Step 3 − Perform steps 4-7.
Step 4 − Assume that one of the units changes its state, and choose the integers I, J as random values between 1 and n.
Step 5 − Calculate the change in consensus as follows − $$\Delta CF = (1 - 2u_{i})\left(w_{ii} + \sum_{j \neq i} u_{j} w_{ij}\right)$$
Step 6 − Calculate the probability that this network would accept the change in state $$AF(i,T) = \frac{1}{1 + \exp\left[-\frac{\Delta CF(i)}{T}\right]}$$
Step 7 − Accept or reject this change as follows −
Case I − if R < AF, accept the change.
Case II − if R ≥ AF, reject the change.
Here, R is a random number between 0 and 1.
Step 8 − Reduce the control parameter (temperature) as follows − T(new) = 0.95 T(old)
Step 9 − Test for the stopping conditions, which may be as follows −
Temperature reaches a specified value
There is no change in state for a specified number of iterations
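The consensus function and the accept/reject loop above can be sketched in Python as follows. This is an illustrative simplification under several assumptions: units are indexed by a single integer rather than a two-dimensional position (I, J), the change in consensus for flipping unit i is taken as (1 − 2u_i)(w_ii + Σ_{j≠i} u_j w_ij), the function names and the tiny two-unit weight matrix are invented, and the starting temperature and stopping temperature are arbitrary.

```python
import math
import random


def consensus(w, u):
    """CF = sum_i sum_{j <= i} w_ij * u_i * u_j."""
    return sum(w[i][j] * u[i] * u[j]
               for i in range(len(u)) for j in range(i + 1))


def delta_consensus(w, u, i):
    """Change in CF if unit i flips its state."""
    others = sum(w[i][j] * u[j] for j in range(len(u)) if j != i)
    return (1 - 2 * u[i]) * (w[i][i] + others)


def anneal(w, u, T=10.0, T_min=0.1, cooling=0.95, seed=0):
    """Flip randomly chosen units with probability AF while lowering T."""
    rng = random.Random(seed)
    u = list(u)
    while T > T_min:                         # stop at a specified temperature
        for _ in range(len(u)):
            i = rng.randrange(len(u))        # candidate unit to flip
            d = delta_consensus(w, u, i)
            af = 1.0 / (1.0 + math.exp(-d / T))
            if rng.random() < af:            # accept when R < AF
                u[i] = 1 - u[i]
        T *= cooling                         # T(new) = 0.95 * T(old)
    return u


# Hypothetical 2-unit network: self-connections b = 1, interconnection -p = -2.
w = [[1, -2], [-2, 1]]
state = anneal(w, [0, 0])
print(state, consensus(w, state))
```

For these weights, turning on exactly one unit gives the highest consensus, so at low temperature the loop tends to settle in such a state, though a stochastic run is not guaranteed to end there.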
Artificial Neural Network – Building Blocks

Processing of ANN depends upon the following three building blocks −

Network Topology
Adjustments of Weights or Learning
Activation Functions

In this chapter, we will discuss these three building blocks of ANN in detail.

Network Topology

A network topology is the arrangement of a network along with its nodes and connecting lines. According to the topology, ANN can be classified as the following kinds −

Feedforward Network

It is a non-recurrent network having processing units/nodes in layers, where all the nodes in a layer are connected with the nodes of the previous layer. The connections have different weights on them. There is no feedback loop, which means the signal can only flow in one direction, from input to output. It may be divided into the following two types −

Single layer feedforward network − A feedforward ANN having only one weighted layer. In other words, the input layer is fully connected to the output layer.

Multilayer feedforward network − A feedforward ANN having more than one weighted layer. This network has one or more layers between the input and the output layer, which are called hidden layers.

Feedback Network

As the name suggests, a feedback network has feedback paths, which means the signal can flow in both directions using loops. This makes it a non-linear dynamic system, which changes continuously until it reaches a state of equilibrium. It may be divided into the following types −

Recurrent networks − They are feedback networks with closed loops. Following are the two types of recurrent networks.

Fully recurrent network − It is the simplest neural network architecture because all nodes are connected to all other nodes and each node works as both input and output.

Jordan network − It is a closed loop network in which the output goes to the input again as feedback, as shown in the following diagram.
Adjustments of Weights or Learning

Learning, in an artificial neural network, is the method of modifying the weights of connections between the neurons of a specified network. Learning in ANN can be classified into three categories, namely supervised learning, unsupervised learning, and reinforcement learning.

Supervised Learning

As the name suggests, this type of learning is done under the supervision of a teacher. This learning process is dependent.

During the training of ANN under supervised learning, the input vector is presented to the network, which produces an output vector. This output vector is compared with the desired output vector. An error signal is generated if there is a difference between the actual output and the desired output vector. On the basis of this error signal, the weights are adjusted until the actual output matches the desired output.

Unsupervised Learning

As the name suggests, this type of learning is done without the supervision of a teacher. This learning process is independent.

During the training of ANN under unsupervised learning, input vectors of similar type are combined to form clusters. When a new input pattern is applied, the neural network gives an output response indicating the class to which the input pattern belongs.

There is no feedback from the environment as to what the desired output should be and whether it is correct or incorrect. Hence, in this type of learning, the network itself must discover the patterns and features in the input data, and the relation between the input data and the output.

Reinforcement Learning

As the name suggests, this type of learning is used to reinforce or strengthen the network based on some critic information. This learning process is similar to supervised learning; however, we might have very little information.

During the training of the network under reinforcement learning, the network receives some feedback from the environment. This makes it somewhat similar to supervised learning.
However, the feedback obtained here is evaluative, not instructive, which means there is no teacher as in supervised learning. After receiving the feedback, the network adjusts the weights to get better critic information in future.

Activation Functions

An activation function may be defined as the extra force or effort applied over the input to obtain an exact output. In ANN, we apply activation functions over the net input to get the output. Following are some activation functions of interest −

Linear Activation Function

It is also called the identity function as it performs no input editing. It can be defined as −
$$F(x) = x$$

Sigmoid Activation Function

It is of two types as follows −

Binary sigmoidal function − This activation function maps the input to a value between 0 and 1. It is positive in nature. It is always bounded, which means its output cannot be less than 0 or more than 1. It is also strictly increasing in nature, which means the larger the input, the higher the output. It can be defined as
$$F(x) = sigm(x) = \frac{1}{1 + \exp(-x)}$$

Bipolar sigmoidal function − This activation function maps the input to a value between −1 and 1. It can be positive or negative in nature. It is always bounded, which means its output cannot be less than −1 or more than 1. It is also strictly increasing in nature, like the binary sigmoid function. It can be defined as
$$F(x) = sigm(x) = \frac{2}{1 + \exp(-x)} - 1 = \frac{1 - \exp(-x)}{1 + \exp(-x)}$$
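The three activation functions above are short enough to write out directly. The following Python sketch uses assumed function names and simply evaluates each function at a few sample inputs to show the output ranges.

```python
import math


def linear(x):
    """Identity function: F(x) = x."""
    return x


def binary_sigmoid(x):
    """F(x) = 1 / (1 + exp(-x)), bounded between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def bipolar_sigmoid(x):
    """F(x) = 2 / (1 + exp(-x)) - 1, bounded between -1 and 1."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0


for value in (-4.0, 0.0, 4.0):
    print(value, linear(value), binary_sigmoid(value), bipolar_sigmoid(value))
```

The bipolar sigmoid is the binary sigmoid rescaled from (0, 1) to (−1, 1), and it equals tanh(x/2).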
Learning and Adaptation

As stated earlier, ANN is inspired by the way the biological nervous system, i.e. the human brain, works. The most impressive characteristic of the human brain is its ability to learn; hence the same feature is acquired by ANN.

What Is Learning in ANN?

Basically, learning means adapting to change as and when there is a change in the environment. ANN is a complex system, or more precisely a complex adaptive system, which can change its internal structure based on the information passing through it.

Why Is It Important?

Being a complex adaptive system, learning in ANN implies that a processing unit is capable of changing its input/output behavior due to a change in the environment. Learning is important in ANN because the activation function and the input/output vectors are fixed when a particular network is constructed. To change the input/output behavior, we therefore need to adjust the weights.

Classification

It may be defined as the process of learning to separate sample data into different classes by finding common features between the samples of the same class. For example, to train an ANN we have some training samples with unique features, and to test it we have some testing samples with other unique features. Classification is an example of supervised learning.

Neural Network Learning Rules

We know that, during ANN learning, to change the input/output behavior, we need to adjust the weights. Hence, a method is required by which the weights can be modified. These methods are called learning rules, which are simply algorithms or equations. Following are some learning rules for the neural network −

Hebbian Learning Rule

This rule, one of the oldest and simplest, was introduced by Donald Hebb in his book The Organization of Behavior in 1949. It is a kind of feed-forward, unsupervised learning.
Basic Concept − This rule is based on a proposal given by Hebb, who wrote −

“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”

From the above postulate, we can conclude that the connection between two neurons might be strengthened if the neurons fire at the same time and might weaken if they fire at different times.

Mathematical Formulation − According to the Hebbian learning rule, following is the formula to increase the weight of a connection at every time step.
$$\Delta w_{ji}(t) = \alpha\, x_{i}(t)\, y_{j}(t)$$
Here,
$\Delta w_{ji}(t)$ = increment by which the weight of the connection increases at time step t
$\alpha$ = the positive and constant learning rate
$x_{i}(t)$ = the input value from the pre-synaptic neuron at time step t
$y_{j}(t)$ = the output of the post-synaptic neuron at the same time step t

Perceptron Learning Rule

This rule is an error-correcting supervised learning algorithm for single layer feedforward networks with linear activation function, introduced by Rosenblatt.

Basic Concept − Being supervised in nature, the rule calculates the error by comparing the desired/target output with the actual output. If any difference is found, then a change must be made to the weights of the connections.

Mathematical Formulation − To explain its mathematical formulation, suppose we have a finite number N of input vectors x(n), along with their desired/target output vectors t(n), where n = 1 to N.

The output ‘y’ can be calculated, as explained earlier, on the basis of the net input, and the activation function applied over that net input can be expressed as follows −
$$y = f(y_{in}) = \begin{cases} 1, & y_{in} > \theta \\ 0, & y_{in} \leq \theta \end{cases}$$
where θ is the threshold.
The updating of weight can be done in the following two cases −

Case I − when t ≠ y, then

$$w(new) = w(old) + tx$$

Case II − when t = y, then

No change in weight

Delta Learning Rule (Widrow-Hoff Rule)

Introduced by Bernard Widrow and Marcian Hoff and also called the Least Mean Square (LMS) method, this rule minimizes the error over all training patterns. It is a kind of supervised learning algorithm with a continuous activation function.

Basic Concept − This rule is based on the gradient-descent approach, applied iteratively. The delta rule updates the synaptic weights so as to minimize the difference between the net input to the output unit and the target value.

Mathematical Formulation − To update the synaptic weights, the delta rule is given by

$$\Delta w_{i} = \alpha\, x_{i}\, e_{j}$$

Here
$\Delta w_{i}$ = weight change for the ith input connection;
$\alpha$ = the positive and constant learning rate;
$x_{i}$ = the input value from the pre-synaptic neuron;
$e_{j}$ = $(t - y_{in})$, the difference between the desired/target output and the actual output $y_{in}$

The above delta rule is for a single output unit only.

The updating of weight can be done in the following two cases −

Case I − when t ≠ y, then

$$w(new) = w(old) + \Delta w$$

Case II − when t = y, then

No change in weight

Competitive Learning Rule (Winner-takes-all)

It is concerned with unsupervised training in which the output nodes compete with each other to represent the input pattern. To understand this learning rule, we must first understand the competitive network, which is described as follows −

Basic Concept of Competitive Network − This network is just like a single-layer feedforward network with feedback connections between the outputs. The connections between outputs are of the inhibitory type, shown by dotted lines, which means the competitors never support one another.

Basic Concept of Competitive Learning Rule − As said earlier, there will be a competition among the output nodes.
Hence, the main concept is that during training, the output unit with the highest activation for a given input pattern will be declared the winner. This rule is also called Winner-takes-all because only the winning neuron is updated and the rest of the neurons are left unchanged.

Mathematical formulation − Following are the three important factors for mathematical formulation
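Looking back over the learning rules in this section, the Hebbian update, the delta update, and the selection of a winner in competitive learning are each small enough to sketch in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, each function handles a single output unit (or a list of weight rows for the winner selection), and the net input is assumed to be a plain weighted sum with no bias. The weight update applied to the winner is left out because its formulation is not given above.

```python
def net_input(w, x):
    """Weighted sum of the inputs for one unit (assumption: no bias term)."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def hebbian_update(w, x, y, alpha=1.0):
    """Hebbian rule for one output unit j: w_ji += alpha * x_i * y_j."""
    return [w_i + alpha * x_i * y for w_i, x_i in zip(w, x)]


def delta_update(w, x, t, alpha=0.1):
    """Delta rule for one output unit: w_i += alpha * x_i * (t - y_in)."""
    error = t - net_input(w, x)
    return [w_i + alpha * x_i * error for w_i, x_i in zip(w, x)]


def pick_winner(weight_rows, x):
    """Index of the output unit with the highest net input.

    Ties go to the lowest index, which is an arbitrary choice in this sketch.
    """
    activations = [net_input(row, x) for row in weight_rows]
    return activations.index(max(activations))
```

Returning new weight lists instead of modifying them in place keeps each rule easy to test on its own. A real network would normally store the weights as a matrix and update them in place.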
Supervised Learning

As the name suggests, supervised learning takes place under the supervision of a teacher, so the learning process depends on that teacher. During the training of ANN under supervised learning, the input vector is presented to the network, which produces an output vector. This output vector is compared with the desired/target output vector. An error signal is generated if there is a difference between the actual output and the desired/target output vector. On the basis of this error signal, the weights are adjusted until the actual output matches the desired output.

Perceptron

Developed by Frank Rosenblatt using the McCulloch and Pitts model, the perceptron is the basic operational unit of artificial neural networks. It employs a supervised learning rule and is able to classify data into two classes.

Operational characteristics of the perceptron: It consists of a single neuron with an arbitrary number of inputs along with adjustable weights, but the output of the neuron is 1 or 0 depending upon the threshold. It also has a bias whose weight is always 1. The following figure gives a schematic representation of the perceptron.

The perceptron thus has the following three basic elements −

Links − It has a set of connection links, each of which carries a weight, including a bias link that always has weight 1.

Adder − It adds the inputs after they are multiplied by their respective weights.

Activation function − It limits the output of the neuron. The most basic activation function is a Heaviside step function that has two possible outputs. This function returns 1 if the input is positive, and 0 for any negative input.

Training Algorithm

A perceptron network can be trained for a single output unit as well as for multiple output units.
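The three elements of the perceptron described above (links, adder, activation function) can be sketched as follows. This is a minimal illustration and not a trained network: the function names are hypothetical, the example weights and bias are hand-picked, and an input of exactly 0 to the step function is assumed to give 0, since the text only specifies positive and negative inputs.

```python
def adder(x, w, b):
    """Adder: bias plus the inputs multiplied by their link weights."""
    return b + sum(x_i * w_i for x_i, w_i in zip(x, w))


def heaviside(v):
    """Step activation: 1 for positive input, 0 otherwise.

    Returning 0 at exactly v == 0 is an assumption of this sketch.
    """
    return 1 if v > 0 else 0


def perceptron_output(x, w, b):
    """Single perceptron: weighted sum followed by the step activation."""
    return heaviside(adder(x, w, b))


# Hand-picked (hypothetical) weights that make the unit behave like logical AND.
and_weights = [1.0, 1.0]
and_bias = -1.5
```

With these hand-picked values, only the input [1, 1] pushes the weighted sum above zero, so the unit separates that input from the other three binary inputs.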
Training Algorithm for Single Output Unit

Step 1 − Initialize the following to start the training −

Weights
Bias
Learning rate $\alpha$

For easy calculation and simplicity, the weights and bias can be set equal to 0 and the learning rate can be set equal to 1.

Step 2 − Repeat steps 3-8 while the stopping condition is not true.

Step 3 − Repeat steps 4-7 for every training vector x.

Step 4 − Activate each input unit as follows −

$$x_{i} = s_{i} \quad (i = 1 \text{ to } n)$$

Step 5 − Now obtain the net input with the following relation −

$$y_{in} = b + \displaystyle\sum\limits_{i=1}^{n} x_{i}\, w_{i}$$

Here ‘b’ is the bias and ‘n’ is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output.

$$f(y_{in}) = \begin{cases}1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \leqslant y_{in} \leqslant \theta \\ -1 & \text{if } y_{in} < -\theta\end{cases}$$

Step 7 − Adjust the weight and bias as follows −

Case 1 − if y ≠ t then,

$$w_{i}(new) = w_{i}(old) + \alpha\, t x_{i}$$

$$b(new) = b(old) + \alpha t$$

Case 2 − if y = t then,

$$w_{i}(new) = w_{i}(old)$$

$$b(new) = b(old)$$

Here ‘y’ is the actual output and ‘t’ is the desired/target output.

Step 8 − Test for the stopping condition, which is met when there is no change in weight.

Training Algorithm for Multiple Output Units

The following diagram is the architecture of the perceptron for multiple output classes.

Step 1 − Initialize the following to start the training −

Weights
Bias
Learning rate $\alpha$

For easy calculation and simplicity, the weights and bias can be set equal to 0 and the learning rate can be set equal to 1.

Step 2 − Repeat steps 3-8 while the stopping condition is not true.

Step 3 − Repeat steps 4-7 for every training vector x.

Step 4 − Activate each input unit as follows −

$$x_{i} = s_{i} \quad (i = 1 \text{ to } n)$$

Step 5 − Obtain the net input of each output unit j with the following relation −

$$y_{inj} = b_{j} + \displaystyle\sum\limits_{i=1}^{n} x_{i}\, w_{ij}$$

Here $b_{j}$ is the bias of output unit j and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output for each output unit j = 1 to m −

$$f(y_{inj}) = \begin{cases}1 & \text{if } y_{inj} > \theta \\ 0 & \text{if } -\theta \leqslant y_{inj} \leqslant \theta \\ -1 & \text{if } y_{inj} < -\theta\end{cases}$$

Step 7 − Adjust the weight and bias for i = 1 to n and j = 1 to m as follows −

Case 1 − if yj ≠ tj then,

$$w_{ij}(new) = w_{ij}(old) + \alpha\, t_{j} x_{i}$$

$$b_{j}(new) = b_{j}(old) + \alpha t_{j}$$

Case 2 − if yj = tj then,

$$w_{ij}(new) = w_{ij}(old)$$

$$b_{j}(new) = b_{j}(old)$$

Here ‘y’ is the actual output and ‘t’ is the desired/target output.

Step 8 − Test for the stopping condition, which is met when there is no change in weight.

Adaptive Linear Neuron (Adaline)

Adaline, which stands for Adaptive Linear Neuron, is a network having a single linear unit. It was developed by Widrow and Hoff in 1960. Some important points about Adaline are as follows −

It uses a bipolar activation function.

It uses the delta rule for training to minimize the Mean-Squared Error (MSE) between the actual output and the desired/target output.

The weights and the bias are adjustable.

Architecture

The basic structure of Adaline is similar to that of the perceptron, with an extra feedback loop through which the actual output is compared with the desired/target output. After this comparison, the weights and bias are updated according to the training algorithm.

Training Algorithm

Step 1 − Initialize the following to start the training −

Weights
Bias
Learning rate $\alpha$

For easy calculation and simplicity, the weights and bias can be set equal to 0 and the learning rate can be set equal to 1.

Step 2 − Repeat steps 3-8 while the stopping condition is not true.

Step 3 − Repeat steps 4-7 for every bipolar training pair s:t.

Step 4 − Activate each input unit as follows −

$$x_{i} = s_{i} \quad (i = 1 \text{ to } n)$$

Step 5 − Obtain the net input with the following relation −

$$y_{in} = b + \displaystyle\sum\limits_{i=1}^{n} x_{i}\, w_{i}$$

Here ‘b’ is the bias and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output −

$$f(y_{in}) = \begin{cases}1 & \text{if } y_{in} \geqslant 0 \\ -1 & \text{if } y_{in} < 0\end{cases}$$

Step 7 − Adjust the weight and bias as follows −

Case 1 − if y ≠ t then,

$$w_{i}(new) = w_{i}(old) + \alpha (t - y_{in}) x_{i}$$

$$b(new) = b(old) + \alpha (t - y_{in})$$

Case 2 − if y = t then,

$$w_{i}(new) = w_{i}(old)$$

$$b(new) = b(old)$$

Here ‘y’ is the actual output and ‘t’ is the desired/target output. $(t - y_{in})$ is the computed error.

Step 8 − Test for the stopping condition.
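The Adaline training steps can be sketched in Python roughly as follows. This is an illustrative sketch under several assumptions, not a definitive implementation: the names `train_adaline` and `predict` are hypothetical, the learning rate is set to 0.1 instead of 1 to keep the updates small, training stops when a full pass over the data changes nothing (the stopping test is not spelled out above), and `max_epochs` is an added safety cap.

```python
def bipolar(y_in):
    """Step 6: bipolar activation, 1 if y_in >= 0, otherwise -1."""
    return 1 if y_in >= 0 else -1


def predict(x, w, b):
    """Steps 5 and 6: net input followed by the bipolar activation."""
    y_in = b + sum(x_i * w_i for x_i, w_i in zip(x, w))
    return bipolar(y_in)


def train_adaline(samples, alpha=0.1, max_epochs=100):
    """Train on a list of (input_vector, bipolar_target) pairs."""
    n = len(samples[0][0])
    w = [0.0] * n          # Step 1: weights start at 0
    b = 0.0                # Step 1: bias starts at 0
    for _ in range(max_epochs):            # Step 2
        changed = False
        for x, t in samples:               # Step 3 (Step 4: x is the input)
            y_in = b + sum(x_i * w_i for x_i, w_i in zip(x, w))  # Step 5
            y = bipolar(y_in)                                    # Step 6
            if y != t:                                           # Step 7, case 1
                error = t - y_in
                w = [w_i + alpha * error * x_i for w_i, x_i in zip(w, x)]
                b = b + alpha * error
                changed = True
        if not changed:    # Step 8: assumed stopping test, no change in a pass
            break
    return w, b


# Bipolar AND data, used purely as an illustration.
and_samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
weights, bias = train_adaline(and_samples)
```

The sketch follows Step 7 as written and updates the weights only when the thresholded output differs from the target. Many presentations of the delta rule update on every sample instead, which would mean dropping the `if y != t` check.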
Artificial Neural Networks Tutorial

Artificial Neural Networks are parallel computing devices, which are basically an attempt to make a computer model of the brain. The main objective is to develop a system to perform various computational tasks faster than the traditional systems. This tutorial covers the basic concept and terminologies involved in Artificial Neural Network. Sections of this tutorial also explain the architecture as well as the training algorithm of various networks used in ANN.

Audience

This tutorial will be useful for graduates, post graduates, and research students who either have an interest in this subject or have this subject as a part of their curriculum. The reader can be a beginner or an advanced learner.

Prerequisites

Artificial Neural Networks (ANN) is an advanced topic, hence the reader must have basic knowledge of Algorithms, Programming, and Mathematics.