Kohonen Self-Organizing Feature Maps

Suppose we have patterns of arbitrary dimensions, but we need them in one or two dimensions. The process of feature mapping is very useful here: it converts the wide pattern space into a typical feature space. Now, why do we require a self-organizing feature map? The reason is that, along with the capability to convert arbitrary dimensions into 1-D or 2-D, it must also have the ability to preserve the neighbor topology.

Neighbor Topologies in Kohonen SOM

There can be various topologies; however, the following two topologies are used the most −

Rectangular Grid Topology

This topology has 8 nodes in the distance-1 ring, 16 nodes in the distance-2 ring, and 24 nodes in the distance-3 ring; each successive rectangular ring grows by 8 nodes. The winning unit is indicated by #.

Hexagonal Grid Topology

This topology has 6 nodes in the distance-1 ring, 12 nodes in the distance-2 ring, and 18 nodes in the distance-3 ring; each successive hexagonal ring grows by 6 nodes. The winning unit is indicated by #.

Architecture

The architecture of KSOM is similar to that of the competitive network. With the help of the neighborhood schemes discussed earlier, the training can take place over an extended region of the network.

Algorithm for training

Step 1 − Initialize the weights, the learning rate $\alpha$ and the neighborhood topological scheme.

Step 2 − Continue steps 3-9, while the stopping condition is not true.

Step 3 − Continue steps 4-6 for every input vector x.

Step 4 − Calculate the square of the Euclidean distance for j = 1 to m

$$D(j)\:=\:\sum_{i=1}^n (x_{i}\:-\:w_{ij})^2$$

Step 5 − Obtain the winning unit J where D(j) is minimum.

Step 6 − Calculate the new weights of the winning unit and the units within its topological neighborhood by the following relation −

$$w_{ij}(new)\:=\:w_{ij}(old)\:+\:\alpha[x_{i}\:-\:w_{ij}(old)]$$

Step 7 − Update the learning rate $\alpha$ by the following relation −

$$\alpha(t\:+\:1)\:=\:0.5\:\alpha(t)$$

Step 8 − Reduce the radius of the topological scheme.

Step 9 − Check for the stopping condition for the network.
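The steps above translate almost directly into code. Below is a minimal NumPy sketch of the training loop; the grid size, epoch count, radius schedule and the Chebyshev-distance neighborhood are illustrative assumptions, not part of the original algorithm.

```python
# A minimal sketch of the KSOM training loop above (steps 1-9).
import numpy as np

def train_ksom(X, grid=(10, 10), epochs=50, alpha=0.5, radius=3):
    rows, cols = grid
    # Step 1: random weight initialization, one weight vector per map node
    W = np.random.rand(rows, cols, X.shape[1])
    coords = np.indices(grid).transpose(1, 2, 0)   # (row, col) of every node

    for epoch in range(epochs):                    # Steps 2-3
        for x in X:
            D = np.sum((x - W) ** 2, axis=2)       # Step 4: squared distances D(j)
            J = np.unravel_index(np.argmin(D), grid)   # Step 5: winning unit
            # Step 6: update the winner and its topological neighbours
            dist = np.max(np.abs(coords - np.array(J)), axis=2)  # grid distance
            mask = dist <= radius
            W[mask] += alpha * (x - W[mask])
        alpha *= 0.5              # Step 7 (note: halving quickly freezes the map)
        if (epoch + 1) % 10 == 0 and radius > 0:
            radius -= 1           # Step 8: shrink the neighborhood radius
    return W

# Usage: map 3-D points onto a 10x10 grid
W = train_ksom(np.random.rand(200, 3))
```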
Other Optimization Techniques

Iterated Gradient Descent Technique

Gradient descent, also known as steepest descent, is an iterative optimization algorithm for finding a local minimum of a function. While minimizing the function, we are concerned with the cost or error to be minimized (remember the Travelling Salesman Problem). It is extensively used in deep learning and is useful in a wide variety of situations. The point to remember here is that we are concerned with local optimization, not global optimization.

Main Working Idea

We can understand the main working idea of gradient descent with the help of the following steps −

First, start with an initial guess of the solution.
Then, take the gradient of the function at that point.
Later, repeat the process by stepping the solution in the negative direction of the gradient.

By following the above steps, the algorithm will eventually converge where the gradient is zero.

Mathematical Concept

Suppose we have a function f(x) and we are trying to find its minimum. Following are the steps to find the minimum of f(x).

First, give some initial value $x_{0}$ for $x$.

Now take the gradient $\nabla f$ of the function, with the intuition that the gradient gives the slope of the curve at that x and its direction points toward the increase of the function, to find out the best direction to minimize it.

Now change x as follows −

$$x_{n\:+\:1}\:=\:x_{n}\:-\:\theta\:\nabla f(x_{n})$$

Here, θ > 0 is the training rate (step size) that forces the algorithm to take small jumps.

Estimating Step Size

A wrong step size θ may fail to reach convergence, hence a careful selection of it is very important. The following points must be remembered while choosing the step size −

Do not choose too large a step size, otherwise it will have a negative impact, i.e. it will diverge rather than converge.
Do not choose too small a step size, otherwise it will take a lot of time to converge.

Some options with regard to choosing the step size −

One option is to choose a fixed step size.
Another option is to choose a different step size for every iteration.

Simulated Annealing

The basic concept of Simulated Annealing (SA) is motivated by annealing in solids. In the process of annealing, if we heat a metal above its melting point and cool it down, the structural properties will depend upon the rate of cooling. We can also say that SA simulates the metallurgical process of annealing.

Use in ANN

SA is a stochastic computational method, inspired by the annealing analogy, for approximating the global optimum of a given function. We can use SA to train feed-forward neural networks.

Algorithm

Step 1 − Generate a random solution.
Step 2 − Calculate its cost using some cost function.
Step 3 − Generate a random neighboring solution.
Step 4 − Calculate the cost of the new solution with the same cost function.
Step 5 − Compare the cost of the new solution with that of the old solution as follows − if CostNew Solution < CostOld Solution, then move to the new solution.
Step 6 − Test for the stopping condition, which may be a maximum number of iterations reached or an acceptable solution found.
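Below is a minimal sketch of both routines above. The example function, gradient, cooling schedule and neighbour move are illustrative assumptions; note also that the annealing sketch adds the classical Metropolis acceptance rule (accepting some worse moves with probability exp(−ΔC/T)), which extends Step 5 above and is what lets SA escape local minima.

```python
# Minimal sketches of gradient descent and simulated annealing on a
# one-dimensional function.
import math
import random

def gradient_descent(grad, x0, step=0.1, iters=1000, tol=1e-8):
    """Step repeatedly in the negative gradient direction."""
    x = x0
    for _ in range(iters):
        g = grad(x)
        if abs(g) < tol:           # converged: gradient is (near) zero
            break
        x = x - step * g           # x_{n+1} = x_n - theta * grad f(x_n)
    return x

def simulated_annealing(cost, x0, temp=1.0, cooling=0.995, iters=5000):
    x, c = x0, cost(x0)                        # Steps 1-2
    for _ in range(iters):                     # Step 6 loop
        x_new = x + random.uniform(-0.5, 0.5)  # Step 3: random neighbour
        c_new = cost(x_new)                    # Step 4
        # Step 5 (+ Metropolis rule): always accept improvements, and
        # accept worse moves with probability exp(-dC / T)
        if c_new < c or random.random() < math.exp(-(c_new - c) / temp):
            x, c = x_new, c_new
        temp *= cooling                        # cool down
    return x

# Usage on f(x) = (x - 3)^2, whose minimum is at x = 3
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))
print(simulated_annealing(lambda x: (x - 3) ** 2, x0=0.0))
```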
Learning Vector Quantization

Learning Vector Quantization (LVQ), different from vector quantization (VQ) and Kohonen Self-Organizing Maps (KSOM), is basically a competitive network which uses supervised learning. We may define it as a process of classifying patterns where each output unit represents a class. As it uses supervised learning, the network is given a set of training patterns with known classification, along with an initial distribution of the output classes. After completing the training process, LVQ classifies an input vector by assigning it to the same class as that of its winning output unit.

Architecture

The following figure shows the architecture of LVQ, which is quite similar to the architecture of KSOM. As we can see, there are “n” input units and “m” output units. The layers are fully interconnected, with weights on the connections.

Parameters Used

Following are the parameters used in the LVQ training process as well as in the flowchart −

x = training vector (x1,…,xi,…,xn)
T = class of training vector x
wj = weight vector for the jth output unit
Cj = class associated with the jth output unit

Training Algorithm

Step 1 − Initialize the reference vectors, which can be done as follows −
Step 1(a) − From the given set of training vectors, take the first “m” (number of clusters) training vectors and use them as weight vectors. The remaining vectors can be used for training.
Step 1(b) − Assign the initial weights and classification randomly.
Step 1(c) − Apply the K-means clustering method.

Step 2 − Initialize the learning rate $\alpha$.

Step 3 − Continue with steps 4-9, if the condition for stopping this algorithm is not met.

Step 4 − Follow steps 5-6 for every training input vector x.

Step 5 − Calculate the square of the Euclidean distance for j = 1 to m

$$D(j)\:=\:\sum_{i=1}^n (x_{i}\:-\:w_{ij})^2$$

Step 6 − Obtain the winning unit J where D(j) is minimum.

Step 7 − Calculate the new weight of the winning unit by the following relation −

if T = CJ then $w_{J}(new)\:=\:w_{J}(old)\:+\:\alpha[x\:-\:w_{J}(old)]$

if T ≠ CJ then $w_{J}(new)\:=\:w_{J}(old)\:-\:\alpha[x\:-\:w_{J}(old)]$

Step 8 − Reduce the learning rate $\alpha$.

Step 9 − Test for the stopping condition. It may be as follows −
Maximum number of epochs reached.
Learning rate reduced to a negligible value.

Flowchart

Variants

Three other variants, namely LVQ2, LVQ2.1 and LVQ3, have been developed by Kohonen. Because the runner-up unit learns along with the winner in these variants, their complexity is greater than that of basic LVQ. A sketch of the basic training loop follows below, before the variants are detailed.

LVQ2

As discussed above, in LVQ2 the update condition is formed by a window. This window is based on the following parameters −

x − the current input vector
yc − the reference vector closest to x
yr − the other reference vector, which is next closest to x
dc − the distance from x to yc
dr − the distance from x to yr

The input vector x falls in the window if

$$\frac{d_{c}}{d_{r}}\:>\:1\:-\:\theta\:\:and\:\:\frac{d_{r}}{d_{c}}\:<\:1\:+\:\theta$$

Here, $\theta$ is a small constant defining the relative width of the window (not a count of training samples).

Updating is done with the following formulas, applied when yc belongs to a different class than x while yr belongs to the same class −

$y_{c}(t\:+\:1)\:=\:y_{c}(t)\:-\:\alpha(t)[x(t)\:-\:y_{c}(t)]$ (yc belongs to a different class, so it is moved away from x)

$y_{r}(t\:+\:1)\:=\:y_{r}(t)\:+\:\alpha(t)[x(t)\:-\:y_{r}(t)]$ (yr belongs to the same class, so it is moved closer to x)

Here $\alpha$ is the learning rate.
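Here is a minimal NumPy sketch of the basic LVQ training loop (steps 2-9) referenced above; the epoch count and learning-rate decay are illustrative assumptions.

```python
# A minimal sketch of the basic LVQ training algorithm.
import numpy as np

def train_lvq(X, T, W, C, alpha=0.1, epochs=50):
    """X: training vectors; T: their classes; W: initial reference
    vectors (from step 1); C: class of each reference vector."""
    for _ in range(epochs):                        # step 3
        for x, t in zip(X, T):                     # step 4
            D = np.sum((x - W) ** 2, axis=1)       # step 5: squared distances
            J = np.argmin(D)                       # step 6: winning unit
            if t == C[J]:                          # step 7: move toward x...
                W[J] += alpha * (x - W[J])
            else:                                  # ...or away from x
                W[J] -= alpha * (x - W[J])
        alpha *= 0.95                              # step 8: reduce learning rate
    return W
```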
LVQ2.1

In LVQ2.1, we take the two closest vectors, namely yc1 and yc2, and the condition for the window is as follows −

$$Min\begin{bmatrix}\frac{d_{c1}}{d_{c2}},\frac{d_{c2}}{d_{c1}}\end{bmatrix}\:>\:(1\:-\:\theta)$$

$$Max\begin{bmatrix}\frac{d_{c1}}{d_{c2}},\frac{d_{c2}}{d_{c1}}\end{bmatrix}\:<\:(1\:+\:\theta)$$

Updating is done with the following formulas −

$y_{c1}(t\:+\:1)\:=\:y_{c1}(t)\:+\:\alpha(t)[x(t)\:-\:y_{c1}(t)]$ (yc1 belongs to the same class as x, so it is moved closer)

$y_{c2}(t\:+\:1)\:=\:y_{c2}(t)\:-\:\alpha(t)[x(t)\:-\:y_{c2}(t)]$ (yc2 belongs to a different class, so it is moved away)

Here, $\alpha$ is the learning rate.

LVQ3

In LVQ3, we take the two closest vectors, namely yc1 and yc2, and the condition for the window is as follows −

$$Min\begin{bmatrix}\frac{d_{c1}}{d_{c2}},\frac{d_{c2}}{d_{c1}}\end{bmatrix}\:>\:(1\:-\:\theta)(1\:+\:\theta)$$

Here $\theta\approx 0.2$

When both yc1 and yc2 belong to the same class as x, updating is done with the following formulas −

$y_{c1}(t\:+\:1)\:=\:y_{c1}(t)\:+\:\beta(t)[x(t)\:-\:y_{c1}(t)]$

$y_{c2}(t\:+\:1)\:=\:y_{c2}(t)\:+\:\beta(t)[x(t)\:-\:y_{c2}(t)]$

Here $\beta$ is a multiple of the learning rate $\alpha$, with $\beta\:=\:m\:\alpha(t)$ for 0.1 < m < 0.5.
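A small helper implementing the LVQ2-style window test above; the default θ and the NumPy representation are illustrative assumptions.

```python
# A minimal check of the LVQ2 window condition.
import numpy as np

def in_window(x, y_c, y_r, theta=0.2):
    d_c = np.linalg.norm(x - y_c)   # distance to the closest reference vector
    d_r = np.linalg.norm(x - y_r)   # distance to the next closest one
    # x falls in the window when it lies near the midplane between the two
    return (d_c / d_r > 1 - theta) and (d_r / d_c < 1 + theta)
```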
Unsupervised Learning

As the name suggests, this type of learning is done without the supervision of a teacher. This learning process is independent. During the training of an ANN under unsupervised learning, input vectors of similar type are combined to form clusters. When a new input pattern is applied, the neural network gives an output response indicating the class to which the input pattern belongs. There is no feedback from the environment as to what the desired output should be or whether it is correct or incorrect. Hence, in this type of learning, the network itself must discover the patterns and features from the input data, and the relation of the input data to the output.

Winner-Takes-All Networks

These kinds of networks are based on the competitive learning rule and use the strategy of choosing the neuron with the greatest total input as the winner. The connections between the output neurons express the competition between them: one of them would be ‘ON’, meaning it is the winner, and the others would be ‘OFF’. Following are some of the networks based on this simple concept, using unsupervised learning.

Hamming Network

In most of the neural networks using unsupervised learning, it is essential to compute distances and perform comparisons. One such network is the Hamming network, which clusters the given input vectors into different groups. Following are some important features of Hamming networks −

Lippmann started working on Hamming networks in 1987.
It is a single-layer network.
The inputs can be either binary {0, 1} or bipolar {-1, 1}.
The weights of the net are calculated from the exemplar vectors.
It is a fixed-weight network, which means the weights remain the same even during training.

Max Net

This is also a fixed-weight network, which serves as a subnet for selecting the node having the highest input. All the nodes are fully interconnected and there exist symmetrical weights on all these weighted interconnections.

Architecture

It uses an iterative mechanism in which each node receives inhibitory inputs from all other nodes through connections. The single node whose value is maximum remains active, the winner, while the activations of all other nodes become inactive. Max Net uses the following activation function, which passes positive values unchanged and clips negative ones to zero −

$$f(x)\:=\:\begin{cases}x & if\:x > 0\\0 & if\:x \leq 0\end{cases}$$

The task of this net is accomplished by a self-excitation weight of +1 and a mutual inhibition magnitude ε, set such that 0 < ε < $\frac{1}{m}$, where “m” is the total number of nodes.

Competitive Learning in ANN

It is concerned with unsupervised training in which the output nodes compete with each other to represent the input pattern. To understand this learning rule, we must first understand the competitive net, which is explained as follows −

Basic Concept of Competitive Network

This network is just like a single-layer feed-forward network with feedback connections between the outputs. The connections between the outputs are of the inhibitory type, shown by dotted lines, which means the competitors never support themselves.

Basic Concept of Competitive Learning Rule

As said earlier, there is competition among the output nodes, so the main concept is this: during training, the output unit that has the highest activation for a given input pattern is declared the winner. This rule is also called Winner-takes-all because only the winning neuron is updated and the rest of the neurons are left unchanged.
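As a concrete illustration of winner selection before the mathematical formulation that follows, here is a minimal NumPy sketch of the Max Net iteration described above; the default inhibition ε = 1/(2m) and the iteration cap are illustrative assumptions within the stated bound 0 < ε < 1/m.

```python
# A minimal sketch of one Max Net competition.
import numpy as np

def maxnet(a, eps=None, max_iters=100):
    a = np.asarray(a, dtype=float)
    m = len(a)
    eps = eps if eps is not None else 1.0 / (2 * m)   # mutual inhibition
    for _ in range(max_iters):
        # each node keeps its own value (self-excitation weight +1) and is
        # inhibited by eps times the sum of all the other activations
        new_a = np.maximum(0.0, a - eps * (a.sum() - a))
        if np.array_equal(new_a, a) or np.count_nonzero(new_a) <= 1:
            return new_a
        a = new_a
    return a

print(maxnet([0.2, 0.4, 0.9, 0.5]))   # only the node that started at 0.9 survives
```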
Mathematical Formulation

Following are the three important factors for the mathematical formulation of this learning rule −

Condition to be a winner

Suppose a neuron yk wants to be the winner; then there would be the following condition −

$$y_{k}\:=\:\begin{cases}1 & if\:v_{k}\:>\:v_{j}\:for\:all\:\:j,\:j\:\neq\:k\\0 & otherwise\end{cases}$$

It means that if any neuron, say yk, wants to win, then its induced local field (the output of the summation unit), say vk, must be the largest among all the other neurons in the network.

Condition on the sum total of weights

Another constraint of the competitive learning rule is that the sum total of the weights into a particular output neuron is 1. For example, if we consider neuron k, then

$$\sum_{j} w_{kj}\:=\:1\:\:\:\:for\:all\:\:k$$

Change of weight for the winner

If a neuron does not respond to the input pattern, then no learning takes place in that neuron. However, if a particular neuron wins, then the corresponding weights are adjusted as follows −

$$\Delta w_{kj}\:=\:\begin{cases}\alpha(x_{j}\:-\:w_{kj}), & if\:neuron\:k\:wins\\0 & if\:neuron\:k\:loses\end{cases}$$

Here $\alpha$ is the learning rate. This clearly shows that we are favoring the winning neuron by adjusting its weights toward the input, and if a neuron loses, we need not bother to re-adjust its weights.

K-means Clustering Algorithm

K-means is one of the most popular clustering algorithms, in which we use the concept of a partition procedure. We start with an initial partition and repeatedly move patterns from one cluster to another until we get a satisfactory result.

Algorithm

Step 1 − Select k points as the initial centroids. Initialize k prototypes (w1,…,wk), for example by identifying them with randomly chosen input vectors −

$$W_{j}\:=\:i_{p},\:\:\:where\:j\:\in\:\lbrace1,....,k\rbrace\:and\:p\:\in\:\lbrace1,....,n\rbrace$$

Each cluster Cj is associated with prototype wj.

Step 2 − Repeat steps 3-5 until E no longer decreases, or the cluster membership no longer changes.

Step 3 − For each input vector ip where p ∈ {1,…,n}, put ip in the cluster Cj* with the nearest prototype wj*, i.e. satisfying the following relation −

$$|i_{p}\:-\:w_{j*}|\:\leq\:|i_{p}\:-\:w_{j}|,\:j\:\in\:\lbrace1,....,k\rbrace$$

Step 4 − For each cluster Cj, where j ∈ {1,…,k}, update the prototype wj to be the centroid of all samples currently in Cj, so that

$$w_{j}\:=\:\sum_{i_{p}\in C_{j}}\frac{i_{p}}{|C_{j}|}$$

Step 5 − Compute the total quantization error as follows −

$$E\:=\:\sum_{j=1}^k\sum_{i_{p}\in C_{j}}|i_{p}\:-\:w_{j}|^2$$

Neocognitron

It is a multilayer feedforward network, developed by Fukushima in the 1980s. This model is based on supervised learning and is used for visual pattern recognition, mainly hand-written characters. It is basically an extension of the Cognitron network, which was also developed by Fukushima.
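Returning to the k-means steps above, here is a minimal NumPy sketch of the loop; the fixed random seed, the iteration cap and the handling of empty clusters are illustrative assumptions.

```python
# A minimal sketch of the k-means algorithm above (steps 1-5).
import numpy as np

def kmeans(I, k, max_iters=100):
    # Step 1: pick k input vectors as the initial prototypes w_j
    rng = np.random.default_rng(0)
    W = I[rng.choice(len(I), size=k, replace=False)].copy()
    for _ in range(max_iters):                              # Step 2
        # Step 3: assign each i_p to the cluster with the nearest prototype
        labels = np.argmin(((I[:, None] - W) ** 2).sum(axis=2), axis=1)
        # Step 4: move each prototype to the centroid of its cluster
        new_W = np.array([I[labels == j].mean(axis=0) if np.any(labels == j)
                          else W[j] for j in range(k)])
        if np.allclose(new_W, W):       # membership has stopped changing
            break
        W = new_W
    # Step 5: total quantization error E
    E = ((I - W[labels]) ** 2).sum()
    return W, labels, E

W, labels, E = kmeans(np.random.rand(100, 2), k=3)
print(E)
```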
Adaptive Resonance Theory

This network was developed by Stephen Grossberg and Gail Carpenter in 1987. It is based on competition and uses an unsupervised learning model. Adaptive Resonance Theory (ART) networks, as the name suggests, are always open to new learning (adaptive) without losing old patterns (resonance). Basically, an ART network is a vector classifier which accepts an input vector and classifies it into one of the categories depending upon which of the stored patterns it resembles the most.

Operating Principle

The main operation of ART classification can be divided into the following phases −

Recognition phase − The input vector is compared with the classification presented at every node in the output layer. The output of a neuron becomes “1” if it best matches the classification applied, otherwise it becomes “0”.

Comparison phase − In this phase, the input vector is compared with the comparison-layer vector. The condition for reset is that the degree of similarity is less than the vigilance parameter.

Search phase − In this phase, the network searches for a reset as well as a match in the above phases. Hence, if there is no reset and the match is quite good, the classification is over. Otherwise, the process is repeated and other stored patterns must be tried to find the correct match.

ART1

It is a type of ART which is designed to cluster binary vectors. We can understand it through its architecture.

Architecture of ART1

It consists of the following two units −

Computational Unit − It is made up of the following −

Input unit (F1 layer) − It further has the following two portions −

F1(a) layer (Input portion) − In ART1, there is no processing in this portion; it just holds the input vectors. It is connected to the F1(b) layer (interface portion).

F1(b) layer (Interface portion) − This portion combines the signal from the input portion with that of the F2 layer. The F1(b) layer is connected to the F2 layer through bottom-up weights bij, and the F2 layer is connected to the F1(b) layer through top-down weights tji.

Cluster Unit (F2 layer) − This is a competitive layer. The unit having the largest net input is selected to learn the input pattern. The activations of all other cluster units are set to 0.

Reset Mechanism − The work of this mechanism is based upon the similarity between the top-down weights and the input vector. If the degree of this similarity is less than the vigilance parameter, the cluster is not allowed to learn the pattern and a reset happens.

Supplement Unit − The issue with the reset mechanism is that the F2 layer must be inhibited under certain conditions and must also be available when some learning happens. That is why two supplemental units, namely G1 and G2, are added along with the reset unit R. They are called gain control units. These units receive and send signals to the other units present in the network. ‘+’ indicates an excitatory signal, while ‘−’ indicates an inhibitory signal.

Parameters Used

The following parameters are used −

n − Number of components in the input vector
m − Maximum number of clusters that can be formed
bij − Weight from the F1(b) to the F2 layer, i.e. bottom-up weights
tji − Weight from the F2 to the F1(b) layer, i.e. top-down weights
ρ − Vigilance parameter
||x|| − Norm of vector x

Algorithm

Step 1 − Initialize the learning rate, the vigilance parameter, and the weights as follows −

$$\alpha\:>\:1\:\:and\:\:0\:<\:\rho\:\leq\:1$$

$$0\:<\:b_{ij}(0)\:<\:\frac{\alpha}{\alpha\:-\:1\:+\:n}\:\:and\:\:t_{ji}(0)\:=\:1$$

Step 2 − Continue steps 3-12, while the stopping condition is not true.
Step 3 − Continue steps 4-6 for every training input.

Step 4 − Set the activations of all F2 units to 0 and the activations of the F1(a) units to the input vector s.

Step 5 − The input signal from the F1(a) to the F1(b) layer is sent as

$$x_{i}\:=\:s_{i}$$

Step 6 − For every F2 node that is not inhibited (i.e., yj ≠ −1), calculate the net input

$$y_{j}\:=\:\sum_i b_{ij}x_{i}$$

Step 7 − Perform steps 8-10 while reset is true.

Step 8 − Find J such that yJ ≥ yj for all nodes j.

Step 9 − Again calculate the activation on F1(b) as follows −

$$x_{i}\:=\:s_{i}t_{Ji}$$

Step 10 − Now, after calculating the norms of vector x and vector s, check the reset condition as follows −

If ||x||/||s|| < the vigilance parameter ρ, then inhibit node J (set yJ = −1) and go to step 7. Else, if ||x||/||s|| ≥ ρ, then proceed further.

Step 11 − Weight updating for node J can be done as follows −

$$b_{iJ}(new)\:=\:\frac{\alpha x_{i}}{\alpha\:-\:1\:+\:||x||}$$

$$t_{Ji}(new)\:=\:x_{i}$$

Step 12 − Check the stopping condition for the algorithm; it may be as follows −

No change in weights.
Reset is not performed for any unit.
Maximum number of epochs reached.
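A minimal NumPy sketch of the ART1 loop above with fast learning. The values α = 2 and ρ = 0.7, and the handling of the case where every F2 node gets inhibited, are illustrative assumptions.

```python
# A minimal sketch of the ART1 algorithm above, for binary input vectors.
import numpy as np

def art1(X, m, rho=0.7, alpha=2.0, epochs=10):
    n = X.shape[1]
    b = np.full((n, m), 1.0 / (1.0 + n))   # step 1: 0 < b_ij(0) < alpha/(alpha-1+n)
    t = np.ones((m, n))                    # step 1: t_ji(0) = 1
    for _ in range(epochs):                                  # steps 2-3
        for s in X:                                          # steps 4-5: x = s
            y = s @ b                                        # step 6: net inputs
            inhibited = np.zeros(m, dtype=bool)
            while True:                                      # steps 7-10
                J = int(np.argmax(np.where(inhibited, -1.0, y)))  # step 8: winner
                x = s * t[J]                                 # step 9: x_i = s_i t_Ji
                if x.sum() / max(s.sum(), 1) >= rho:         # step 10: vigilance
                    break                                    # resonance: accept J
                inhibited[J] = True                          # reset: inhibit J
                if inhibited.all():                          # nothing matched:
                    break                                    # fall back to last J
            b[:, J] = alpha * x / (alpha - 1.0 + x.sum())    # step 11: update b_iJ
            t[J] = x                                         # step 11: update t_Ji
    return b, t

b, t = art1(np.array([[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0]]), m=3)
```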
Artificial Neural Network – Hopfield Networks

The Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It consists of a single layer containing one or more fully connected recurrent neurons. The Hopfield network is commonly used for auto-association and optimization tasks.

Discrete Hopfield Network

A discrete Hopfield network operates in a discrete fashion; in other words, its input and output patterns are discrete vectors, which can be either binary (0, 1) or bipolar (+1, −1) in nature. The network has symmetrical weights with no self-connections, i.e., wij = wji and wii = 0.

Architecture

Following are some important points to keep in mind about the discrete Hopfield network −

This model consists of neurons with one inverting and one non-inverting output.
The output of each neuron should be an input of the other neurons but not an input of itself.
Weight/connection strength is represented by wij.
Connections can be excitatory as well as inhibitory. A connection is excitatory if the output of the neuron is the same as its input, otherwise inhibitory.
Weights should be symmetrical, i.e. wij = wji.

The output from Y1 going to Y2, Yi and Yn has the weights w12, w1i and w1n respectively. Similarly, the other arcs have their weights.

Training Algorithm

During the training of a discrete Hopfield network, the weights are updated. As we know, we can have binary input vectors as well as bipolar input vectors. In both cases, the weight updates are done with the following relations −

Case 1 − Binary input patterns

For a set of binary patterns s(p), p = 1 to P

Here, s(p) = s1(p), s2(p),…, si(p),…, sn(p)

The weight matrix is given by

$$w_{ij}\:=\:\sum_{p=1}^P[2s_{i}(p)\:-\:1][2s_{j}(p)\:-\:1]\:\:\:\:\:for\:i\:\neq\:j$$

Case 2 − Bipolar input patterns

For a set of bipolar patterns s(p), p = 1 to P

Here, s(p) = s1(p), s2(p),…, si(p),…, sn(p)

The weight matrix is given by

$$w_{ij}\:=\:\sum_{p=1}^P[s_{i}(p)][s_{j}(p)]\:\:\:\:\:for\:i\:\neq\:j$$

Testing Algorithm

Step 1 − Initialize the weights, which are obtained from the training algorithm using the Hebbian principle.

Step 2 − Perform steps 3-9, while the activations of the network have not converged.

Step 3 − For each input vector X, perform steps 4-8.

Step 4 − Make the initial activation of the network equal to the external input vector X as follows −

$$y_{i}\:=\:x_{i}\:\:\:for\:i\:=\:1\:to\:n$$

Step 5 − For each unit Yi, perform steps 6-9.

Step 6 − Calculate the net input of the network as follows −

$$y_{ini}\:=\:x_{i}\:+\:\sum_{j}y_{j}w_{ji}$$

Step 7 − Apply the activation as follows over the net input to calculate the output −

$$y_{i}\:=\:\begin{cases}1 & if\:y_{ini}\:>\:\theta_{i}\\y_{i} & if\:y_{ini}\:=\:\theta_{i}\\0 & if\:y_{ini}\:<\:\theta_{i}\end{cases}$$

Here $\theta_{i}$ is the threshold.

Step 8 − Broadcast this output yi to all other units.

Step 9 − Test the network for convergence.

Energy Function Evaluation

An energy function is defined as a function that is a bounded and non-increasing function of the state of the system. The energy function Ef, also called the Lyapunov function, determines the stability of the discrete Hopfield network, and is characterized as follows −

$$E_{f}\:=\:-\frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n y_{i}y_{j}w_{ij}\:-\:\sum_{i=1}^n x_{i}y_{i}\:+\:\sum_{i=1}^n \theta_{i}y_{i}$$

Condition − In a stable network, whenever the state of a node changes, the above energy function decreases.
Suppose node i has changed state from $y_i^{(k)}$ to $y_i^{(k\:+\:1)}$; then the energy change $\Delta E_{f}$ is given by the following relation −

$$\Delta E_{f}\:=\:E_{f}(y_i^{(k+1)})\:-\:E_{f}(y_i^{(k)})$$

$$=\:-\left(\sum_{j=1}^n w_{ij}y_j^{(k)}\:+\:x_{i}\:-\:\theta_{i}\right)(y_i^{(k+1)}\:-\:y_i^{(k)})$$

$$=\:-\:(net_{i})\Delta y_{i}$$

Here $\Delta y_{i}\:=\:y_i^{(k\:+\:1)}\:-\:y_i^{(k)}$.

The change in energy relies on the fact that only one unit can update its activation at a time.

Continuous Hopfield Network

In comparison with the discrete Hopfield network, a continuous network has time as a continuous variable. It is also used in auto-association and in optimization problems such as the travelling salesman problem.

Model − The model or architecture can be built up by adding electrical components such as amplifiers, which map the input voltage to the output voltage through a sigmoid activation function.

Energy Function Evaluation

$$E_f\:=\:-\frac{1}{2}\sum_{i=1}^n\sum_{\substack{j = 1\\ j \ne i}}^n y_i y_j w_{ij}\:-\:\sum_{i=1}^n x_i y_i\:+\:\frac{1}{\lambda}\sum_{i=1}^n g_{ri}\int_{0}^{y_i} a^{-1}(y)\:dy$$

Here λ is the gain parameter and gri the input conductance.
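The discrete Hopfield procedures above fit in a few lines of NumPy. The sketch below trains Hebbian weights for bipolar patterns, runs asynchronous recall, and asserts the energy property just derived (with all thresholds taken as 0); the stored pattern and sweep count are illustrative assumptions.

```python
# A minimal sketch of discrete Hopfield training, recall, and the
# non-increasing-energy check, for bipolar (+1/-1) patterns.
import numpy as np

def hopfield_train(patterns):
    P = np.asarray(patterns, dtype=float)
    W = P.T @ P                      # w_ij = sum_p s_i(p) s_j(p)
    np.fill_diagonal(W, 0.0)         # no self-connections: w_ii = 0
    return W

def energy(W, y, x):
    return -0.5 * y @ W @ y - x @ y  # E_f with all theta_i = 0

def hopfield_recall(W, x, sweeps=10):
    y = x.copy()
    for _ in range(sweeps):
        for i in np.random.permutation(len(y)):   # asynchronous updates
            e_before = energy(W, y, x)
            y_in = x[i] + W[:, i] @ y             # step 6: net input
            if y_in > 0:                          # step 7: threshold at 0;
                y[i] = 1.0                        # y_i is kept when y_in == 0
            elif y_in < 0:
                y[i] = -1.0
            assert energy(W, y, x) <= e_before + 1e-12  # energy never increases
    return y

# Usage: store a pattern, then recall it from a corrupted copy
W = hopfield_train([[1, -1, 1, -1]])
print(hopfield_recall(W, np.array([1.0, 1.0, 1.0, -1.0])))
```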
Artificial Neural Network – Building Blocks

Processing of an ANN depends upon the following three building blocks −

Network Topology
Adjustments of Weights or Learning
Activation Functions

In this chapter, we will discuss these three building blocks of ANN in detail.

Network Topology

A network topology is the arrangement of a network along with its nodes and connecting lines. According to the topology, ANN can be classified into the following kinds −

Feedforward Network

It is a non-recurrent network having processing units/nodes in layers, where all the nodes in a layer are connected with the nodes of the previous layer. The connections carry different weights. There is no feedback loop, which means the signal can flow in only one direction, from input to output. It may be divided into the following two types −

Single layer feedforward network − A feedforward ANN having only one weighted layer. In other words, the input layer is fully connected to the output layer.

Multilayer feedforward network − A feedforward ANN having more than one weighted layer. As this network has one or more layers between the input and the output layer, those layers are called hidden layers.

Feedback Network

As the name suggests, a feedback network has feedback paths, which means the signal can flow in both directions using loops. This makes it a non-linear dynamic system, which changes continuously until it reaches a state of equilibrium. It may be divided into the following types −

Recurrent networks − They are feedback networks with closed loops. Following are the two types of recurrent networks.

Fully recurrent network − It is the simplest neural network architecture because all nodes are connected to all other nodes and each node works as both input and output.

Jordan network − It is a closed-loop network in which the output goes to the input again as feedback, as shown in the following diagram.

Adjustments of Weights or Learning

Learning, in an artificial neural network, is the method of modifying the weights of the connections between the neurons of a specified network. Learning in ANN can be classified into three categories, namely supervised learning, unsupervised learning, and reinforcement learning.

Supervised Learning

As the name suggests, this type of learning is done under the supervision of a teacher. This learning process is dependent. During the training of an ANN under supervised learning, the input vector is presented to the network, which produces an output vector. This output vector is compared with the desired output vector. An error signal is generated if there is a difference between the actual output and the desired output vector. On the basis of this error signal, the weights are adjusted until the actual output matches the desired output.

Unsupervised Learning

As the name suggests, this type of learning is done without the supervision of a teacher. This learning process is independent. During the training of an ANN under unsupervised learning, input vectors of similar type are combined to form clusters. When a new input pattern is applied, the neural network gives an output response indicating the class to which the input pattern belongs. There is no feedback from the environment as to what the desired output should be and whether it is correct or incorrect. Hence, in this type of learning, the network itself must discover the patterns and features from the input data, and the relation of the input data to the output.
Reinforcement Learning

As the name suggests, this type of learning is used to reinforce or strengthen the network based on some critic information. This learning process is similar to supervised learning; however, we may have very little information. During the training of a network under reinforcement learning, the network receives some feedback from the environment. This makes it somewhat similar to supervised learning. However, the feedback obtained here is evaluative, not instructive, which means there is no teacher as in supervised learning. After receiving the feedback, the network adjusts its weights to get better critic information in the future.

Activation Functions

An activation function may be defined as the extra force or effort applied over the input to obtain an exact output. In ANN, we apply activation functions over the net input to obtain the output. Following are some activation functions of interest −

Linear Activation Function

It is also called the identity function, as it performs no input editing. It can be defined as −

$$F(x)\:=\:x$$

Sigmoid Activation Function

It is of two types, as follows −

Binary sigmoidal function − This activation function performs input editing between 0 and 1. It is positive in nature. It is always bounded, which means its output cannot be less than 0 or more than 1. It is also strictly increasing in nature, which means the greater the input, the higher the output. It can be defined as

$$F(x)\:=\:sigm(x)\:=\:\frac{1}{1\:+\:exp(-x)}$$

Bipolar sigmoidal function − This activation function performs input editing between -1 and 1. It can be positive or negative in nature. It is always bounded, which means its output cannot be less than -1 or more than 1. It is also strictly increasing, like the sigmoid function. It can be defined as

$$F(x)\:=\:sigm(x)\:=\:\frac{2}{1\:+\:exp(-x)}\:-\:1\:=\:\frac{1\:-\:exp(-x)}{1\:+\:exp(-x)}$$
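For reference, the three activation functions above are one-liners in NumPy −

```python
# Minimal definitions of the activation functions above.
import numpy as np

def linear(x):
    return x                                  # identity: F(x) = x

def binary_sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))           # output bounded in (0, 1)

def bipolar_sigmoid(x):
    return 2.0 / (1.0 + np.exp(-x)) - 1.0     # output bounded in (-1, 1)

print(binary_sigmoid(0.0), bipolar_sigmoid(0.0))   # 0.5 0.0
```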
Learning and Adaptation

As stated earlier, ANN is completely inspired by the way the biological nervous system, i.e. the human brain, works. The most impressive characteristic of the human brain is its ability to learn; hence the same feature is acquired by ANN.

What Is Learning in ANN?

Basically, learning means to adapt and change in response to a change in the environment. ANN is a complex system, or more precisely a complex adaptive system, which can change its internal structure based on the information passing through it.

Why Is It Important?

Being a complex adaptive system, learning in ANN implies that a processing unit is capable of changing its input/output behavior due to a change in the environment. The importance of learning in ANN arises because the activation function, as well as the input/output vectors, are fixed once a particular network is constructed. To change the input/output behavior, we need to adjust the weights.

Classification

It may be defined as the process of learning to distinguish the data of samples into different classes by finding common features between the samples of the same class. For example, to perform training of an ANN, we have some training samples with unique features, and to perform its testing we have some testing samples with other unique features. Classification is an example of supervised learning.

Neural Network Learning Rules

We know that, during ANN learning, to change the input/output behavior, we need to adjust the weights. Hence, a method is required with the help of which the weights can be modified. These methods are called learning rules, which are simply algorithms or equations. Following are some learning rules for neural networks −

Hebbian Learning Rule

This rule, one of the oldest and simplest, was introduced by Donald Hebb in his book The Organization of Behavior in 1949. It is a kind of feed-forward, unsupervised learning.

Basic Concept − This rule is based on a proposal given by Hebb, who wrote −

“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”

From the above postulate, we can conclude that the connection between two neurons might be strengthened if the neurons fire at the same time and might weaken if they fire at different times.

Mathematical Formulation − According to the Hebbian learning rule, the following is the formula to increase the weight of a connection at every time step −

$$\Delta w_{ji}(t)\:=\:\alpha\:x_{i}(t)\:y_{j}(t)$$

Here, $\Delta w_{ji}(t)$ = the increment by which the weight of the connection increases at time step t

$\alpha$ = the positive, constant learning rate

$x_{i}(t)$ = the input value from the pre-synaptic neuron at time step t

$y_{j}(t)$ = the output of the post-synaptic neuron at the same time step t

Perceptron Learning Rule

Introduced by Rosenblatt, this rule is an error-correcting supervised learning algorithm for single-layer feedforward networks with a threshold activation function.

Basic Concept − Being supervised in nature, to calculate the error, there is a comparison between the desired/target output and the actual output. If any difference is found, a change must be made to the weights of the connections.
Mathematical Formulation − To explain its mathematical formulation, suppose we have a finite number of training pairs: input vectors x(n) along with their desired/target output vectors t(n), where n = 1 to N. Now the output ‘y’ can be calculated, as explained earlier, on the basis of the net input, with the activation function applied over that net input as follows −

$$y\:=\:f(y_{in})\:=\:\begin{cases}1, & y_{in}\:>\:\theta \\0, & y_{in}\:\leqslant\:\theta\end{cases}$$

Where θ is the threshold.

The updating of weights can be done in the following two cases −

Case I − when t ≠ y, then

$$w(new)\:=\:w(old)\:+\:\alpha\:tx$$

Case II − when t = y, then no change in weight.

Delta Learning Rule (Widrow-Hoff Rule)

Introduced by Bernard Widrow and Marcian Hoff, and also called the Least Mean Square (LMS) method, this rule minimizes the error over all training patterns. It is a kind of supervised learning algorithm with a continuous activation function.

Basic Concept − The basis of this rule is the gradient-descent approach. The delta rule updates the synaptic weights so as to minimize the difference between the net input to the output unit and the target value.

Mathematical Formulation − To update the synaptic weights, the delta rule is given by

$$\Delta w_{i}\:=\:\alpha\:x_{i}\:e_{j}$$

Here $\Delta w_{i}$ = the weight change for the ith pattern;

$\alpha$ = the positive, constant learning rate;

$x_{i}$ = the input value from the pre-synaptic neuron;

$e_{j}$ = $(t\:-\:y_{in})$, the difference between the desired/target output and the actual net input $y_{in}$

The above delta rule is for a single output unit only.

The updating of weights can be done in the following two cases −

Case-I − when t ≠ y, then

$$w(new)\:=\:w(old)\:+\:\Delta w$$

Case-II − when t = y, then no change in weight.

Competitive Learning Rule (Winner-takes-all)

It is concerned with unsupervised training in which the output nodes compete with each other to represent the input pattern. To understand this learning rule, we must understand the competitive network, which is given as follows −

Basic Concept of Competitive Network − This network is just like a single-layer feedforward network with feedback connections between the outputs. The connections between the outputs are of the inhibitory type, shown by dotted lines, which means the competitors never support themselves.

Basic Concept of Competitive Learning Rule − As said earlier, there will be a competition among the output nodes. Hence, the main concept is that during training, the output unit with the highest activation for a given input pattern is declared the winner. This rule is also called Winner-takes-all because only the winning neuron is updated and the rest of the neurons are left unchanged.

Mathematical formulation − The three important factors for the mathematical formulation of this rule (the condition to be a winner, the condition on the sum total of weights, and the change of weight for the winner) are detailed under Competitive Learning in ANN.
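The single-step forms of the Hebbian, perceptron and delta rules above can be written side by side as follows; the learning rate value is an illustrative assumption.

```python
# Minimal one-step forms of the three learning rules above.
import numpy as np

alpha = 0.1   # positive, constant learning rate

def hebbian_update(w, x, y):
    return w + alpha * x * y                    # dw_ji = alpha * x_i * y_j

def perceptron_update(w, x, t, y):
    return w + alpha * t * x if t != y else w   # update only on misclassification

def delta_update(w, x, t, y_in):
    return w + alpha * (t - y_in) * x           # dw_i = alpha * x_i * (t - y_in)

w = delta_update(np.zeros(3), np.array([1.0, -1.0, 1.0]), t=1.0, y_in=0.0)
```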
Supervised Learning

As the name suggests, supervised learning takes place under the supervision of a teacher. This learning process is dependent. During the training of an ANN under supervised learning, the input vector is presented to the network, which produces an output vector. This output vector is compared with the desired/target output vector. An error signal is generated if there is a difference between the actual output and the desired/target output vector. On the basis of this error signal, the weights are adjusted until the actual output matches the desired output.

Perceptron

Developed by Frank Rosenblatt using the McCulloch and Pitts model, the perceptron is the basic operational unit of artificial neural networks. It employs a supervised learning rule and is able to classify data into two classes.

Operational characteristics of the perceptron: It consists of a single neuron with an arbitrary number of inputs along with adjustable weights, but the output of the neuron is 1 or 0 depending upon the threshold. It also has a bias, whose weight is always 1. The following figure gives a schematic representation of the perceptron.

The perceptron thus has the following three basic elements −

Links − It has a set of connection links which carry weights, including a bias that always has weight 1.

Adder − It adds the inputs after they are multiplied by their respective weights.

Activation function − It limits the output of the neuron. The most basic activation function is a Heaviside step function, which has two possible outputs. This function returns 1 if the input is positive, and 0 for any negative input.

Training Algorithm

The perceptron network can be trained for a single output unit as well as for multiple output units.

Training Algorithm for Single Output Unit

Step 1 − Initialize the following to start the training −

Weights
Bias
Learning rate $\alpha$

For easy calculation and simplicity, the weights and bias may be set equal to 0 and the learning rate set equal to 1.

Step 2 − Continue steps 3-8 while the stopping condition is not true.

Step 3 − Continue steps 4-6 for every training vector x.

Step 4 − Activate each input unit as follows −

$$x_{i}\:=\:s_{i}\:(i\:=\:1\:to\:n)$$

Step 5 − Now obtain the net input with the following relation −

$$y_{in}\:=\:b\:+\:\sum_{i=1}^n x_{i}\:w_{i}$$

Here ‘b’ is the bias and ‘n’ is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output −

$$f(y_{in})\:=\:\begin{cases}1 & if\:y_{in}\:>\:\theta\\0 & if\:-\theta\:\leqslant\:y_{in}\:\leqslant\:\theta\\-1 & if\:y_{in}\:<\:-\theta\end{cases}$$

Step 7 − Adjust the weights and bias as follows −

Case 1 − if y ≠ t then

$$w_{i}(new)\:=\:w_{i}(old)\:+\:\alpha\:tx_{i}$$

$$b(new)\:=\:b(old)\:+\:\alpha t$$

Case 2 − if y = t then

$$w_{i}(new)\:=\:w_{i}(old)$$

$$b(new)\:=\:b(old)$$

Here ‘y’ is the actual output and ‘t’ is the desired/target output.

Step 8 − Test for the stopping condition, which happens when there is no change in weights.

Training Algorithm for Multiple Output Units

The following diagram shows the architecture of the perceptron for multiple output classes.

Step 1 − Initialize the following to start the training −

Weights
Bias
Learning rate $\alpha$

For easy calculation and simplicity, the weights and bias may be set equal to 0 and the learning rate set equal to 1.

Step 2 − Continue steps 3-8 while the stopping condition is not true.

Step 3 − Continue steps 4-6 for every training vector x.
Step 4 − Activate each input unit as follows −

$$x_{i}\:=\:s_{i}\:(i\:=\:1\:to\:n)$$

Step 5 − Obtain the net input for each output unit with the following relation −

$$y_{inj}\:=\:b_{j}\:+\:\sum_{i=1}^n x_{i}\:w_{ij}$$

Here ‘b’ is the bias and ‘n’ is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output for each output unit j = 1 to m −

$$f(y_{inj})\:=\:\begin{cases}1 & if\:y_{inj}\:>\:\theta\\0 & if\:-\theta\:\leqslant\:y_{inj}\:\leqslant\:\theta\\-1 & if\:y_{inj}\:<\:-\theta\end{cases}$$

Step 7 − Adjust the weights and bias for i = 1 to n and j = 1 to m as follows −

Case 1 − if yj ≠ tj then

$$w_{ij}(new)\:=\:w_{ij}(old)\:+\:\alpha\:t_{j}x_{i}$$

$$b_{j}(new)\:=\:b_{j}(old)\:+\:\alpha t_{j}$$

Case 2 − if yj = tj then

$$w_{ij}(new)\:=\:w_{ij}(old)$$

$$b_{j}(new)\:=\:b_{j}(old)$$

Here ‘y’ is the actual output and ‘t’ is the desired/target output.

Step 8 − Test for the stopping condition, which happens when there is no change in weights.

Adaptive Linear Neuron (Adaline)

Adaline, which stands for Adaptive Linear Neuron, is a network having a single linear unit. It was developed by Widrow and Hoff in 1960. Some important points about Adaline are as follows −

It uses a bipolar activation function.
It uses the delta rule for training to minimize the Mean Squared Error (MSE) between the actual output and the desired/target output.
The weights and the bias are adjustable.

Architecture

The basic structure of Adaline is similar to the perceptron, with an extra feedback loop with the help of which the actual output is compared with the desired/target output. After comparison, on the basis of the training algorithm, the weights and bias are updated.

Training Algorithm

Step 1 − Initialize the following to start the training −

Weights
Bias
Learning rate $\alpha$

For easy calculation and simplicity, the weights and bias may be set equal to 0 and the learning rate set equal to 1.

Step 2 − Continue steps 3-8 while the stopping condition is not true.

Step 3 − Continue steps 4-6 for every bipolar training pair s:t.

Step 4 − Activate each input unit as follows −

$$x_{i}\:=\:s_{i}\:(i\:=\:1\:to\:n)$$

Step 5 − Obtain the net input with the following relation −

$$y_{in}\:=\:b\:+\:\sum_{i=1}^n x_{i}\:w_{i}$$

Here ‘b’ is the bias and ‘n’ is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output −

$$f(y_{in})\:=\:\begin{cases}1 & if\:y_{in}\:\geqslant\:0 \\-1 & if\:y_{in}\:<\:0\end{cases}$$

Step 7 − Adjust the weights and bias as follows −

Case 1 − if y ≠ t then

$$w_{i}(new)\:=\:w_{i}(old)\:+\:\alpha(t\:-\:y_{in})x_{i}$$

$$b(new)\:=\:b(old)\:+\:\alpha(t\:-\:y_{in})$$

Case 2 − if y = t then

$$w_{i}(new)\:=\:w_{i}(old)$$

$$b(new)\:=\:b(old)$$

Here ‘y’ is the actual output, ‘t’ is the desired/target output, and $(t\:-\:y_{in})$ is the computed error.

Step 8 − Test for the stopping condition, which happens when there is no change in weight or when the largest weight change during an epoch is smaller than a specified tolerance.
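The two training loops above fit naturally into a short NumPy sketch. The AND data, learning rates and tolerance below are illustrative assumptions; the perceptron version is for a single output unit with threshold θ.

```python
# Minimal sketches of perceptron and Adaline training.
import numpy as np

def train_perceptron(X, T, alpha=1.0, theta=0.0, epochs=100):
    w, b = np.zeros(X.shape[1]), 0.0     # step 1: zero weights and bias
    for _ in range(epochs):              # steps 2-3
        changed = False
        for x, t in zip(X, T):
            y_in = b + x @ w                                          # step 5
            y = 1 if y_in > theta else (0 if y_in >= -theta else -1)  # step 6
            if y != t:                                                # step 7
                w, b = w + alpha * t * x, b + alpha * t
                changed = True
        if not changed:                  # step 8: stop when nothing changed
            break
    return w, b

def train_adaline(X, T, alpha=0.1, tol=1e-4, epochs=100):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        biggest = 0.0
        for x, t in zip(X, T):
            err = t - (b + x @ w)        # computed error (t - y_in)
            w, b = w + alpha * err * x, b + alpha * err   # delta-rule updates
            biggest = max(biggest, abs(alpha * err))
        if biggest < tol:                # step 8: stop on negligible change
            break
    return w, b

# Usage: learn AND on bipolar inputs and targets
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
T = np.array([1, -1, -1, -1], dtype=float)
print(train_perceptron(X, T))
print(train_adaline(X, T))
```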
Artificial Neural Networks Tutorial

Artificial Neural Networks are parallel computing devices which are basically an attempt to make a computer model of the brain. The main objective is to develop a system that performs various computational tasks faster than traditional systems. This tutorial covers the basic concepts and terminology involved in Artificial Neural Networks. Sections of this tutorial also explain the architecture as well as the training algorithms of the various networks used in ANN.

Audience

This tutorial will be useful for graduates, postgraduates, and research students who either have an interest in this subject or have this subject as a part of their curriculum. The reader can be a beginner or an advanced learner.

Prerequisites

Artificial Neural Networks (ANN) is an advanced topic; hence the reader must have basic knowledge of algorithms, programming, and mathematics.