Artificial Neural Network – Quick Guide

Artificial Neural Network – Basic Concepts

Neural networks are parallel computing devices, which are basically an attempt to make a computer model of the brain. The main objective is to develop a system that performs various computational tasks faster than traditional systems. These tasks include pattern recognition and classification, approximation, optimization, and data clustering.

What is Artificial Neural Network?

An Artificial Neural Network (ANN) is an efficient computing system whose central theme is borrowed from the analogy of biological neural networks. ANNs are also known as "artificial neural systems," "parallel distributed processing systems," or "connectionist systems." An ANN consists of a large collection of units that are interconnected in some pattern to allow communication between the units. These units, also referred to as nodes or neurons, are simple processors which operate in parallel.

Every neuron is connected to other neurons through connection links. Each connection link is associated with a weight that carries information about the input signal. This is the most useful information for neurons to solve a particular problem, because the weight usually excites or inhibits the signal being communicated. Each neuron has an internal state, which is called an activation signal. Output signals, which are produced after combining the input signals with the activation rule, may be sent to other units.

A Brief History of ANN

The history of ANN can be divided into the following three eras −

ANN during 1940s to 1960s

Some key developments of this era are as follows −

1943 − The concept of neural networks is generally traced to the work of physiologist Warren McCulloch and mathematician Walter Pitts, who in 1943 modeled a simple neural network using electrical circuits in order to describe how neurons in the brain might work.

1949 − Donald Hebb's book, The Organization of Behavior, put forth the idea that repeated activation of one neuron by another increases the strength of the connection between them each time they are used.

1956 − An associative memory network was introduced by Taylor.

1958 − A learning method for the McCulloch-Pitts neuron model, named the Perceptron, was invented by Rosenblatt.

1960 − Bernard Widrow and Marcian Hoff developed models called "ADALINE" and "MADALINE."

ANN during 1960s to 1980s

Some key developments of this era are as follows −

1961 − Rosenblatt proposed a "backpropagation" scheme for multilayer networks, although the attempt was unsuccessful.

1964 − Taylor constructed a winner-take-all circuit with inhibitions among output units.

1969 − Minsky and Papert published their analysis of the perceptron, demonstrating the limitations of single-layer networks.

1971 − Kohonen developed associative memories.

1976 − Stephen Grossberg and Gail Carpenter developed Adaptive Resonance Theory.

ANN from 1980s till Present

Some key developments of this era are as follows −

1982 − The major development was Hopfield's energy approach.

1985 − The Boltzmann machine was developed by Ackley, Hinton, and Sejnowski.

1986 − Rumelhart, Hinton, and Williams introduced the Generalised Delta Rule.

1988 − Kosko developed the Bidirectional Associative Memory (BAM) and also introduced the concept of fuzzy logic in ANN.

The historical review shows that significant progress has been made in this field. Neural network based chips are emerging and applications to complex problems are being developed. Today is clearly a period of transition for neural network technology.
Biological Neuron

A nerve cell (neuron) is a special biological cell that processes information. According to an estimate, there is a huge number of neurons, approximately $10^{11}$, with numerous interconnections, approximately $10^{15}$.

Schematic Diagram

Working of a Biological Neuron

As shown in the above diagram, a typical neuron consists of the following four parts, with the help of which we can explain its working −

Dendrites − They are tree-like branches, responsible for receiving information from the other neurons the neuron is connected to. In another sense, we can say that they are like the ears of the neuron.

Soma − It is the cell body of the neuron and is responsible for processing the information received from the dendrites.

Axon − It is just like a cable through which the neuron sends the information.

Synapses − They are the connections between the axon and the dendrites of other neurons.

ANN versus BNN

Before taking a look at the differences between Artificial Neural Network (ANN) and Biological Neural Network (BNN), let us take a look at the similarities based on the terminology between these two.

Biological Neural Network (BNN)    Artificial Neural Network (ANN)
Soma                               Node
Dendrites                          Input
Synapse                            Weights or Interconnections
Axon                               Output

The following table shows a comparison between ANN and BNN based on some criteria.

Criteria           BNN                                                            ANN
Processing         Massively parallel; slow but superior to ANN                   Massively parallel; fast but inferior to BNN
Size               $10^{11}$ neurons and $10^{15}$ interconnections               $10^{2}$ to $10^{4}$ nodes (mainly depends on the type of application and the network designer)
Learning           Can tolerate ambiguity                                         Very precise, structured and formatted data is required
Fault tolerance    Performance degrades gracefully with even partial damage       Performance degrades with even partial damage, although its distributed representation has the potential to be fault tolerant
Storage capacity   Stores the information in the synapses                         Stores the information in contiguous memory locations

Model of Artificial Neural Network

The following diagram represents the general model of ANN followed by its processing. For the above general model of artificial neural network, the net input can be calculated as follows −

$$y_{in} = x_{1}w_{1} + x_{2}w_{2} + x_{3}w_{3} + \dots + x_{m}w_{m}$$

i.e., net input $y_{in} = \sum_{i=1}^{m} x_{i}w_{i}$

The output can be calculated by applying the activation function over the net input.

$$Y = F(y_{in})$$

Output = function (net input calculated)

Artificial Neural Network – Building Blocks

Processing of an ANN depends upon the following three building blocks −

Network Topology
Adjustments of Weights or Learning
Activation Functions

In this chapter, we will discuss these three building blocks of ANN in detail.

Network Topology

A network topology is the arrangement of a network along with its nodes and connecting lines. According to the topology, ANN can be classified as one of the following kinds −

Feedforward Network

It is a non-recurrent network having processing units/nodes in layers, and all the nodes in a layer are connected with the nodes of the previous layers.
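To make the general model above concrete, here is a minimal Python sketch of a single neuron: it computes the net input $y_{in} = \sum_{i} x_{i}w_{i}$ and applies an activation function F over it. The function name, the threshold activation, and the sample values are illustrative assumptions, not part of the original text −

import numpy as np

def neuron_output(x, w, activation=lambda s: 1.0 if s >= 0 else 0.0):
    """Compute a single neuron's output: Y = F(y_in), with y_in = sum_i x_i * w_i."""
    y_in = np.dot(x, w)          # net input y_in = x1*w1 + x2*w2 + ... + xm*wm
    return activation(y_in)      # apply the activation function F

# Example: 3 inputs with a simple threshold (step) activation
x = np.array([1.0, 0.5, -1.0])   # input signals
w = np.array([0.4, 0.6, 0.2])    # connection weights
print(neuron_output(x, w))       # y_in = 0.4 + 0.3 - 0.2 = 0.5, so the output is 1.0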
Applications of Neural Networks

Before studying the fields where ANN has been used extensively, we need to understand why ANN would be the preferred choice of application.

Why Artificial Neural Networks?

We can understand the answer to the above question through the example of a human being. As children, we learned things with the help of our elders, including our parents or teachers. Later, by self-learning or practice, we keep learning throughout our lives. Scientists and researchers are likewise making machines intelligent, just like a human being, and ANN plays a very important role in this for the following reasons −

With the help of neural networks, we can find solutions to problems for which an algorithmic method is expensive or does not exist.

Neural networks can learn by example, hence we do not need to program them to a great extent.

Neural networks offer accuracy and significantly higher speed than conventional systems.

Areas of Application

Following are some of the areas where ANN is being used. They suggest that ANN has an interdisciplinary approach in its development and applications.

Speech Recognition

Speech occupies a prominent role in human-human interaction. Therefore, it is natural for people to expect speech interfaces with computers. In the present era, for communication with machines, humans still need sophisticated languages which are difficult to learn and use. To ease this communication barrier, a simple solution could be communication in a spoken language that is possible for the machine to understand. Great progress has been made in this field; however, such systems still face the problems of limited vocabulary or grammar, along with the issue of retraining the system for different speakers in different conditions. ANN is playing a major role in this area. The following ANNs have been used for speech recognition −

Multilayer networks

Multilayer networks with recurrent connections

Kohonen self-organizing feature map

The most useful network for this is the Kohonen self-organizing feature map, which takes short segments of the speech waveform as input. It maps phonemes of the same kind onto the output array, a technique called feature extraction. After extracting the features, with the help of some acoustic models as back-end processing, it recognizes the utterance.

Character Recognition

It is an interesting problem which falls under the general area of pattern recognition. Many neural networks have been developed for the automatic recognition of handwritten characters, either letters or digits. Following are some ANNs which have been used for character recognition −

Multilayer neural networks such as backpropagation neural networks.

Neocognitron

Though backpropagation neural networks have several hidden layers, the pattern of connection from one layer to the next is localized. Similarly, the neocognitron also has several hidden layers and its training is done layer by layer for such kinds of applications.

Signature Verification Application

Signatures are one of the most useful ways to authorize and authenticate a person in legal transactions. Signature verification is a non-vision based technique. For this application, the first approach is to extract the feature set, or rather the geometrical feature set, representing the signature. With these feature sets, we have to train the neural networks using an efficient neural network algorithm.
This trained neural network will classify the signature as genuine or forged during the verification stage.

Human Face Recognition

It is one of the biometric methods used to identify a given face. It is a challenging task because of the difficulty of characterizing "non-face" images. However, if a neural network is well trained, the input can be divided into two classes, namely images having faces and images that do not have faces.

First, all the input images must be preprocessed. Then, the dimensionality of each image must be reduced. Finally, it must be classified using a neural network training algorithm. The following neural networks are used for training purposes with preprocessed images −

Fully-connected multilayer feed-forward neural network trained with the help of the back-propagation algorithm.

For dimensionality reduction, Principal Component Analysis (PCA) is used.
Discuss Artificial Neural Network

This tutorial covers the basic concepts and terminology involved in Artificial Neural Networks. Sections of this tutorial also explain the architecture as well as the training algorithms of the various networks used in ANN.
Optimization Using Hopfield Network

Optimization is an action of making something such as a design, situation, resource, or system as effective as possible. Using a resemblance between the cost function and the energy function, we can use highly interconnected neurons to solve optimization problems. One such neural network is the Hopfield network, which consists of a single layer containing one or more fully connected recurrent neurons. This can be used for optimization.

Points to remember while using the Hopfield network for optimization −

The cost function to be minimized must correspond to the energy function of the network.

It will find a satisfactory solution rather than select one out of the stored patterns.

The quality of the solution found by the Hopfield network depends significantly on the initial state of the network.

Travelling Salesman Problem

Finding the shortest route travelled by a salesman is one of the classic computational problems, which can be optimized by using a Hopfield neural network.

Basic Concept of TSP

The Travelling Salesman Problem (TSP) is a classical optimization problem in which a salesman has to travel n cities, which are connected with each other, keeping the cost as well as the distance travelled to a minimum. For example, if the salesman has to travel a set of 4 cities A, B, C, D, the goal is to find the shortest circular tour, A-B-C-D, that minimizes the cost, which also includes the cost of travelling from the last city D back to the first city A.

Matrix Representation

Actually, each tour of an n-city TSP can be expressed as an n × n matrix M whose xth row describes the position of city x in the tour. This matrix M for 4 cities A, B, C, D can be expressed as follows −

$$M = \begin{bmatrix}A: & 1 & 0 & 0 & 0 \\B: & 0 & 1 & 0 & 0 \\C: & 0 & 0 & 1 & 0 \\D: & 0 & 0 & 0 & 1 \end{bmatrix}$$

Solution by Hopfield Network

While considering the solution of this TSP by a Hopfield network, every node in the network corresponds to one element of the matrix.

Energy Function Calculation

To be the optimized solution, the energy function must be at a minimum. On the basis of the following constraints, we can calculate the energy function as follows −

Constraint-I

The first constraint, on the basis of which we will calculate the energy function, is that one element must be equal to 1 in each row of matrix M and the other elements in each row must be equal to 0, because each city can occur in only one position in the TSP tour. This constraint can mathematically be written as follows −

$$\displaystyle\sum\limits_{j=1}^n M_{x,j} = 1 \:\:\text{for}\: x \in \lbrace1,...,n\rbrace$$

Now the energy function to be minimized, based on the above constraint, will contain a term proportional to −

$$\displaystyle\sum\limits_{x=1}^n \left(1 - \displaystyle\sum\limits_{j=1}^n M_{x,j}\right)^2$$

Constraint-II

Similarly, each position in the tour can be occupied by only one city; hence in each column of matrix M, one element must be equal to 1 and the other elements must be equal to 0. This constraint can mathematically be written as follows −

$$\displaystyle\sum\limits_{x=1}^n M_{x,j} = 1 \:\:\text{for}\: j \in \lbrace1,...,n\rbrace$$

Now the energy function to be minimized, based on the above constraint, will contain a term proportional to −

$$\displaystyle\sum\limits_{j=1}^n \left(1 - \displaystyle\sum\limits_{x=1}^n M_{x,j}\right)^2$$

Cost Function Calculation

Suppose an n × n square matrix denoted by C is the cost matrix of the TSP for n cities, where n > 0.
Following are some parameters used while calculating the cost function −

$C_{x,y}$ − The element of the cost matrix denoting the cost of travelling from city x to city y.

Adjacency of cities x and y in the tour can be expressed by the following relation −

$$M_{x,i} = 1 \:\:\text{and}\:\: M_{y,i\pm 1} = 1$$

As we know, in the matrix the output value of each node can be either 0 or 1; hence for every pair of cities x, y we can add the following terms to the energy function −

$$\displaystyle\sum\limits_{i=1}^n C_{x,y}M_{x,i}\left(M_{y,i+1} + M_{y,i-1}\right)$$

On the basis of the above cost function and the constraint terms, the final energy function E can be given as follows −

$$E = \frac{1}{2}\displaystyle\sum\limits_{i=1}^n\displaystyle\sum\limits_{x}\displaystyle\sum\limits_{y\neq x}C_{x,y}M_{x,i}\left(M_{y,i+1} + M_{y,i-1}\right) + \left[\gamma_{1} \displaystyle\sum\limits_{x} \left(1 - \displaystyle\sum\limits_{i} M_{x,i}\right)^2 + \gamma_{2} \displaystyle\sum\limits_{i} \left(1 - \displaystyle\sum\limits_{x} M_{x,i}\right)^2 \right]$$

Here, γ1 and γ2 are two weighting constants.
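As a concrete illustration, the following Python sketch evaluates this energy function for a candidate tour matrix M. The function name, the wrap-around indexing (i ± 1 taken modulo n to close the circular tour), and the sample 4-city cost matrix are assumptions for the example, not values from the original text −

import numpy as np

def tsp_energy(M, C, gamma1=10.0, gamma2=10.0):
    """Hopfield-style TSP energy: tour cost plus penalty terms for the
    row/column constraints (each city and each position used exactly once)."""
    n = M.shape[0]
    cost = 0.0
    for i in range(n):
        for x in range(n):
            for y in range(n):
                if y != x:
                    # M[y, (i+1) % n] and M[y, (i-1) % n] are the neighbours of position i
                    cost += C[x, y] * M[x, i] * (M[y, (i + 1) % n] + M[y, (i - 1) % n])
    row_penalty = np.sum((1 - M.sum(axis=1)) ** 2)   # one position per city
    col_penalty = np.sum((1 - M.sum(axis=0)) ** 2)   # one city per position
    return 0.5 * cost + gamma1 * row_penalty + gamma2 * col_penalty

# Example: 4 cities; the identity matrix M encodes the tour A-B-C-D
C = np.array([[0, 1, 2, 1],
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [1, 2, 1, 0]], dtype=float)
M = np.eye(4)
print(tsp_energy(M, C))  # valid tour: penalties are zero, only the tour cost remains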
Artificial Neural Network – Genetic Algorithm

Nature has always been a great source of inspiration for all mankind. Genetic Algorithms (GAs) are search-based algorithms based on the concepts of natural selection and genetics. GAs are a subset of a much larger branch of computation known as Evolutionary Computation. GAs were developed by John Holland and his students and colleagues at the University of Michigan, most notably David E. Goldberg, and have since been tried on various optimization problems with a high degree of success.

In GAs, we have a pool or population of possible solutions to the given problem. These solutions then undergo recombination and mutation (as in natural genetics), producing new children, and the process is repeated over various generations. Each individual (or candidate solution) is assigned a fitness value (based on its objective function value), and the fitter individuals are given a higher chance to mate and yield "fitter" individuals. This is in line with the Darwinian theory of "survival of the fittest". In this way, we keep "evolving" better individuals or solutions over generations until we reach a stopping criterion.

Genetic Algorithms are sufficiently randomized in nature; however, they perform much better than random local search (in which we just try various random solutions, keeping track of the best so far), as they exploit historical information as well.

Advantages of GAs

GAs have various advantages which have made them immensely popular. These include −

They do not require any derivative information (which may not be available for many real-world problems).

They are faster and more efficient compared to traditional methods.

They have very good parallel capabilities.

They optimize both continuous and discrete functions, as well as multi-objective problems.

They provide a list of "good" solutions and not just a single solution.

They always get an answer to the problem, which gets better over time.

They are useful when the search space is very large and a large number of parameters are involved.

Limitations of GAs

Like any technique, GAs also suffer from a few limitations. These include −

GAs are not suited for all problems, especially problems which are simple and for which derivative information is available.

The fitness value is calculated repeatedly, which might be computationally expensive for some problems.

Being stochastic, there are no guarantees on the optimality or quality of the solution.

If not implemented properly, a GA may not converge to the optimal solution.

GA – Motivation

Genetic Algorithms have the ability to deliver a "good-enough" solution "fast enough". This makes GAs attractive for use in solving optimization problems. The reasons why GAs are needed are as follows −

Solving Difficult Problems

In computer science, there is a large set of problems which are NP-Hard. What this essentially means is that even the most powerful computing systems take a very long time (even years!) to solve such a problem. In such a scenario, GAs prove to be an efficient tool to provide usable near-optimal solutions in a short amount of time.

Failure of Gradient Based Methods

Traditional calculus-based methods work by starting at a random point and moving in the direction of the gradient till we reach the top of the hill. This technique is efficient and works very well for single-peaked objective functions like the cost function in linear regression.
However, in most real-world situations we have a very complex objective landscape, made of many peaks and many valleys, which causes such methods to fail, as they suffer from an inherent tendency to get stuck at local optima, as shown in the following figure.

Getting a Good Solution Fast

Some difficult problems, like the Travelling Salesman Problem (TSP), have real-world applications such as path finding and VLSI design. Now imagine that you are using your GPS navigation system and it takes a few minutes (or even a few hours) to compute the "optimal" path from source to destination. Delay in such real-world applications is not acceptable, and therefore a "good-enough" solution, which is delivered "fast", is what is required.

How to Use GA for Optimization Problems?

We already know that optimization is an action of making something such as a design, situation, resource, or system as effective as possible. The optimization process is shown in the following diagram.

Stages of GA Mechanism for Optimization Process

Following are the stages of the GA mechanism when used for the optimization of problems; a minimal sketch of these stages follows this list.

Generate the initial population randomly.

Select solutions with the best fitness values.

Recombine the selected solutions using mutation and crossover operators.

Insert the offspring into the population.

Now, if the stopping condition is met, return the solution with the best fitness value. Else, go to step 2.
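Here is a minimal Python sketch of these stages applied to a toy problem (maximizing f(x) = −(x − 3)²); the problem, the operators, and all parameter values are illustrative assumptions, not from the original text −

import random

def fitness(x):
    return -(x - 3.0) ** 2   # toy objective: maximum at x = 3

def genetic_algorithm(pop_size=20, generations=100,
                      mutation_rate=0.1, crossover_rate=0.9):
    population = [random.uniform(-10, 10) for _ in range(pop_size)]  # stage 1
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]          # stage 2: select the fittest
        offspring = []
        while len(offspring) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = (a + b) / 2 if random.random() < crossover_rate else a  # crossover
            if random.random() < mutation_rate:
                child += random.gauss(0, 1.0)          # mutation
            offspring.append(child)
        population = parents + offspring               # stage 4: insert offspring
    return max(population, key=fitness)                # best solution found

print(genetic_algorithm())  # converges near x = 3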
Boltzmann Machine

These are stochastic learning processes having a recurrent structure, and they are the basis of the early optimization techniques used in ANN. The Boltzmann Machine was invented by Geoffrey Hinton and Terry Sejnowski in 1985. More clarity can be observed in the words of Hinton on the Boltzmann Machine −

"A surprising feature of this network is that it uses only locally available information. The change of weight depends only on the behavior of the two units it connects, even though the change optimizes a global measure" – Ackley, Hinton 1985.

Some important points about the Boltzmann Machine −

They use a recurrent structure.

They consist of stochastic neurons, which have one of two possible states, either 1 or 0.

Some of the neurons in this are adaptive (free state) and some are clamped (frozen state).

If we apply simulated annealing on a discrete Hopfield network, it becomes a Boltzmann Machine.

Objective of Boltzmann Machine

The main purpose of the Boltzmann Machine is to optimize the solution of a problem. It is the work of the Boltzmann Machine to optimize the weights and the quantity related to that particular problem.

Architecture

The following diagram shows the architecture of the Boltzmann machine. It is clear from the diagram that it is a two-dimensional array of units. Here, the weights on the interconnections between units are −p, where p > 0. The weights of the self-connections are given by b, where b > 0.

Training Algorithm

As Boltzmann machines have fixed weights, there is no training algorithm as such, because we do not need to update the weights in the network. However, to test the network, we have to set the weights as well as find the consensus function (CF).

The Boltzmann machine has a set of units Ui and Uj with bidirectional connections between them.

We consider the fixed weights, say wij.

wij ≠ 0 if Ui and Uj are connected.

There also exists symmetry in the weighted interconnections, i.e. wij = wji.

wii also exists, i.e. there can be self-connections between units.

For any unit Ui, its state ui can be either 1 or 0.

The main objective of the Boltzmann Machine is to maximize the Consensus Function (CF), which can be given by the following relation −

$$CF = \displaystyle\sum\limits_{i} \displaystyle\sum\limits_{j\leqslant i} w_{ij}u_{i}u_{j}$$

Now, when the state changes from either 1 to 0 or from 0 to 1, the change in consensus can be given by the following relation −

$$\Delta CF = (1 - 2u_{i})\left(w_{ii} + \displaystyle\sum\limits_{j\neq i} u_{j}w_{ij}\right)$$

Here ui is the current state of Ui.

The coefficient (1 − 2ui) is given by the following relation −

$$(1 - 2u_{i}) = \begin{cases}+1, & \text{if } U_{i} \text{ is currently off}\\-1, & \text{if } U_{i} \text{ is currently on}\end{cases}$$

Generally, unit Ui does not change its state, but if it does, then the information needed for the change is local to the unit. With that change, there would also be an increase in the consensus of the network.

The probability of the network accepting the change in the state of the unit is given by the following relation −

$$AF(i,T) = \frac{1}{1 + \exp\left[-\frac{\Delta CF(i)}{T}\right]}$$

Here, T is the controlling parameter (temperature). It is decreased as CF approaches its maximum value.

Testing Algorithm

Step 1 − Initialize the following to start the algorithm −

Weights representing the constraints of the problem

Control parameter T

Step 2 − Continue steps 3-8 while the stopping condition is not true.

Step 3 − Perform steps 4-7.

Step 4 − Assume that one of the states has changed, and choose the integers I, J as random values between 1 and n.
Step 5 − Calculate the change in consensus as follows −

$$\Delta CF = (1 - 2u_{i})\left(w_{ii} + \displaystyle\sum\limits_{j\neq i} u_{j}w_{ij}\right)$$

Step 6 − Calculate the probability that the network will accept the change in state −

$$AF(i,T) = \frac{1}{1 + \exp\left[-\frac{\Delta CF(i)}{T}\right]}$$

Step 7 − Accept or reject this change as follows −

Case I − if R < AF, accept the change.

Case II − if R ≥ AF, reject the change.

Here, R is a random number between 0 and 1.

Step 8 − Reduce the control parameter (temperature) as follows −

T(new) = 0.95 T(old)

Step 9 − Test for the stopping conditions, which may be as follows −

Temperature reaches a specified value

There is no change in state for a specified number of iterations
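The testing algorithm above can be summarized in a short Python sketch. The 3-unit weight matrix and the cooling schedule applied at every iteration are illustrative assumptions, not values from the original text −

import numpy as np

def boltzmann_run(W, n_iters=1000, T=10.0, cooling=0.95):
    """Run the Boltzmann machine testing loop: flip random units with
    probability AF(i, T) while slowly lowering the temperature T."""
    n = W.shape[0]
    u = np.random.randint(0, 2, size=n)              # random initial states
    for _ in range(n_iters):
        i = np.random.randint(n)                     # step 4: pick a random unit
        # step 5: change in consensus if unit i flips
        delta_cf = (1 - 2 * u[i]) * (W[i, i] + sum(u[j] * W[i, j]
                                                   for j in range(n) if j != i))
        # step 6: acceptance probability (argument clipped to avoid overflow)
        af = 1.0 / (1.0 + np.exp(-np.clip(delta_cf / T, -50, 50)))
        if np.random.rand() < af:                    # step 7: accept or reject
            u[i] = 1 - u[i]
        T *= cooling                                 # step 8: reduce temperature
    return u

# Illustrative 3-unit symmetric weight matrix (not from the original text)
W = np.array([[0.0, 2.0, -1.0],
              [2.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])
print(boltzmann_run(W))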
Kohonen Self-Organizing Feature Maps

Suppose we have patterns of arbitrary dimensions; however, we need them in one dimension or two dimensions. The process of feature mapping is then very useful for converting the wide pattern space into a typical feature space. Now, the question arises: why do we require a self-organizing feature map? The reason is that, along with the capability to convert arbitrary dimensions into 1-D or 2-D, it must also have the ability to preserve neighbor topology.

Neighbor Topologies in Kohonen SOM

There can be various topologies; however, the following two topologies are used the most −

Rectangular Grid Topology

In this topology, the ring of nodes at distance 1 from the winning unit has 8 nodes, the ring at distance 2 has 16 nodes, and the ring at distance 3 has 24 nodes; each successive rectangular ring grows by 8 nodes. The winning unit is indicated by #.

Hexagonal Grid Topology

In this topology, the ring of nodes at distance 1 from the winning unit has 6 nodes, the ring at distance 2 has 12 nodes, and the ring at distance 3 has 18 nodes; each successive hexagonal ring grows by 6 nodes. The winning unit is indicated by #.

Architecture

The architecture of the KSOM is similar to that of the competitive network. With the help of the neighborhood schemes discussed earlier, training can take place over an extended region of the network.

Algorithm for training

Step 1 − Initialize the weights, the learning rate α, and the neighborhood topological scheme.

Step 2 − Continue steps 3-9 while the stopping condition is not true.

Step 3 − Continue steps 4-6 for every input vector x.

Step 4 − Calculate the square of the Euclidean distance for j = 1 to m −

$$D(j) = \displaystyle\sum\limits_{i=1}^n (x_{i} - w_{ij})^2$$

Step 5 − Obtain the winning unit J where D(j) is minimum.

Step 6 − Calculate the new weights of the winning unit and of the units within its topological neighborhood by the following relation −

$$w_{ij}(new) = w_{ij}(old) + \alpha[x_{i} - w_{ij}(old)]$$

Step 7 − Update the learning rate α by the following relation −

$$\alpha(t + 1) = 0.5\alpha(t)$$

Step 8 − Reduce the radius of the topological scheme.

Step 9 − Check the stopping condition for the network.
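Here is a minimal Python sketch of the training steps above for a 1-D Kohonen map; the array sizes, the random data, and the neighborhood-radius schedule are illustrative assumptions −

import numpy as np

def train_ksom(X, m=10, epochs=50, alpha=0.5, radius=2):
    """Train a 1-D Kohonen SOM with m output units on input vectors X."""
    n = X.shape[1]
    W = np.random.rand(m, n)                       # step 1: initialize weights
    for epoch in range(epochs):                    # step 2: stopping condition
        for x in X:                                # step 3: each input vector
            D = np.sum((x - W) ** 2, axis=1)       # step 4: squared distances
            J = np.argmin(D)                       # step 5: winning unit
            lo, hi = max(0, J - radius), min(m, J + radius + 1)
            W[lo:hi] += alpha * (x - W[lo:hi])     # step 6: update the neighborhood
        alpha *= 0.5                               # step 7: decay the learning rate
        if epoch % 10 == 9 and radius > 0:
            radius -= 1                            # step 8: shrink the radius
    return W

X = np.random.rand(100, 3)   # 100 random 3-D input vectors
W = train_ksom(X)
print(W.round(2))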
Other Optimization Techniques

Iterated Gradient Descent Technique

Gradient descent, also known as steepest descent, is an iterative optimization algorithm for finding a local minimum of a function. While minimizing the function, we are concerned with the cost or error to be minimized (remember the Travelling Salesman Problem). It is extensively used in deep learning, which is useful in a wide variety of situations. The point to be remembered here is that we are concerned with local optimization and not global optimization.

Main Working Idea

We can understand the main working idea of gradient descent with the help of the following steps −

First, start with an initial guess of the solution.

Then, take the gradient of the function at that point.

Later, repeat the process by stepping the solution in the negative direction of the gradient.

By following the above steps, the algorithm will eventually converge where the gradient is zero.

Mathematical Concept

Suppose we have a function f(x) and we are trying to find the minimum of this function. Following are the steps to find the minimum of f(x).

First, give some initial value $x_{0}$ for $x$.

Now take the gradient $\nabla f$ of the function, with the intuition that the gradient gives the slope of the curve at that x and its direction points toward the increase in the function, to find out the best direction to minimize it.

Now change x as follows −

$$x_{n+1} = x_{n} - \theta\nabla f(x_{n})$$

Here, θ > 0 is the training rate (step size) that forces the algorithm to take small jumps.

Estimating Step Size

Actually, a wrong step size θ may not reach convergence, hence a careful selection of the step size is very important. The following points must be remembered while choosing the step size −

Do not choose too large a step size, otherwise it will have a negative impact, i.e. it will diverge rather than converge.

Do not choose too small a step size, otherwise it will take a lot of time to converge.

Some options with regard to choosing the step size −

One option is to choose a fixed step size.

Another option is to choose a different step size for every iteration.

Simulated Annealing

The basic concept of Simulated Annealing (SA) is motivated by annealing in solids. In the process of annealing, if we heat a metal above its melting point and cool it down, the structural properties will depend upon the rate of cooling. We can also say that SA simulates the metallurgical process of annealing.

Use in ANN

SA is a stochastic computational method, inspired by the annealing analogy, for approximating the global optimum of a given function. We can use SA to train feed-forward neural networks.

Algorithm

Step 1 − Generate a random solution.

Step 2 − Calculate its cost using some cost function.

Step 3 − Generate a random neighboring solution.

Step 4 − Calculate the cost of the new solution with the same cost function.

Step 5 − Compare the cost of the new solution with that of the old solution as follows −

If CostNew Solution < CostOld Solution, then move to the new solution; otherwise, accept the new solution with a probability that decreases with the cost difference and with the temperature. This probabilistic acceptance of worse solutions is what allows SA to escape local minima.

Step 6 − Test for the stopping condition, which may be a maximum number of iterations reached or an acceptable solution obtained.

A minimal sketch of both procedures follows.
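Here is a small Python sketch of both techniques, applied to a toy objective f(x) = (x − 2)². The objective function, the neighbor-generation rule, and all parameter values are illustrative assumptions, not from the original text −

import math
import random

def f(x):
    return (x - 2.0) ** 2    # toy objective: minimum at x = 2

def grad_f(x):
    return 2.0 * (x - 2.0)   # gradient of f

def gradient_descent(x0=0.0, step=0.1, n_iters=100):
    x = x0
    for _ in range(n_iters):
        x = x - step * grad_f(x)       # x_{n+1} = x_n - theta * grad f(x_n)
    return x

def simulated_annealing(x0=0.0, T=1.0, cooling=0.95, n_iters=1000):
    x, cost = x0, f(x0)
    for _ in range(n_iters):
        x_new = x + random.gauss(0, 0.5)          # random neighboring solution
        cost_new = f(x_new)
        # accept better solutions always; worse ones with probability e^(-dC/T)
        if cost_new < cost or random.random() < math.exp(-(cost_new - cost) / T):
            x, cost = x_new, cost_new
        T *= cooling                              # reduce the temperature
    return x

print(gradient_descent())      # approaches 2.0
print(simulated_annealing())   # also approaches 2.0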
Learning Vector Quantization

Learning Vector Quantization (LVQ), different from vector quantization (VQ) and Kohonen Self-Organizing Maps (KSOM), is basically a competitive network which uses supervised learning. We may define it as a process of classifying patterns where each output unit represents a class. As it uses supervised learning, the network is given a set of training patterns with known classifications, along with an initial distribution of the output classes. After completing the training process, LVQ will classify an input vector by assigning it to the same class as that of its winning output unit.

Architecture

The following figure shows the architecture of LVQ, which is quite similar to the architecture of KSOM. As we can see, there are "n" input units and "m" output units. The layers are fully interconnected, with weights on the connections.

Parameters Used

Following are the parameters used in the LVQ training process as well as in the flowchart −

x = training vector (x1,…,xi,…,xn)

T = class for training vector x

wj = weight vector for the jth output unit

Cj = class associated with the jth output unit

Training Algorithm

Step 1 − Initialize the reference vectors, which can be done as follows −

Step 1(a) − From the given set of training vectors, take the first "m" (number of clusters) training vectors and use them as weight vectors. The remaining vectors can be used for training.

Step 1(b) − Assign the initial weights and classifications randomly.

Step 1(c) − Apply the K-means clustering method.

Step 2 − Initialize the learning rate $\alpha$.

Step 3 − Continue with steps 4-9 if the condition for stopping this algorithm is not met.

Step 4 − Follow steps 5-6 for every training input vector x.

Step 5 − Calculate the square of the Euclidean distance for j = 1 to m −

$$D(j) = \displaystyle\sum\limits_{i=1}^n (x_{i} - w_{ij})^2$$

Step 6 − Obtain the winning unit J where D(j) is minimum.

Step 7 − Calculate the new weight of the winning unit by the following relation −

if T = CJ then $w_{J}(new) = w_{J}(old) + \alpha[x - w_{J}(old)]$

if T ≠ CJ then $w_{J}(new) = w_{J}(old) - \alpha[x - w_{J}(old)]$

Step 8 − Reduce the learning rate $\alpha$.

Step 9 − Test for the stopping condition. It may be as follows −

Maximum number of epochs reached.

Learning rate reduced to a negligible value.

Flowchart

Variants

Three other variants, namely LVQ2, LVQ2.1 and LVQ3, have been developed by Kohonen. The complexity in all three of these variants is greater than in LVQ, because the winner as well as the runner-up unit learn.

LVQ2

As discussed above for the other variants of LVQ, the condition of LVQ2 is formed by a window. This window is based on the following parameters −

x − the current input vector

yc − the reference vector closest to x

yr − the other reference vector, which is the next closest to x

dc − the distance from x to yc

dr − the distance from x to yr

The input vector x falls in the window if

$$\frac{d_{c}}{d_{r}} > 1 - \theta \:\:\text{and}\:\: \frac{d_{r}}{d_{c}} < 1 + \theta$$

Here, $\theta$ is the relative width of the window (a small positive constant). Updating can be done with the following formulas −

$y_{c}(t + 1) = y_{c}(t) - \alpha(t)[x(t) - y_{c}(t)]$ (if yc belongs to a different class than x)

$y_{r}(t + 1) = y_{r}(t) + \alpha(t)[x(t) - y_{r}(t)]$ (if yr belongs to the same class as x)

Here $\alpha$ is the learning rate.
LVQ2.1

In LVQ2.1, we take the two closest reference vectors, namely yc1 and yc2, and the condition for the window is as follows −

$$Min\left[\frac{d_{c1}}{d_{c2}},\frac{d_{c2}}{d_{c1}}\right] > (1 - \theta)$$

$$Max\left[\frac{d_{c1}}{d_{c2}},\frac{d_{c2}}{d_{c1}}\right] < (1 + \theta)$$

Updating can be done with the following formulas −

$y_{c1}(t + 1) = y_{c1}(t) - \alpha(t)[x(t) - y_{c1}(t)]$ (if yc1 belongs to a different class than x)

$y_{c2}(t + 1) = y_{c2}(t) + \alpha(t)[x(t) - y_{c2}(t)]$ (if yc2 belongs to the same class as x)

Here, $\alpha$ is the learning rate.

LVQ3

In LVQ3, we take the two closest reference vectors, namely yc1 and yc2, and the condition for the window is as follows −

$$Min\left[\frac{d_{c1}}{d_{c2}},\frac{d_{c2}}{d_{c1}}\right] > (1 - \theta)(1 + \theta)$$

Here $\theta\approx 0.2$.

Updating can be done with the following formulas, applied when both yc1 and yc2 belong to the same class as x −

$y_{c1}(t + 1) = y_{c1}(t) + \beta(t)[x(t) - y_{c1}(t)]$

$y_{c2}(t + 1) = y_{c2}(t) + \beta(t)[x(t) - y_{c2}(t)]$

Here $\beta$ is a multiple of the learning rate $\alpha$, with $\beta = m\,\alpha(t)$ for 0.1 < m < 0.5.
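To tie the algorithm steps together, here is a minimal Python sketch of the basic LVQ training loop described above (the toy dataset, the class labels, and the parameter values are illustrative assumptions, not from the original text) −

import numpy as np

def train_lvq(X, T, m, alpha=0.3, epochs=20):
    """Basic LVQ: move the winning reference vector toward the input if the
    classes match, away from it otherwise."""
    # Step 1(a): use the first m training vectors as initial reference vectors
    W, C = X[:m].copy(), T[:m].copy()
    X, T = X[m:], T[m:]
    for _ in range(epochs):                       # step 3: stopping condition
        for x, t in zip(X, T):                    # step 4: each training vector
            D = np.sum((x - W) ** 2, axis=1)      # step 5: squared distances
            J = np.argmin(D)                      # step 6: winning unit
            if t == C[J]:                         # step 7: move toward ...
                W[J] += alpha * (x - W[J])
            else:                                 # ... or away from the input
                W[J] -= alpha * (x - W[J])
        alpha *= 0.5                              # step 8: reduce the learning rate
    return W, C

X = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.2], [0.9, 0.8], [0.2, 0.1]])
T = np.array([0, 1, 0, 1, 0])
W, C = train_lvq(X, T, m=2)
print(W, C)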