CNTK – Training the Neural Network

This chapter explains how to train a neural network in CNTK.

Training a model in CNTK

In the previous section, we defined all the components of the deep learning model. Now it is time to train it. As discussed earlier, we train an NN model in CNTK using a combination of a learner and a trainer.

Choosing a learner and setting up training

In this section, we define the learner. CNTK provides several learners to choose from. For the model defined in the previous sections, we will use the Stochastic Gradient Descent (SGD) learner.

To train the neural network, configure the learner and trainer with the following steps −

Step 1 − First, import the sgd function from the cntk.learners package.

from cntk.learners import sgd

Step 2 − Next, import the Trainer class from the cntk.train.trainer package.

from cntk.train.trainer import Trainer

Step 3 − Now, create a learner by invoking the sgd function with the model’s parameters and a value for the learning rate.

learner = sgd(z.parameters, 0.01)

Step 4 − Finally, initialize the trainer. It must be given the network, the combination of the loss and the metric, and the learner.

trainer = Trainer(z, (loss, error_rate), [learner])

The learning rate, which controls the speed of optimisation, should be a small number, typically between 0.001 and 0.1.

Choosing a learner and setting up the training – Complete example

from cntk.learners import sgd
from cntk.train.trainer import Trainer
learner = sgd(z.parameters, 0.01)
trainer = Trainer(z, (loss, error_rate), [learner])

Feeding data into the trainer

Once we have chosen and configured the trainer, it is time to load the dataset. We have saved the iris dataset as a .csv file, and we will use the pandas data-wrangling package to load it.

Steps to load the dataset from .CSV file

Step 1 − First, import the pandas package.
import pandas as pd

Step 2 − Now, invoke the read_csv function to load the .csv file from disk.

df_source = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], index_col=False)

Once we load the dataset, we need to split it into a set of features and a label.

Steps to split the dataset into features and label

Step 1 − First, select all rows and the first four columns from the dataset. This can be done with the iloc indexer.

X = df_source.iloc[:, :4].values

Step 2 − Next, select the species column from the iris dataset. We use the values property to access the underlying numpy array.

y = df_source['species'].values

Steps to encode the species column to a numeric vector representation

As discussed earlier, our model is a classification model and requires numeric values. Hence, we need to encode the species column as a numeric vector representation. Let’s see the steps to do it −

Step 1 − First, define the label_mapping dictionary, which maps each species name to a numeric index. A list expression will later iterate over all elements in the array and look up each value in this dictionary.

label_mapping = {'Iris-Setosa': 0, 'Iris-Versicolor': 1, 'Iris-Virginica': 2}

Step 2 − Next, convert each numeric value to a one-hot encoded vector. We will use the one_hot function as follows −

import numpy as np

def one_hot(index, length):
    result = np.zeros(length)
    result[index] = 1
    return result

Step 3 − Finally, turn the converted list into a numpy array.

y = np.array([one_hot(label_mapping[v], 3) for v in y])

Steps to detect overfitting

Overfitting is the situation in which a model memorises the training samples but cannot deduce general rules from them. With the following steps, we can detect overfitting in our model −

Step 1 − First, import the train_test_split function from the model_selection module of the sklearn package.
from sklearn.model_selection import train_test_split

Step 2 − Next, invoke the train_test_split function with features X and labels y as follows −

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)

We specify a test_size of 0.2 to set aside 20% of the total data. The stratify argument keeps the class proportions the same in both sets. Comparing the model's performance on this held-out set with its performance on the training set is what reveals overfitting.

Steps to feed training set and validation set to our model

Step 1 − To train our model, first invoke the train_minibatch method and give it a dictionary that maps the input data to the input variables we used to define the NN and its associated loss function.

trainer.train_minibatch({features: X_train, label: y_train})

Step 2 − Next, call train_minibatch in the following for loop −

for _epoch in range(10):
    trainer.train_minibatch({features: X_train, label: y_train})
    print('Loss: {}, Error: {}'.format(
        trainer.previous_minibatch_loss_average,
        trainer.previous_minibatch_evaluation_average))

Feeding data into the trainer – Complete example

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

df_source = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], index_col=False)
X = df_source.iloc[:, :4].values
y = df_source['species'].values

label_mapping = {'Iris-Setosa': 0, 'Iris-Versicolor': 1, 'Iris-Virginica': 2}

def one_hot(index, length):
    result = np.zeros(length)
    result[index] = 1
    return result

y = np.array([one_hot(label_mapping[v], 3) for v in y])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)

trainer.train_minibatch({features: X_train, label: y_train})
for _epoch in range(10):
    trainer.train_minibatch({features: X_train, label: y_train})
    print('Loss: {}, Error: {}'.format(
        trainer.previous_minibatch_loss_average,
        trainer.previous_minibatch_evaluation_average))
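The label-encoding step above can be tried without NumPy or the iris file. The following is a minimal sketch, assuming a hypothetical four-item list of species names in place of the species column; the list-based one_hot here is a stand-in for the NumPy version used in the tutorial, not a replacement for it.

```python
# Hypothetical stand-in for the 'species' column of the iris CSV file.
species = ["Iris-Setosa", "Iris-Virginica", "Iris-Versicolor", "Iris-Setosa"]

# Same mapping as in the tutorial: species name -> class index.
label_mapping = {"Iris-Setosa": 0, "Iris-Versicolor": 1, "Iris-Virginica": 2}


def one_hot(index, length):
    """Return a list of `length` zeros with a single 1.0 at position `index`."""
    result = [0.0] * length
    result[index] = 1.0
    return result


# Look up every species name and turn its index into a one-hot vector.
encoded = [one_hot(label_mapping[name], len(label_mapping)) for name in species]

for name, vector in zip(species, encoded):
    print(name, vector)
```

Each row of encoded has exactly one non-zero entry, which is the shape of label a three-output classification network expects.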
Measuring the performance of NN

Whenever we pass data through the trainer to optimise our NN model, the trainer measures the model's performance using the metric we configured for it. This measurement during training is taken on the training data. For a full analysis of the model's performance, however, we need to use test data as well.

To measure the performance of the model on the test data, we can invoke the test_minibatch method on the trainer as follows −

trainer.test_minibatch({ features: X_test,
CNTK – In-Memory and Large Datasets

In this chapter, we will learn how to work with in-memory and large datasets in CNTK.

Training with small in-memory datasets

There are many ways to feed data into a CNTK trainer; which one to use depends on the size of the dataset and the format of the data. A dataset can be small enough to fit in memory, or it can be large. In this section, we work with in-memory datasets. For this, we will use the following two frameworks −

Numpy
Pandas

Using Numpy arrays

Here, we will work with a randomly generated, numpy-based dataset in CNTK. In this example, we simulate data for a binary classification problem. Suppose we have a set of observations with 4 features and want to predict two possible labels with our deep learning model.

Implementation Example

First, we must generate a set of labels containing a one-hot vector representation of the labels we want to predict. It can be done with the following steps −

Step 1 − Import the numpy package and set the number of samples as follows −

import numpy as np
num_samples = 20000

Step 2 − Next, generate a label mapping by using the np.eye function as follows −

label_mapping = np.eye(2)

Step 3 − Now, by using the np.random.choice function, draw 20000 random labels as follows −

y = label_mapping[np.random.choice(2,num_samples)].astype(np.float32)

Step 4 − Finally, by using the np.random.random function, generate an array of random floating-point values as follows −

x = np.random.random(size=(num_samples, 4)).astype(np.float32)

Both arrays are converted to 32-bit floating-point numbers with astype(np.float32) so that they match the format expected by CNTK. Next, let us build the network −

Step 5 − Import the Dense and Sequential layer functions from the cntk.layers module as follows −

from cntk.layers import Dense, Sequential

Step 6 − Now, we need to import the activation function for the layers in the network.
Let us import sigmoid as the activation function, together with input_variable and default_options −

from cntk import input_variable, default_options
from cntk.ops import sigmoid

Step 7 − Now, we need to import the loss function used to train the network. Let us import binary_cross_entropy as the loss function −

from cntk.losses import binary_cross_entropy

Step 8 − Next, we need to define the default options for the network. Here, we provide the sigmoid activation function as the default setting. We also create the model by using the Sequential layer function as follows −

with default_options(activation=sigmoid):
    model = Sequential([Dense(6),Dense(2)])

Step 9 − Next, initialise an input_variable with 4 input features to serve as the input for the network.

features = input_variable(4)

Step 10 − Now, to complete the network, connect the features variable to the NN.

z = model(features)

We now have an NN. With the following steps, let us train it using the in-memory dataset −

Step 11 − To train this NN, first import a learner from the cntk.learners module. We will import the sgd learner as follows −

from cntk.learners import sgd

Step 12 − Along with that, import the ProgressPrinter from the cntk.logging module and create an instance of it.

from cntk.logging import ProgressPrinter
progress_writer = ProgressPrinter(0)

Step 13 − Next, define a new input variable for the labels as follows −

labels = input_variable(2)

Step 14 − To train the NN model, next define a loss using the binary_cross_entropy function, providing it with the model z and the labels variable.

loss = binary_cross_entropy(z, labels)

Step 15 − Next, initialize the sgd learner as follows −

learner = sgd(z.parameters, lr=0.1)

Step 16 − At last, call the train method on the loss function.
Also, provide it with the input data, the sgd learner and the progress_writer −

training_summary=loss.train((x,y),parameter_learners=[learner],callbacks=[progress_writer])

Complete implementation example

import numpy as np
num_samples = 20000
label_mapping = np.eye(2)
y = label_mapping[np.random.choice(2,num_samples)].astype(np.float32)
x = np.random.random(size=(num_samples, 4)).astype(np.float32)
from cntk.layers import Dense, Sequential
from cntk import input_variable, default_options
from cntk.ops import sigmoid
from cntk.losses import binary_cross_entropy
with default_options(activation=sigmoid):
    model = Sequential([Dense(6),Dense(2)])
features = input_variable(4)
z = model(features)
from cntk.learners import sgd
from cntk.logging import ProgressPrinter
progress_writer = ProgressPrinter(0)
labels = input_variable(2)
loss = binary_cross_entropy(z, labels)
learner = sgd(z.parameters, lr=0.1)
training_summary=loss.train((x,y),parameter_learners=[learner],callbacks=[progress_writer])

Output

Build info:
    Built time: *** ** **** 21:40:10
    Last modified date: *** *** ** 21:08:46 2019
    Build type: Release
    Build target: CPU-only
    With ASGD: yes
    Math lib: mkl
    Build Branch: HEAD
    Build SHA1:ae9c9c7c5f9e6072cc9c94c254f816dbdc1c5be6 (modified)
    MPI distribution: Microsoft MPI
    MPI version: 7.0.12437.6
-------------------------------------------------------------------
 average      since    average      since   examples
    loss       last     metric       last
------------------------------------------------------
Learning rate per minibatch: 0.1
    1.52       1.52          0          0         32
    1.51       1.51          0          0         96
    1.48       1.46          0          0        224
    1.45       1.42          0          0        480
    1.42        1.4          0          0        992
    1.41       1.39          0          0       2016
     1.4       1.39          0          0       4064
    1.39       1.39          0          0       8160
    1.39       1.39          0          0      16352

Using Pandas DataFrames

Numpy arrays are one of the most basic ways of storing data, and they are very limited in what they can contain. For example, a single n-dimensional array can contain data of only a single data type. For many real-world cases, on the other hand, we need a library that can handle more than one data type in a single dataset.
Pandas, one of the Python libraries, makes it easier to work with such datasets. It introduces the concept of a DataFrame (DF) and allows us to load datasets stored on disk in various formats as DFs. For example, we can read DFs stored as CSV, JSON, Excel, etc.

Implementation Example

In this example, we classify three possible species of iris flowers based on four properties. We created this deep learning model in the previous sections too. The model is as follows −

from cntk.layers import Dense, Sequential
from cntk import input_variable, default_options
from cntk.ops import sigmoid, log_softmax
from cntk.losses import binary_cross_entropy
model = Sequential([
    Dense(4, activation=sigmoid),
    Dense(3, activation=log_softmax)
])
features
CNTK – Neural Network Binary Classification

In this chapter, let us understand what neural network binary classification is and how to perform it using CNTK.

Binary classification using an NN is like multi-class classification; the only difference is that there are just two output nodes instead of three or more. Here, we perform binary classification with a neural network using two techniques, namely the one-node and the two-node technique. The one-node technique is more common than the two-node technique.

Loading Dataset

To implement both of these techniques with an NN, we will use the banknote dataset. The dataset can be downloaded from the UCI Machine Learning Repository.

For our example, we will use 50 authentic data items having class forgery = 0, and the first 50 fake items having class forgery = 1.

Preparing training & test files

There are 1372 data items in the full dataset. The raw dataset looks as follows −

3.6216, 8.6661, -2.8073, -0.44699, 0
4.5459, 8.1674, -2.4586, -1.4621, 0
…
-1.3971, 3.3191, -1.3927, -1.9948, 1
0.39012, -0.14279, -0.031994, 0.35084, 1

First, we need to convert this raw data into the two-node CNTK format, which looks as follows −

|stats 3.62160000 8.66610000 -2.80730000 -0.44699000 |forgery 0 1 |# authentic
|stats 4.54590000 8.16740000 -2.45860000 -1.46210000 |forgery 0 1 |# authentic
. . .
|stats -1.39710000 3.31910000 -1.39270000 -1.99480000 |forgery 1 0 |# fake
|stats 0.39012000 -0.14279000 -0.03199400 0.35084000 |forgery 1 0 |# fake

You can use the following Python program to create CNTK-format data from the raw data −

fin = open(".\…", "r")  # provide the location of the saved dataset text file
for line in fin:
    line = line.strip()
    tokens = line.split(",")
    if tokens[4].strip() == "0":
        print("|stats %12.8f %12.8f %12.8f %12.8f |forgery 0 1 |# authentic" % (float(tokens[0]), float(tokens[1]), float(tokens[2]), float(tokens[3])))
    else:
        print("|stats %12.8f %12.8f %12.8f %12.8f |forgery 1 0 |# fake" % (float(tokens[0]), float(tokens[1]), float(tokens[2]), float(tokens[3])))
fin.close()

Two-node binary Classification model

There is very little difference between two-node classification and multi-class classification. First, we need to read the data files in CNTK format, and for that we use a helper function named create_reader as follows −

def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field="stats", shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field="forgery", shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src

Now, we need to set the architecture arguments for our NN and also provide the location of the data files.
It can be done with the help of the following Python code −

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 4
    hidden_dim = 10
    output_dim = 2
    train_file = ".\…\"  # provide the name of the training file
    test_file = ".\…\"  # provide the name of the test file

Now, with the help of the following code, our program creates the untrained NN −

    X = C.ops.input_variable(input_dim, np.float32)
    Y = C.ops.input_variable(output_dim, np.float32)
    with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
        hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name="hidLayer")(X)
        oLayer = C.layers.Dense(output_dim, activation=None, name="outLayer")(hLayer)
    nnet = oLayer
    model = C.ops.softmax(nnet)

Once we have created the untrained two-node model, we need to set up a Learner algorithm object and then use it to create a Trainer training object. We use the SGD learner and the cross_entropy_with_softmax loss function −

    tr_loss = C.cross_entropy_with_softmax(nnet, Y)
    tr_clas = C.classification_error(nnet, Y)
    max_iter = 500
    batch_size = 10
    learn_rate = 0.01
    learner = C.sgd(nnet.parameters, learn_rate)
    trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])

Once we have finished with the Trainer object, we need to create a reader to read the training data −

    rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
    banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }

Now, it is time to train our NN model −

    for i in range(0, max_iter):
        curr_batch = rdr.next_minibatch(batch_size, input_map=banknote_input_map)
        trainer.train_minibatch(curr_batch)
        if i % 500 == 0:
            mcee = trainer.previous_minibatch_loss_average
            macc = (1.0 - trainer.previous_minibatch_evaluation_average) * 100
            print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " % (i, mcee, macc))

Once training is completed, let us evaluate the model using the test data items −

    print("\nEvaluating test data \n")
    rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
    banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    num_test = 20
    all_test = rdr.next_minibatch(num_test, input_map=banknote_input_map)
    acc = (1.0 - trainer.test_minibatch(all_test)) * 100
    print("Classification accuracy = %0.2f%%" % acc)

After evaluating the accuracy of our trained NN model, we use it to make a prediction on unseen data −

    np.set_printoptions(precision = 1, suppress=True)
    unknown = np.array([[0.6, 1.9, -3.3, -0.3]], dtype=np.float32)
    print("\nPredicting Banknote authenticity for input features: ")
    print(unknown[0])
    pred_prob = model.eval(unknown)
    np.set_printoptions(precision = 4, suppress=True)
    print("Prediction probabilities are: ")
    print(pred_prob[0])
    if pred_prob[0,0]

Complete Two-node Classification Model

import numpy as np
import cntk as C

def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field="stats", shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field="forgery", shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 4
    hidden_dim = 10
    output_dim = 2
    train_file = ".\…\"  # provide the name of the training file
    test_file = ".\…\"  # provide the name of the test file
    X = C.ops.input_variable(input_dim, np.float32)
    Y = C.ops.input_variable(output_dim, np.float32)
    with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
        hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name="hidLayer")(X)
        oLayer = C.layers.Dense(output_dim, activation=None, name="outLayer")(hLayer)
    nnet = oLayer
    model = C.ops.softmax(nnet)
    tr_loss = C.cross_entropy_with_softmax(nnet, Y)
    tr_clas = C.classification_error(nnet, Y)
    max_iter = 500
    batch_size = 10
    learn_rate = 0.01
    learner = C.sgd(nnet.parameters, learn_rate)
    trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])
    rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
    banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    for i in range(0, max_iter):
        curr_batch = rdr.next_minibatch(batch_size, input_map=banknote_input_map)
        trainer.train_minibatch(curr_batch)
        if i % 500 == 0:
            mcee = trainer.previous_minibatch_loss_average
            macc = (1.0 - trainer.previous_minibatch_evaluation_average) * 100
            print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " % (i, mcee, macc))
    print("\nEvaluating test data \n")
    rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
    banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    num_test = 20
    all_test = rdr.next_minibatch(num_test, input_map=banknote_input_map)
    acc = (1.0 - trainer.test_minibatch(all_test)) * 100
    print("Classification accuracy = %0.2f%%" % acc)
    np.set_printoptions(precision = 1, suppress=True)
    unknown = np.array([[0.6, 1.9, -3.3, -0.3]], dtype=np.float32)
    print("\nPredicting
Microsoft Cognitive Toolkit (CNTK) – Introduction

In this chapter, we will learn what CNTK is, its features, the differences between its versions 1.0 and 2.0, and the important highlights of version 2.7.

What is Microsoft Cognitive Toolkit (CNTK)?

Microsoft Cognitive Toolkit (CNTK), formerly known as Computational Network Toolkit, is a free, easy-to-use, open-source, commercial-grade toolkit that enables us to train deep learning algorithms to learn like the human brain. It enables us to create popular deep learning systems such as feed-forward neural network time series prediction systems and Convolutional Neural Network (CNN) image classifiers.

For optimal performance, its framework functions are written in C++. Although we can call these functions from C++, the most common approach is to use a Python program.

CNTK’s Features

Following are some of the features and capabilities offered in the latest version of Microsoft CNTK:

Built-in components

CNTK has highly optimised built-in components that can handle multi-dimensional dense or sparse data from Python, C++ or BrainScript.
We can implement CNN, FNN, RNN, Batch Normalisation and Sequence-to-Sequence with attention.
It lets us add new user-defined core components on the GPU from Python.
It also provides automatic hyperparameter tuning.
We can implement Reinforcement learning, Generative Adversarial Networks (GANs), and Supervised as well as Unsupervised learning.
For massive datasets, CNTK has built-in optimised readers.

Efficient usage of resources

CNTK provides parallelism with high accuracy on multiple GPUs/machines via 1-bit SGD.
To fit the largest models in GPU memory, it provides memory sharing and other built-in methods.

Express our own networks easily

CNTK has full APIs for defining our own networks, learners, readers, training and evaluation from Python, C++ and BrainScript.
Using CNTK, we can easily evaluate models with Python, C++, C# or BrainScript.
It provides both high-level and low-level APIs.
It supports automatic shape inference based on our data.
It has fully optimised symbolic Recurrent Neural Network (RNN) loops.

Measuring model performance

CNTK provides various components to measure the performance of the neural networks we build.
It generates log data from our model and the associated optimiser, which we can use to monitor the training process.

Version 1.0 vs Version 2.0

The following table compares CNTK Version 1.0 and 2.0:

Version 1.0 | Version 2.0
It was released in 2016. | It is a significant rewrite of Version 1.0 and was released in June 2017.
It used a proprietary scripting language called BrainScript. | Its framework functions can be called using C++ and Python. We can easily load our modules in C# or Java. BrainScript is also supported by Version 2.0.
It runs on both Windows and Linux systems but not directly on Mac OS. | It also runs on both Windows (Win 8.1, Win 10, Server 2012 R2 and later) and Linux systems but not directly on Mac OS.

Important Highlights of Version 2.7

Version 2.7 is the last main released version of Microsoft Cognitive Toolkit. Following are some important highlights of this last released version of CNTK.

Full support for ONNX 1.4.1.
Support for CUDA 10 on both Windows and Linux systems.
Support for advanced Recurrent Neural Network (RNN) loops in ONNX export.
It can export models larger than 2 GB in ONNX format.
It supports FP16 in the training action of the BrainScript scripting language.
CNTK – Logistic Regression Model

This chapter deals with constructing a logistic regression model in CNTK.

Basics of Logistic Regression model

Logistic Regression, one of the simplest ML techniques, is used especially for binary classification, that is, for creating a prediction model in situations where the variable to predict can take one of just two categorical values. One of the simplest examples of Logistic Regression is predicting whether a person is male or female based on the person’s age, voice, hair and so on.

Example

Let’s understand the concept of Logistic Regression mathematically with the help of another example −

Suppose we want to predict the creditworthiness of a loan application, where 0 means reject and 1 means approve, based on the applicant’s debt, income and credit rating. We represent debt with X1, income with X2 and credit rating with X3.

In Logistic Regression, we determine a weight value, represented by W, for every feature and a single bias value, represented by b.

Now suppose,

X1 = 3.0
X2 = -2.0
X3 = 1.0

And suppose we determine the weights and bias as follows −

W1 = 0.65, W2 = 1.75, W3 = 2.05 and b = 0.33

Now, to predict the class, we apply the following formula −

Z = (X1*W1) + (X2*W2) + (X3*W3) + b

i.e. Z = (3.0)*(0.65) + (-2.0)*(1.75) + (1.0)*(2.05) + 0.33 = 0.83

Next, we need to compute P = 1.0/(1.0 + exp(-Z)). Here, exp(x) denotes Euler’s number e raised to the power x.

P = 1.0/(1.0 + exp(-0.83)) ≈ 0.6964

The P value can be interpreted as the probability that the class is 1. If P < 0.5, the prediction is class = 0; otherwise (P >= 0.5), the prediction is class = 1.

To determine the values of the weights and bias, we must obtain a set of training data with known input predictor values and known correct class labels. After that, we can use an algorithm, generally Gradient Descent, to find the values of the weights and bias.
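The arithmetic of the credit example can be checked in a few lines of plain Python. This is a minimal sketch using only the standard library; the function name predict_probability is a hypothetical helper introduced here for illustration and is not part of CNTK.

```python
import math


def predict_probability(x, w, b):
    """Logistic regression forward pass: weighted sum plus bias, then sigmoid."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return z, p


# Values from the credit example: debt, income and credit rating.
x = [3.0, -2.0, 1.0]
w = [0.65, 1.75, 2.05]
b = 0.33

z, p = predict_probability(x, w, b)

# Threshold the probability at 0.5 to obtain the predicted class.
predicted_class = 1 if p >= 0.5 else 0

print("Z = %0.2f, P = %0.4f, class = %d" % (z, p, predicted_class))
```

Because P comes out at roughly 0.696, which is above the 0.5 threshold, the application in this example would be classified as approve (class 1).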
LR model implementation example

For this LR model, we are going to use the following data set −

1.0, 2.0, 0
3.0, 4.0, 0
5.0, 2.0, 0
6.0, 3.0, 0
8.0, 1.0, 0
9.0, 2.0, 0
1.0, 4.0, 1
2.0, 5.0, 1
4.0, 6.0, 1
6.0, 5.0, 1
7.0, 3.0, 1
8.0, 5.0, 1

To start this LR model implementation in CNTK, we first need to import the following packages −

import numpy as np
import cntk as C

The program is structured with a main() function as follows −

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")

Now, we need to load the training data into memory as follows −

    data_file = ".\\dataLRmodel.txt"
    print("Loading data from " + data_file + "\n")
    features_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",", skiprows=0, usecols=[0,1])
    labels_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",", skiprows=0, usecols=[2], ndmin=2)

Now, we create a logistic regression model that is compatible with the training data −

    features_dim = 2
    labels_dim = 1
    X = C.ops.input_variable(features_dim, np.float32)
    y = C.input_variable(labels_dim, np.float32)
    W = C.parameter(shape=(features_dim, 1))  # trainable cntk.Parameter
    b = C.parameter(shape=(labels_dim))
    z = C.times(X, W) + b
    p = 1.0 / (1.0 + C.exp(-z))
    model = p

Now, we need to create the learner and trainer as follows −

    ce_error = C.binary_cross_entropy(model, y)  # CE a bit more principled for LR
    fixed_lr = 0.010
    learner = C.sgd(model.parameters, fixed_lr)
    trainer = C.Trainer(model, (ce_error), [learner])
    max_iterations = 4000

LR Model training

Once we have created the LR model, it is time to start the training process −

    np.random.seed(4)
    N = len(features_mat)
    for i in range(0, max_iterations):
        row = np.random.choice(N,1)  # pick a random row from training items
        trainer.train_minibatch({ X: features_mat[row], y: labels_mat[row] })
        if i % 1000 == 0 and i > 0:
            mcee = trainer.previous_minibatch_loss_average
            print(str(i) + " Cross-entropy error on curr item = %0.4f " % mcee)

Now, with the help of the following code, we can print the model weights and bias −

    np.set_printoptions(precision=4, suppress=True)
    print("Model weights: ")
    print(W.value)
    print("Model bias:")
    print(b.value)
    print("")

if __name__ == "__main__":
    main()

Training a Logistic Regression model – Complete example

import numpy as np
import cntk as C

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    data_file = ".\\dataLRmodel.txt"  # provide the name and the location of data file
    print("Loading data from " + data_file + "\n")
    features_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",", skiprows=0, usecols=[0,1])
    labels_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",", skiprows=0, usecols=[2], ndmin=2)
    features_dim = 2
    labels_dim = 1
    X = C.ops.input_variable(features_dim, np.float32)
    y = C.input_variable(labels_dim, np.float32)
    W = C.parameter(shape=(features_dim, 1))  # trainable cntk.Parameter
    b = C.parameter(shape=(labels_dim))
    z = C.times(X, W) + b
    p = 1.0 / (1.0 + C.exp(-z))
    model = p
    ce_error = C.binary_cross_entropy(model, y)  # CE a bit more principled for LR
    fixed_lr = 0.010
    learner = C.sgd(model.parameters, fixed_lr)
    trainer = C.Trainer(model, (ce_error), [learner])
    max_iterations = 4000
    np.random.seed(4)
    N = len(features_mat)
    for i in range(0, max_iterations):
        row = np.random.choice(N,1)  # pick a random row from training items
        trainer.train_minibatch({ X: features_mat[row], y: labels_mat[row] })
        if i % 1000 == 0 and i > 0:
            mcee = trainer.previous_minibatch_loss_average
            print(str(i) + " Cross-entropy error on curr item = %0.4f " % mcee)
    np.set_printoptions(precision=4, suppress=True)
    print("Model weights: ")
    print(W.value)
    print("Model bias:")
    print(b.value)

if __name__ == "__main__":
    main()

Output

Using CNTK version = 2.7
1000 cross entropy error on curr item = 0.1941
2000 cross entropy error on curr item = 0.1746
3000 cross entropy error on curr item = 0.0563
Model weights:
[[-0.2049]
 [ 0.9666]]
Model bias:
[-2.2846]

Prediction using trained LR Model

Once the LR model has been trained, we can use it for prediction as follows −

First of all, our evaluation program imports the numpy package and loads the training data into a feature matrix and a class label matrix, in the same way as the training program we implemented above −

import numpy as np

def main():
    data_file = ".\\dataLRmodel.txt"  # provide the name and the location of data file
    features_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",", skiprows=0, usecols=(0,1))
    labels_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",", skiprows=0, usecols=[2], ndmin=2)

Next, it is time to
CNTK – Creating First Neural Network

This chapter elaborates on creating a neural network in CNTK.

Build the network structure

To apply CNTK concepts in building our first NN, we are going to use an NN to classify species of iris flowers based on the physical properties of sepal width and length, and petal width and length. We will use the iris dataset, which describes the following physical properties of different varieties of iris flowers −

Sepal length
Sepal width
Petal length
Petal width
Class, i.e. iris setosa, iris versicolor or iris virginica

Here, we will build a regular NN called a feedforward NN. Let us see the implementation steps to build the structure of the NN −

Step 1 − First, we import the necessary components from the CNTK library, such as our layer types, activation functions, and a function that allows us to define an input variable for our NN.

from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, relu

Step 2 − After that, we create our model using the Sequential function and feed it the layers we want. Here, we create two distinct layers in our NN: one with four neurons and another with three neurons.

model = Sequential([Dense(4, activation=relu), Dense(3, activation=log_softmax)])

Step 3 − At last, to compile the NN, we bind the network to the input variable. The network has an input layer with four neurons and an output layer with three neurons.

feature = input_variable(4)
z = model(feature)

Applying an activation function

There are many activation functions to choose from, and choosing the right one makes a big difference to how well our deep learning model performs.

At the output layer

The choice of activation function at the output layer depends on the kind of problem we are going to solve with our model.
For a regression problem, we should use a linear activation function on the output layer.
For a binary classification problem, we should use a sigmoid activation function on the output layer.
For a multi-class classification problem, we should use a softmax activation function on the output layer.

Here, we are going to build a model for predicting one of three classes. This means we need to use the softmax activation function at the output layer.

At the hidden layer

Choosing an activation function at the hidden layer requires some experimentation: we monitor the performance to see which activation function works well.

In a classification problem, we need to predict the probability that a sample belongs to a specific class. That is why we need an activation function that gives us probabilistic values, and the sigmoid activation function can help us reach this goal. One of the major problems associated with the sigmoid function is the vanishing gradient problem. To overcome this problem, we can use the ReLU activation function, which converts all negative values to zero and works as a pass-through filter for positive values.

Picking a loss function

Once we have the structure for our NN model, we must optimise it. For optimising, we need a loss function. Unlike activation functions, there are far fewer loss functions to choose from. However, the choice of loss function depends on the kind of problem we are going to solve with our model. For example, in a classification problem, we should use a loss function that can measure the difference between a predicted class and an actual class.

loss function

For the classification problem we are going to solve with our NN model, the categorical cross entropy loss function is the standard choice.
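To make the activation functions discussed above concrete, here is a minimal sketch in plain Python. It does not use CNTK; the function names and the example scores are illustrative assumptions, not part of the original model.

```python
import math

def sigmoid(x):
    """Squash a real number into the interval (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # this branch avoids overflow for large negative x
    return z / (1.0 + z)

def sigmoid_gradient(x):
    """Derivative of the sigmoid; it shrinks towards 0 as |x| grows."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    """Convert negative values to zero and pass positive values through."""
    return max(0.0, x)

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of a three-neuron output layer.
example_scores = [2.0, 1.0, 0.1]
probabilities = softmax(example_scores)
predicted_class = probabilities.index(max(probabilities))
```

The sigmoid gradient is 0.25 at zero but tiny for large inputs, which is the vanishing gradient problem in miniature; ReLU keeps a constant slope for positive inputs.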
In CNTK, categorical cross entropy is implemented as cross_entropy_with_softmax, which can be imported from the cntk.losses package, as follows:

    from cntk.losses import cross_entropy_with_softmax

    label = input_variable(3)
    loss = cross_entropy_with_softmax(z, label)

Metrics

With the structure for our NN model and a loss function to apply, we have all the ingredients to start optimising our deep learning model. But before diving deep into this, we should learn about metrics.

cntk.metrics

CNTK has a package named cntk.metrics from which we can import the metrics we are going to use. As we are building a classification model, we will be using the classification_error metric, which produces a number between 0 and 1. This number indicates the fraction of samples that were predicted incorrectly, so lower values are better.

First, we need to import the metric from the cntk.metrics package:

    from cntk.metrics import classification_error
    error_rate = classification_error(z, label)

The above function needs the output of the NN and the expected label as input.
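As a rough illustration of what the loss and the metric measure, the following plain-Python sketch computes categorical cross entropy from raw scores and a classification error rate over a small batch. It is not CNTK's implementation; the helper names and the batch values are assumptions made up for this example.

```python
import math

def cross_entropy_from_scores(scores, one_hot_label):
    """Cross entropy between softmax(scores) and a one-hot label."""
    shift = max(scores)  # log-sum-exp trick for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(s - shift) for s in scores))
    # -log(softmax(scores)[k]) equals log_sum_exp - scores[k] for the true class k
    return sum(t * (log_sum_exp - s) for t, s in zip(one_hot_label, scores))

def classification_error_rate(batch_scores, batch_labels):
    """Fraction of samples whose highest score is not at the labelled class."""
    wrong = 0
    for scores, label in zip(batch_scores, batch_labels):
        if scores.index(max(scores)) != label.index(max(label)):
            wrong += 1
    return wrong / len(batch_scores)

# Hypothetical network outputs and one-hot labels for four samples.
batch_scores = [[2.0, 1.0, 0.1], [0.2, 0.3, 2.5], [1.5, 0.1, 0.4], [0.1, 0.2, 0.3]]
batch_labels = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]

first_loss = cross_entropy_from_scores(batch_scores[0], batch_labels[0])
error_rate_value = classification_error_rate(batch_scores, batch_labels)
```

In this batch only the third sample is misclassified, so the error rate is one in four. The loss is small when the score of the true class dominates and grows as the prediction moves away from the label.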
CNTK – Neural Network (NN) Concepts

This chapter deals with the concepts of neural networks with regard to CNTK.

As we know, a neural network is made up of several layers of neurons. But how can we model the layers of a NN in CNTK? It can be done with the help of the layer functions defined in the layers module.

Layer function

In CNTK, working with layers has a distinct functional programming feel to it. A layer function looks like a regular function, and it produces a mathematical function with a set of predefined parameters. Let's see how we can create the most basic layer type, Dense, with the help of a layer function.

Example

With the help of the following basic steps, we can create the most basic layer type:

Step 1 − First, we need to import the Dense layer function from the layers package of CNTK.

    from cntk.layers import Dense

Step 2 − Next, from the CNTK root package, we need to import the input_variable function.

    from cntk import input_variable

Step 3 − Now, we need to create a new input variable using the input_variable function. We also need to provide its size.

    feature = input_variable(100)

Step 4 − Finally, we create a new layer using the Dense function, providing the number of neurons we want, and invoke the configured layer function on the input variable to connect the Dense layer to the input.

    layer = Dense(40)(feature)

Complete implementation example

    from cntk.layers import Dense
    from cntk import input_variable

    feature = input_variable(100)
    layer = Dense(40)(feature)

Customizing layers

As we have seen, CNTK provides us with a pretty good set of defaults for building NNs. The behaviour and the performance of the NN differ depending on the activation function and the other settings we choose. That is why it is good to understand what we can configure.
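The idea of a layer function, a function that returns another function carrying its own parameters, can be sketched in plain Python. This is only an illustration of the functional style with a configurable activation and initialiser; `dense`, `glorot_uniform_weights`, and the sample values are hypothetical names for this sketch and do not reproduce CNTK's internals.

```python
import math
import random

def glorot_uniform_weights(fan_in, fan_out, rng):
    """Draw weights uniformly from [-limit, limit], limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]

def identity(x):
    return x

def relu(x):
    return max(0.0, x)

def dense(num_neurons, activation=identity, init=glorot_uniform_weights, seed=0):
    """Return a layer function; its parameters are created on first use."""
    rng = random.Random(seed)
    params = {}  # filled lazily, once the input size is known

    def layer(inputs):
        if not params:
            params["W"] = init(len(inputs), num_neurons, rng)
            params["b"] = [0.0] * num_neurons
        outputs = []
        for j in range(num_neurons):
            total = params["b"][j]
            for i, x in enumerate(inputs):
                total += x * params["W"][i][j]
            outputs.append(activation(total))
        return outputs

    layer.params = params  # expose the parameters for inspection
    return layer

# One hypothetical sample with four features, passed through two layers.
features = [5.1, 3.5, 1.4, 0.2]
hidden = dense(4, activation=relu)
output = dense(3)
z = output(hidden(features))
```

Calling `dense(4, activation=relu)` only configures the layer; the weights appear when the layer is first applied to an input, which loosely mirrors how a configured layer function is connected to an input variable.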
Steps to configure a Dense layer

Each layer in a NN has its own configuration options. For the Dense layer, the following settings are important:

shape − As the name implies, it defines the output shape of the layer, which in turn determines the number of neurons in that layer.
activation − It defines the activation function of the layer, which the layer uses to transform the input data.
init − It defines the initialisation function of the layer. It initialises the parameters of the layer when we start training the NN.

Let's see the steps with which we can configure a Dense layer:

Step 1 − First, we need to import the Dense layer function from the layers package of CNTK.

    from cntk.layers import Dense

Step 2 − Next, from the CNTK ops package, we need to import the sigmoid operator. It will be used as the activation function.

    from cntk.ops import sigmoid

Step 3 − Now, from the initializer package, we need to import the glorot_uniform initializer.

    from cntk.initializer import glorot_uniform

Step 4 − Finally, we create a new layer using the Dense function, providing the number of neurons as the first argument. We also provide the sigmoid operator as the activation function and an instance of the glorot_uniform initializer as the init argument for the layer.

    layer = Dense(50, activation=sigmoid, init=glorot_uniform())

Complete implementation example:

    from cntk.layers import Dense
    from cntk.ops import sigmoid
    from cntk.initializer import glorot_uniform

    layer = Dense(50, activation=sigmoid, init=glorot_uniform())

Optimizing the parameters

Till now, we have seen how to create the structure of a NN and how to configure various settings. Here, we will see how we can optimise the parameters of a NN. This is done with a combination of two components, namely learners and trainers.

trainer component

The first component used to optimise the parameters of a NN is the trainer component.
It basically implements the backpropagation process. It works by passing the data through the NN to obtain a prediction. After that, it uses another component, called the learner, to obtain new values for the parameters in the NN. Once it has obtained the new values, it applies them and repeats the process until an exit criterion is met.

learner component

The second component used to optimise the parameters of a NN is the learner component, which is responsible for performing the gradient descent algorithm.

Learners included in the CNTK library

Following is a list of some of the interesting learners included in the CNTK library:

Stochastic Gradient Descent (SGD) − This learner represents basic stochastic gradient descent, without any extras.
Momentum Stochastic Gradient Descent (MomentumSGD) − This learner adds momentum to SGD to help overcome the problem of local minima.
RMSProp − This learner uses decaying learning rates to control the rate of descent.
Adam − This learner uses decaying momentum to decrease the rate of descent over time.
Adagrad − This learner uses different learning rates for frequently and infrequently occurring features.
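The division of labour between trainer and learner can be sketched in plain Python on a tiny linear model. This is a textbook illustration under stated assumptions, not CNTK's Trainer or its learners: the function names, the toy data, and the classic momentum update used here are all choices made for this example.

```python
def sgd_learner(learning_rate):
    """Learner: plain gradient descent step, without any extras."""
    def update(parameters, gradients):
        return [p - learning_rate * g for p, g in zip(parameters, gradients)]
    return update

def momentum_sgd_learner(learning_rate, momentum):
    """Learner: gradient descent with a running velocity (classic momentum)."""
    velocity = []
    def update(parameters, gradients):
        if not velocity:
            velocity.extend(0.0 for _ in parameters)
        for i, g in enumerate(gradients):
            velocity[i] = momentum * velocity[i] + g
        return [p - learning_rate * v for p, v in zip(parameters, velocity)]
    return update

def train(learner, samples, epochs):
    """Trainer: predict, compute gradients, ask the learner for new values, repeat."""
    w, b = 0.0, 0.0
    for _ in range(epochs):  # exit criterion: a fixed number of passes over the data
        for x, y in samples:
            prediction = w * x + b
            error = prediction - y
            # gradients of the squared error 0.5 * error**2 with respect to w and b
            gradients = [error * x, error]
            w, b = learner([w, b], gradients)
    return w, b

# Toy data generated from y = 2x + 1 (an assumption for this sketch).
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w_sgd, b_sgd = train(sgd_learner(0.05), samples, epochs=500)
w_mom, b_mom = train(momentum_sgd_learner(0.02, 0.5), samples, epochs=500)
```

Both learners should recover a slope close to 2 and an intercept close to 1. The trainer loop never needs to know which update rule is in use, which is the point of keeping the two components separate.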
Microsoft Cognitive Toolkit (CNTK) Tutorial

Microsoft Cognitive Toolkit (CNTK), formerly known as Computational Network Toolkit, is a free, easy-to-use, open-source, commercial-grade toolkit that enables us to train deep learning algorithms to learn like the human brain. It enables us to create popular deep learning systems such as feed-forward neural networks, time series prediction systems, and convolutional neural network (CNN) image classifiers.

Audience

This tutorial will be useful for graduates, post-graduates, and research students who either have an interest in deep learning or artificial neural networks, or have this subject as a part of their curriculum. The reader can be a beginner or an advanced learner.

Prerequisites

The reader must have basic knowledge of neural networks and should also be familiar with the basic terminology used in Python programming.
Microsoft Cognitive Toolkit (CNTK) – CPU and GPU

Microsoft Cognitive Toolkit offers two different build versions, namely CPU-only and GPU-only.

CPU only build version

The CPU-only build version of CNTK uses the optimised Intel MKLML, where MKLML is the subset of MKL (Math Kernel Library) and released with Intel MKL-DNN as a terminated version of Intel MKL for MKL-DNN.

GPU only build version

The GPU-only build version of CNTK, on the other hand, uses highly optimised NVIDIA libraries such as CUB and cuDNN. It supports distributed training across multiple GPUs and multiple machines. For even faster distributed training in CNTK, the GPU build version also includes:

MSR-developed 1bit-quantized SGD.
Block-momentum SGD parallel training algorithms.

Enabling GPU with CNTK on Windows

In the previous section, we saw how to install the basic version of CNTK to use with the CPU. Now let's discuss how we can install CNTK to use with a GPU. Before diving into it, you need a supported graphics card. At present, CNTK supports NVIDIA graphics cards with CUDA compute capability 3.0 or higher. To make sure, you can check whether your GPU supports CUDA.

So, let us see the steps to enable GPU with CNTK on Windows OS:

Step 1 − Depending on the graphics card you are using, first install the latest GeForce or Quadro drivers for your graphics card.

Step 2 − Once you have installed the drivers, download the CUDA toolkit version 9.0 for Windows from the NVIDIA website. After downloading, run the installer and follow the instructions.

Step 3 − Next, download the cuDNN binaries from the NVIDIA website. With CUDA version 9.0, cuDNN 7.4.1 works well. cuDNN is a layer on top of CUDA that is used by CNTK.

Step 4 − After downloading the cuDNN binaries, extract the zip file into the root folder of your CUDA toolkit installation.

Step 5 − This is the last step which will enable GPU usage inside CNTK.
Execute the following command inside the Anaconda prompt on Windows OS:

    pip install cntk-gpu

Enabling GPU with CNTK on Linux

Let us see how we can enable GPU with CNTK on Linux OS:

Downloading the CUDA toolkit

First, you need to download the CUDA toolkit from the NVIDIA website.

Running the installer

Once you have the binaries on disk, run the installer by opening a terminal, executing the following command, and following the instructions on screen:

    sh cuda_9.0.176_384.81_linux-run

Modify Bash profile script

After installing the CUDA toolkit on your Linux machine, you need to modify the Bash profile script. For this, first open the $HOME/.bashrc file in a text editor. Then, at the end of the script, include the following lines:

    export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Installing cuDNN libraries

Finally, we need to install the cuDNN binaries. They can be downloaded from the NVIDIA website. With CUDA version 9.0, cuDNN 7.4.1 works well. cuDNN is a layer on top of CUDA that is used by CNTK.

Once you have downloaded the version for Linux, extract it to the /usr/local/cuda-9.0 folder by using the following command:

    tar xvzf cudnn-9.0-linux-x64-v7.4.1.5.tgz -C /usr/local/cuda-9.0/

Change the path to the filename as required.
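After installation, one way to check whether CNTK can see a GPU is to list its devices from Python. The sketch below assumes the `cntk.device.all_devices()` call from CNTK's Python API; the wrapper function name is hypothetical, and it returns an empty list when CNTK cannot be imported so that the snippet can run on any machine.

```python
def list_cntk_devices():
    """Return the devices CNTK reports as strings, or [] if CNTK is unavailable."""
    try:
        import cntk
    except ImportError:
        # CNTK (or one of its native libraries) could not be loaded.
        return []
    return [str(device) for device in cntk.device.all_devices()]

devices = list_cntk_devices()
for name in devices:
    print(name)
```

If the GPU build is installed correctly and the drivers are in place, the list is expected to include a GPU entry alongside the CPU; if only a CPU entry appears, the CUDA and cuDNN steps above are the first things to recheck.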