Apache MXNet – KVStore and Visualization

This chapter deals with the Python packages KVStore and visualization.

KVStore package

KVStore stands for Key-Value store. It is a critical component for multi-device training: parameters are communicated across devices on a single machine, as well as across multiple machines, through one or more servers that hold a KVStore for the parameters.

Let us understand the working of KVStore with the help of the following points:

Each entry in KVStore is represented by a key and a value.
Each parameter array in the network is assigned a key, and the weights of that parameter array are the value.
The worker nodes push gradients after processing a batch, and pull the updated weights before processing a new batch.

In simple words, KVStore is a place for data sharing where each device can push data in and pull data out.

Data Push-In and Pull-Out

KVStore can be thought of as a single object shared across different devices, such as GPUs and computers, where each device is able to push data in and pull data out.

The following implementation steps are followed by devices to push data in and pull data out.

Implementation steps

Initialisation: The first step is to initialise the values. In our example, we initialise an (int, NDArray) pair in the KVStore and then pull the value out:

    import mxnet as mx
    kv = mx.kv.create("local")  # create a local KVStore.
    shape = (3,3)
    kv.init(3, mx.nd.ones(shape)*2)
    a = mx.nd.zeros(shape)
    kv.pull(3, out = a)
    print(a.asnumpy())

Output

This produces the following output:

    [[2. 2. 2.]
     [2. 2. 2.]
     [2. 2. 2.]]

Push, Aggregate, and Update: Once initialised, we can push a new value with the same shape to the key:

    kv.push(3, mx.nd.ones(shape)*8)
    kv.pull(3, out = a)
    print(a.asnumpy())

Output

The output is given below:

    [[8. 8. 8.]
     [8. 8. 8.]
     [8. 8. 8.]]

The data used for pushing can be stored on any device, such as GPUs or computers. We can also push multiple values to the same key. In this case, the KVStore first sums all of these values and then pushes the aggregated value:

    contexts = [mx.cpu(i) for i in range(4)]
    b = [mx.nd.ones(shape, ctx) for ctx in contexts]
    kv.push(3, b)
    kv.pull(3, out = a)
    print(a.asnumpy())

Output

You will see the following output:

    [[4. 4. 4.]
     [4. 4. 4.]
     [4. 4. 4.]]

For each push, KVStore combines the pushed value with the value already stored. This is done with the help of an updater. The default updater is ASSIGN, which is why the pushes above simply replaced the stored value. We can install our own updater instead:

    def update(key, input, stored):
        print("update on key: %d" % key)
        stored += input * 2
    kv._set_updater(update)
    kv.pull(3, out=a)
    print(a.asnumpy())

Output

When you execute the above code, you should see the following output. The stored value is unchanged because nothing has been pushed since the updater was installed:

    [[4. 4. 4.]
     [4. 4. 4.]
     [4. 4. 4.]]

Example

    kv.push(3, mx.nd.ones(shape))
    kv.pull(3, out=a)
    print(a.asnumpy())

Output

Given below is the output of the code. The updater adds twice the pushed value (1) to the stored value (4):

    update on key: 3
    [[6. 6. 6.]
     [6. 6. 6.]
     [6. 6. 6.]]

Pull: As with push, we can also pull the value onto several devices with a single call:

    b = [mx.nd.ones(shape, ctx) for ctx in contexts]
    kv.pull(3, out = b)
    print(b[1].asnumpy())

Output

The output is stated below:

    [[6. 6. 6.]
     [6. 6. 6.]
     [6. 6. 6.]]

Complete Implementation Example

Given below is the complete implementation example:

    import mxnet as mx
    kv = mx.kv.create("local")
    shape = (3,3)
    kv.init(3, mx.nd.ones(shape)*2)
    a = mx.nd.zeros(shape)
    kv.pull(3, out = a)
    print(a.asnumpy())
    kv.push(3, mx.nd.ones(shape)*8)
    kv.pull(3, out = a)  # pull out the value
    print(a.asnumpy())
    contexts = [mx.cpu(i) for i in range(4)]
    b = [mx.nd.ones(shape, ctx) for ctx in contexts]
    kv.push(3, b)
    kv.pull(3, out = a)
    print(a.asnumpy())
    def update(key, input, stored):
        print("update on key: %d" % key)
        stored += input * 2
    kv._set_updater(update)
    kv.pull(3, out=a)
    print(a.asnumpy())
    kv.push(3, mx.nd.ones(shape))
    kv.pull(3, out=a)
    print(a.asnumpy())
    b = [mx.nd.ones(shape, ctx) for ctx in contexts]
    kv.pull(3, out = b)
    print(b[1].asnumpy())

Handling Key-Value Pairs

All the operations we have implemented above involve a single key, but KVStore also provides an interface for a list of key-value pairs.

For a single device

Following is an example of the KVStore interface for a list of key-value pairs on a single device:

    keys = [5, 7, 9]
    kv.init(keys, [mx.nd.ones(shape)]*len(keys))
    kv.push(keys, [mx.nd.ones(shape)]*len(keys))
    b = [mx.nd.zeros(shape)]*len(keys)
    kv.pull(keys, out = b)
    print(b[1].asnumpy())

Output

You will receive the following output:

    update on key: 5
    update on key: 7
    update on key: 9
    [[3. 3. 3.]
     [3. 3. 3.]
     [3. 3. 3.]]

For multiple devices

Following is an example of the KVStore interface for a list of key-value pairs on multiple devices:

    b = [[mx.nd.ones(shape, ctx) for ctx in contexts]] * len(keys)
    kv.push(keys, b)
    kv.pull(keys, out = b)
    print(b[1][1].asnumpy())

Output

You will see the following output:

    update on key: 5
    update on key: 7
    update on key: 9
    [[11. 11. 11.]
     [11. 11. 11.]
     [11. 11. 11.]]

Visualization package

The visualization package is the Apache MXNet package used to represent a neural network (NN) as a computation graph that consists of nodes and edges.
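To make the idea of a computation graph of nodes and edges concrete, here is a minimal pure-Python sketch. It is not MXNet code and does not reflect MXNet's internal representation: the dictionary layout, the node names (loosely modelled on a small matrix-factorisation network), and the helper functions `edges` and `topological_order` are all assumptions made for illustration.

```python
# Toy stand-in for a computation graph; not MXNet's internal representation.
# Each node name maps to the list of nodes that feed into it (hypothetical names).
graph = {
    "user": [],
    "item": [],
    "user_embedding": ["user"],
    "item_embedding": ["item"],
    "multiply": ["user_embedding", "item_embedding"],
    "sum_axis": ["multiply"],
}


def edges(graph):
    """Return (source, destination) pairs, one per arrow in a drawing."""
    return [(src, dst) for dst, inputs in graph.items() for src in inputs]


def topological_order(graph):
    """Order nodes so that every node appears after all of its inputs."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in graph[node]:
            visit(parent)
        order.append(node)

    for node in graph:
        visit(node)
    return order


edge_list = edges(graph)
order = topological_order(graph)
for src, dst in edge_list:
    print(f"{src} -> {dst}")
print(order)
```

A plotting tool draws one box per node and one arrow per edge; the topological order is the order in which such a graph could be evaluated.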
Visualize neural network

In the example below we will use mx.viz.plot_network to visualize a neural network. The prerequisites are:

Jupyter notebook
Graphviz library

Implementation Example

In the example below we will visualize a sample NN for linear matrix factorisation:

    import mxnet as mx
    user = mx.symbol.Variable("user")
    item = mx.symbol.Variable("item")
    score = mx.symbol.Variable("score")

    # Set the dummy dimensions
    k = 64
    max_user = 100
    max_item = 50

    # The user feature lookup
    user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k)

    # The item feature lookup
    item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)

    # Predict by the inner product: multiply element-wise, then sum over axis 1
    N_net = user * item
    N_net = mx.symbol.sum_axis(data = N_net, axis = 1)
    N_net = mx.symbol.Flatten(data = N_net)

    # Defining the loss layer
    N_net =
Apache MXNet – Python API gluon

As discussed in previous chapters, MXNet Gluon provides a clear, concise, and simple API for DL projects. It lets us prototype, build, and train DL models with Apache MXNet without forfeiting training speed.

Core Modules

Let us learn the core modules of the Apache MXNet Python application programming interface (API) gluon.

gluon.nn

Gluon provides a large number of built-in NN layers in the gluon.nn module. That is the reason it is called a core module.

Methods and their parameters

Following are some of the important methods and their parameters covered by the mxnet.gluon.nn core module:

Method and its parameters | Definition
Activation(activation, **kwargs) | Applies an activation function to the input.
AvgPool1D([pool_size, strides, padding, …]) | Average pooling operation for temporal data.
AvgPool2D([pool_size, strides, padding, …]) | Average pooling operation for spatial data.
AvgPool3D([pool_size, strides, padding, …]) | Average pooling operation for 3D data. The data can be spatial or spatio-temporal.
BatchNorm([axis, momentum, epsilon, center, …]) | Batch normalisation layer.
BatchNormReLU([axis, momentum, epsilon, …]) | Batch normalisation layer with a ReLU activation function.
Block([prefix, params]) | Base class for all neural network layers and models.
Conv1D(channels, kernel_size[, strides, …]) | 1D convolution layer (for example, temporal convolution).
Conv1DTranspose(channels, kernel_size[, …]) | Transposed 1D convolution layer.
Conv2D(channels, kernel_size[, strides, …]) | 2D convolution layer (for example, spatial convolution over images).
Conv2DTranspose(channels, kernel_size[, …]) | Transposed 2D convolution layer.
Conv3D(channels, kernel_size[, strides, …]) | 3D convolution layer (for example, spatial convolution over volumes).
Conv3DTranspose(channels, kernel_size[, …]) | Transposed 3D convolution layer.
Dense(units[, activation, use_bias, …]) | A regular densely-connected NN layer.
Dropout(rate[, axes]) | Applies dropout to the input.
ELU([alpha]) | Exponential Linear Unit (ELU).
Embedding(input_dim, output_dim[, dtype, …]) | Turns non-negative integers into dense vectors of fixed size.
Flatten(**kwargs) | Flattens the input to 2D.
GELU(**kwargs) | Gaussian Error Linear Unit (GELU).
GlobalAvgPool1D([layout]) | Global average pooling operation for temporal data.
GlobalAvgPool2D([layout]) | Global average pooling operation for spatial data.
GlobalAvgPool3D([layout]) | Global average pooling operation for 3D data.
GlobalMaxPool1D([layout]) | Global max pooling operation for 1D data.
GlobalMaxPool2D([layout]) | Global max pooling operation for 2D data.
GlobalMaxPool3D([layout]) | Global max pooling operation for 3D data.
GroupNorm([num_groups, epsilon, center, …]) | Applies group normalisation to the n-D input array.
HybridBlock([prefix, params]) | Supports forwarding with both Symbol and NDArray.
HybridLambda(function[, prefix]) | Wraps an operator or an expression as a HybridBlock object.
HybridSequential([prefix, params]) | Stacks HybridBlocks sequentially.
InstanceNorm([axis, epsilon, center, scale, …]) | Applies instance normalisation to the n-D input array.

Implementation Examples

In the example below, we use Block(), the base class for all neural network layers and models.

    import mxnet as mx
    from mxnet.gluon import Block, nn

    class Model(Block):
        def __init__(self, **kwargs):
            super(Model, self).__init__(**kwargs)
            # use name_scope to give child Blocks appropriate names.
            with self.name_scope():
                self.dense0 = nn.Dense(20)
                self.dense1 = nn.Dense(20)

        def forward(self, x):
            x = mx.nd.relu(self.dense0(x))
            return mx.nd.relu(self.dense1(x))

    model = Model()
    model.initialize(ctx=mx.cpu(0))
    model(mx.nd.zeros((5, 5), ctx=mx.cpu(0)))

Output

You will see the following output:

    [[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
    <NDArray 5x20 @cpu(0)>

In the example below, we use HybridBlock(), which supports forwarding with both Symbol and NDArray. In the MXNet 1.x API used here, a HybridBlock implements hybrid_forward, whose argument F is either the NDArray or the Symbol namespace:

    import mxnet as mx
    from mxnet.gluon import HybridBlock, nn

    class Model(HybridBlock):
        def __init__(self, **kwargs):
            super(Model, self).__init__(**kwargs)
            # use name_scope to give child Blocks appropriate names.
            with self.name_scope():
                self.dense0 = nn.Dense(20)
                self.dense1 = nn.Dense(20)

        def hybrid_forward(self, F, x):
            x = F.relu(self.dense0(x))
            return F.relu(self.dense1(x))

    model = Model()
    model.initialize(ctx=mx.cpu(0))
    model.hybridize()
    model(mx.nd.zeros((5, 5), ctx=mx.cpu(0)))

Output

The output is mentioned below:

    [[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
    <NDArray 5x20 @cpu(0)>

gluon.rnn

Gluon provides a large number of built-in recurrent neural network (RNN) layers in the gluon.rnn module. That is the reason it is called a core module.
Methods and their parameters

Following are some of the important methods and their parameters covered by the mxnet.gluon.rnn core module:

Method and its parameters | Definition
BidirectionalCell(l_cell, r_cell[, …]) | Bidirectional recurrent neural network (RNN) cell.
DropoutCell(rate[, axes, prefix, params]) | Applies dropout on the given input.
GRU(hidden_size[, num_layers, layout, …]) | Applies a multi-layer gated recurrent unit (GRU) RNN to a given input sequence.
GRUCell(hidden_size[, …]) | Gated Recurrent Unit (GRU) network cell.
HybridRecurrentCell([prefix, params]) | A recurrent cell that supports hybridize.
HybridSequentialRNNCell([prefix, params]) | With the
Apache MXNet – Python API Symbol

In this chapter, we will learn about the MXNet interface called Symbol.

mxnet.symbol

Apache MXNet's Symbol API is an interface for symbolic programming. The Symbol API features the use of the following:

Computational graphs
Reduced memory usage
Pre-use function optimization

The example given below shows how to create a simple expression by using MXNet's Symbol API:

    import mxnet as mx
    # Two placeholders, x and y, are created with mx.sym.Variable
    x = mx.sym.Variable("x")
    y = mx.sym.Variable("y")
    # The symbol here is constructed using the plus '+' operator.
    z = x + y

Output

Displaying z in an interactive session gives the following output:

    <Symbol _plus0>

Example

    (x, y, z)

Output

The output is given below:

    (<Symbol x>, <Symbol y>, <Symbol _plus0>)

Now let us discuss in detail the classes, functions, and parameters of the Symbol API of MXNet.

Classes

The following table lists the classes of the Symbol API of MXNet:

Class | Definition
Symbol(handle) | The symbolic graph of Apache MXNet.

Functions and their parameters

Following are some of the important functions and their parameters covered by the mxnet.symbol API:

Function and its parameters | Definition
Activation([data, act_type, out, name]) | Applies an activation function element-wise to the input. It supports the relu, sigmoid, tanh, softrelu, and softsign activation functions.
BatchNorm([data, gamma, beta, moving_mean, …]) | Batch normalization. Normalizes a data batch by mean and variance, then applies a scale gamma and an offset beta.
BilinearSampler([data, grid, cudnn_off, …]) | Applies bilinear sampling to the input feature map. It is the key component of Spatial Transformer Networks. Its usage is quite similar to the remap function in OpenCV; the only difference is that it has a backward pass.
BlockGrad([data, out, name]) | Stops gradient computation: the accumulated gradient of the inputs does not flow through this operator in the backward direction.
cast([data, dtype, out, name]) | Casts all elements of the input to a new type.
zeros(shape[, dtype]) | Returns a new symbol of the given shape and type, filled with zeros.
ones(shape[, dtype]) | Returns a new symbol of the given shape and type, filled with ones.
full(shape, val[, dtype]) | Returns a new array of the given shape and type, filled with the given value val.
arange(start[, stop, step, repeat, …]) | Returns evenly spaced values within a given interval. The values are generated within the half-open interval [start, stop), which includes start but excludes stop.
linspace(start, stop, num[, endpoint, name, …]) | Returns num evenly spaced numbers within a specified interval. Unlike arange(), the interval includes stop by default; set endpoint to False to exclude it.
histogram(a[, bins, range]) | Computes the histogram of the input data.
power(base, exp) | Returns the element-wise result of base raised to the powers in exp. Both inputs, base and exp, can be either a Symbol or a scalar. Broadcasting is not allowed; use broadcast_pow if you need broadcasting.
SoftmaxActivation([data, mode, name, attr, out]) | Applies softmax activation to the input. It is intended for internal layers. It is deprecated; use softmax() instead.
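The difference between the arange() and linspace() interval conventions in the table above is easy to get wrong, so here is a minimal pure-Python sketch of the two behaviours. It is not MXNet's implementation: the functions below are hypothetical stand-ins that return plain lists, they assume a positive step, and they only illustrate which endpoints are included.

```python
def arange(start, stop, step=1):
    """Values start, start + step, ... strictly below stop: [start, stop).

    Illustrative stand-in only; assumes step > 0.
    """
    values = []
    count = 0
    while start + count * step < stop:
        values.append(start + count * step)
        count += 1
    return values


def linspace(start, stop, num, endpoint=True):
    """num evenly spaced values; stop is included when endpoint is True."""
    if num == 1:
        return [float(start)]
    divisor = num - 1 if endpoint else num
    step = (stop - start) / divisor
    return [start + i * step for i in range(num)]


print(arange(0, 5, 1))                   # [0, 1, 2, 3, 4]
print(linspace(0, 1, 5))                 # [0.0, 0.25, 0.5, 0.75, 1.0]
print(linspace(0, 1, 5, endpoint=False)) # five values, all below 1
```

With arange() you choose the step and the number of values follows from it; with linspace() you choose the number of values and the step follows.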
Implementation Examples

In the example below we use the function power(), which returns the element-wise result of base raised to the powers in exp:

    import mxnet as mx
    mx.sym.power(3, 5)

Output

You will see the following output:

    243

Example

    x = mx.sym.Variable("x")
    y = mx.sym.Variable("y")
    z = mx.sym.power(x, 3)
    z.eval(x=mx.nd.array([1,2]))[0].asnumpy()

Output

This produces the following output:

    array([1., 8.], dtype=float32)

Example

    z = mx.sym.power(4, y)
    z.eval(y=mx.nd.array([2,3]))[0].asnumpy()

Output

When you execute the above code, you should see the following output:

    array([16., 64.], dtype=float32)

Example

    z = mx.sym.power(x, y)
    z.eval(x=mx.nd.array([4,5]), y=mx.nd.array([2,3]))[0].asnumpy()

Output

The output is mentioned below:

    array([ 16., 125.], dtype=float32)

In the example given below, we apply softmax, the replacement for the deprecated SoftmaxActivation(), to the input. For brevity the example uses the imperative NDArray version, mx.nd.softmax:

    input_data = mx.nd.array([[2., 0.9, -0.5, 4., 8.], [4., -.7, 9., 2., 0.9]])
    soft_max_act = mx.nd.softmax(input_data)
    print(soft_max_act.asnumpy())

Output

You will see the following output:

    [[2.4258138e-03 8.0748333e-04 1.9912292e-04 1.7924475e-02 9.7864312e-01]
     [6.6843745e-03 6.0796250e-05 9.9204916e-01 9.0463174e-04 3.0112563e-04]]

symbol.contrib

The Contrib Symbol API is defined in the symbol.contrib package. It typically provides many useful experimental APIs for new features. This API works as a place where the community can try out new features, and where feature contributors can get feedback.

Functions and their parameters

Following are some of the important functions and their parameters covered by the mxnet.symbol.contrib API:

Function and its parameters | Definition
rand_zipfian(true_classes, num_sampled, …) | Draws random samples from an approximately Zipfian distribution, which is the base distribution of this function. It randomly samples num_sampled candidates, and the elements of sampled_candidates are drawn from that base distribution.
foreach(body, data, init_states) | Runs a loop with user-defined computation over Symbols on dimension 0. It simulates a for loop, and body holds the computation for one iteration of the loop.
while_loop(cond, func, loop_vars[, …]) | Runs a while loop with user-defined computation and loop condition. It simulates a while loop that iteratively performs the customized computation as long as the condition is satisfied.
cond(pred, then_func, else_func) | Runs an if-then-else using a user-defined condition and computation. It simulates an if-like branch that performs one of the two customized computations according to the specified condition.
getnnz([data, axis, out, name]) | This
Apache MXNet – Python API Module

Apache MXNet's Module API is like a FeedForward model, and modules are easy to compose, similar to Torch modules. It consists of the following classes.

BaseModule([logger])

It represents the base class of a module. A module can be thought of as a computation component or computation machine. The job of a module is to execute forward and backward passes. It also updates the parameters of a model.

Methods

The following table shows the methods of the BaseModule class:

Method | Definition
backward([out_grads]) | Implements the backward computation.
bind(data_shapes[, label_shapes, …]) | Binds the symbols to construct executors. This is necessary before any computation can be performed with the module.
fit(train_data[, eval_data, eval_metric, …]) | Trains the module parameters.
forward(data_batch[, is_train]) | Implements the forward computation. It supports data batches with various shapes, such as different batch sizes or different image sizes.
forward_backward(data_batch) | A convenience function that calls both forward and backward.
get_input_grads([merge_multi_context]) | Gets the gradients with respect to the inputs, as computed in the previous backward computation.
get_outputs([merge_multi_context]) | Gets the outputs of the previous forward computation.
get_params() | Gets the parameters. These may be copies of the actual parameters used for computation on the device.
get_states([merge_multi_context]) | Gets the states from all devices.
init_optimizer([kvstore, optimizer, …]) | Installs and initializes the optimizers. It also initializes kvstore for distributed training.
init_params([initializer, arg_params, …]) | Initializes the parameters and auxiliary states.
install_monitor(mon) | Installs a monitor on all executors.
iter_predict(eval_data[, num_batch, reset, …]) | Iterates over predictions.
load_params(fname) | Loads model parameters from a file.
predict(eval_data[, num_batch, …]) | Runs prediction and collects the outputs.
prepare(data_batch[, sparse_row_id_fn]) | Prepares the module for processing a given data batch.
save_params(fname) | Saves the model parameters to a file.
score(eval_data, eval_metric[, num_batch, …]) | Runs prediction on eval_data and evaluates the performance according to the given eval_metric.
set_params(arg_params, aux_params[, …]) | Assigns the parameter and aux state values.
set_states([states, value]) | Sets values for the states.
update() | Updates the parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.
update_metric(eval_metric, labels[, pre_sliced]) | Evaluates and accumulates the evaluation metric on the outputs of the last forward computation.

The following table repeats several of these methods as they apply to the BucketingModule class, and adds the bucket-specific ones:

Method | Definition
backward([out_grads]) | Implements the backward computation.
bind(data_shapes[, label_shapes, …]) | Sets up the buckets and binds the executor for the default bucket key. This is the binding for a BucketingModule.
forward(data_batch[, is_train]) | Implements the forward computation. It supports data batches with various shapes, such as different batch sizes or different image sizes.
get_input_grads([merge_multi_context]) | Gets the gradients with respect to the inputs, as computed in the previous backward computation.
get_outputs([merge_multi_context]) | Gets the outputs of the previous forward computation.
get_params() | Gets the current parameters. These may be copies of the actual parameters used for computation on the device.
get_states([merge_multi_context]) | Gets the states from all devices.
init_optimizer([kvstore, optimizer, …]) | Installs and initializes the optimizers. It also initializes kvstore for distributed training.
init_params([initializer, arg_params, …]) | Initializes the parameters and auxiliary states.
install_monitor(mon) | Installs a monitor on all executors.
load(prefix, epoch[, sym_gen, …]) | Creates a model from a previously saved checkpoint.
load_dict([sym_dict, sym_gen, …]) | Creates a model from a dictionary (dict) mapping bucket_key to symbols. It also shares arg_params and aux_params.
prepare(data_batch[, sparse_row_id_fn]) | Prepares the module for processing a given data batch.
save_checkpoint(prefix, epoch[, remove_amp_cast]) | Saves the current progress to a checkpoint for all buckets in the BucketingModule. It is recommended to use mx.callback.module_checkpoint as epoch_end_callback to save during training.
set_params(arg_params, aux_params[, …]) | Assigns the parameter and aux state values.
set_states([states, value]) | Sets values for the states.
switch_bucket(bucket_key, data_shapes[, …]) | Switches to a different bucket.
update() | Updates the parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.
update_metric(eval_metric, labels[, pre_sliced]) | Evaluates and accumulates the evaluation metric on the outputs of the last forward computation.
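The method tables are easier to read with the usual call order in mind: parameters are initialised once, and then each batch goes through forward, backward, and update. The following pure-Python sketch imitates only that call order on a one-parameter model. ToyModule is a hypothetical stand-in written for this illustration; it is not an MXNet class, it has no bind or init_optimizer step, and its method bodies are assumptions, not the library's implementation.

```python
class ToyModule:
    """Hypothetical stand-in that fits y = w * x; not an MXNet class."""

    def __init__(self, learning_rate=0.05):
        self.learning_rate = learning_rate
        self.w = None
        self.grad = 0.0
        self.batch = []
        self.outputs = []

    def init_params(self, w=0.0):
        # Set the single parameter before any computation.
        self.w = w

    def forward(self, batch):
        # Remember the batch so that backward() can use it.
        self.batch = batch
        self.outputs = [self.w * x for x, _ in batch]

    def get_outputs(self):
        # Outputs of the previous forward computation.
        return self.outputs

    def backward(self):
        # Gradient of the mean squared error with respect to w.
        n = len(self.batch)
        self.grad = sum(
            2 * (out - y) * x for out, (x, y) in zip(self.outputs, self.batch)
        ) / n

    def update(self):
        # Plain gradient-descent step using the gradient from backward().
        self.w -= self.learning_rate * self.grad


# Each batch is a list of (x, y) pairs generated from y = 3 * x.
batches = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
module = ToyModule()
module.init_params()
for epoch in range(50):
    for batch in batches:
        module.forward(batch)
        module.backward()
        module.update()
print(round(module.w, 3))
```

Note that update() here changes only the parameter; the gradient it consumes is produced by backward(), which matches the description of update() in the tables.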
Attributes

The following table shows the attributes of the BaseModule class:

Attribute | Definition
data_names | The list of names for the data required by this module.
data_shapes | The list of (name, shape) pairs specifying the data inputs to this module.
label_shapes | The list of (name, shape) pairs specifying the label inputs to this module.
output_names | The list of names for the outputs of this module.
output_shapes | The list of (name, shape) pairs specifying the outputs of this module.
symbol | The symbol associated with this module.

BucketingModule(sym_gen[…])

It represents the BucketingModule class of a Module, which helps to deal efficiently with varying-length inputs.

Methods

The methods of the BucketingModule class are listed in the second methods table above, the one that includes switch_bucket.

Attributes

The following table shows the attributes of the BucketingModule class:

Attribute | Definition
data_names | The list of names for the data required by this module.
data_shapes | The list of (name, shape) pairs specifying the data inputs to this module.
label_shapes | The list of (name, shape) pairs specifying the label inputs to this module.
output_names | The list of names for the outputs of this module.
output_shapes | The list of (name, shape) pairs specifying the outputs of this module.
symbol | As name specified,
Apache MXNet – Quick Guide

Apache MXNet – Introduction

This chapter highlights the features of Apache MXNet and talks about the latest version of this deep learning software framework.

What is MXNet?

Apache MXNet is a powerful open-source deep learning framework that helps developers build, train, and deploy deep learning models. Over the past few years, the impact of deep learning has been widespread, from healthcare to transportation to manufacturing and, in fact, in every aspect of our daily life. Nowadays, companies turn to deep learning to solve hard problems such as face recognition, object detection, Optical Character Recognition (OCR), speech recognition, and machine translation.

That is the reason Apache MXNet is supported by:

Some big companies like Intel, Baidu, Microsoft, Wolfram Research, etc.
Public cloud providers including Amazon Web Services (AWS) and Microsoft Azure.
Some big research institutes like Carnegie Mellon, MIT, the University of Washington, and the Hong Kong University of Science & Technology.

Why Apache MXNet?

Given that various deep learning platforms such as Torch7, Caffe, Theano, TensorFlow, Keras, and Microsoft Cognitive Toolkit already exist, you might wonder why Apache MXNet is needed. Let us check out some of the reasons:

Apache MXNet solves one of the biggest issues of existing deep learning platforms: in order to use them, one has to learn another system for a different programming flavor.
With the help of Apache MXNet, developers can exploit the full capabilities of GPUs as well as cloud computing.
Apache MXNet can accelerate any numerical computation and places a special emphasis on speeding up the development and deployment of large-scale deep neural networks (DNNs).
It provides users with the capabilities of both imperative and symbolic programming.
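The contrast between imperative and symbolic programming can be sketched in a few lines of plain Python. This is not MXNet code: the Sym class below is a hypothetical miniature written for this illustration, and it only mimics the general pattern of declaring placeholders, composing them with operators, and evaluating the result later with concrete values.

```python
class Sym:
    """Hypothetical mini-symbol: records an expression instead of computing it."""

    def __init__(self, name=None, op=None, inputs=()):
        self.name, self.op, self.inputs = name, op, inputs

    def __add__(self, other):
        return Sym(op=lambda a, b: a + b, inputs=(self, other))

    def __mul__(self, other):
        return Sym(op=lambda a, b: a * b, inputs=(self, other))

    def eval(self, **bindings):
        # A leaf looks up its value; an operator node evaluates its inputs first.
        if self.op is None:
            return bindings[self.name]
        return self.op(*(node.eval(**bindings) for node in self.inputs))


# Imperative style: each statement runs immediately on concrete values.
a, b = 2, 3
imperative_result = (a + b) * b

# Symbolic style: first describe the computation, then run it with data.
x, y = Sym("x"), Sym("y")
z = (x + y) * y
symbolic_result = z.eval(x=2, y=3)

print(imperative_result, symbolic_result)  # 15 15
```

The imperative form is easier to debug step by step, while the symbolic form gives a framework the whole expression up front, which is what makes optimisations before execution possible.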
Various Features

If you are looking for a flexible deep learning library to quickly develop cutting-edge deep learning research, or for a robust platform to push production workloads, your search ends at Apache MXNet. This is because of the following features.

Distributed Training

Whether it is multi-GPU or multi-host training, Apache MXNet offers near-linear scaling efficiency and allows developers to make the most of their hardware. MXNet also supports integration with Horovod, an open-source distributed deep learning framework created at Uber.

For this integration, following are some of the common distributed APIs defined in Horovod:

horovod.broadcast()
horovod.allgather()

In this regard, MXNet offers the following capabilities:

Device Placement − With MXNet we can easily specify where each data structure should live.
Automatic Differentiation − Apache MXNet automates differentiation, i.e. the derivative calculations.
Multi-GPU training − MXNet lets us scale efficiently with the number of available GPUs.
Optimized Predefined Layers − We can code our own layers in MXNet, and the predefined layers are also optimized for speed.

Hybridization

Apache MXNet provides its users with a hybrid front-end. With the help of the Gluon Python API, it can bridge the gap between its imperative and symbolic capabilities. This is done by calling its hybridize functionality.

Faster Computation

Linear algebra operations, such as tens or hundreds of matrix multiplications, are the computational bottleneck for deep neural nets. To address this bottleneck, MXNet provides:

Optimized numerical computation for GPUs
Optimized numerical computation for distributed ecosystems
Automation of common workflows, with the help of which a standard NN can be expressed briefly

Language Bindings

MXNet has deep integration into high-level languages like Python and R. It also provides support for other programming languages such as Scala, Julia, Clojure, Java, C/C++, and Perl.

We do not need to learn any new programming language. Instead, MXNet, combined with the hybridization feature, allows an exceptionally smooth transition from Python to deployment in the programming language of our choice.

Latest version MXNet 1.6.0

The Apache Software Foundation (ASF) released the stable version 1.6.0 of Apache MXNet on 21st February 2020 under the Apache License 2.0. This is the last MXNet release to support Python 2, as the MXNet community voted to no longer support Python 2 in further releases. Let us check out some of the new features this release brings for its users.

NumPy-Compatible interface

Due to its flexibility and generality, NumPy has been widely used by machine learning practitioners, scientists, and students. However, hardware accelerators like Graphical Processing Units (GPUs) have become increasingly integrated into Machine Learning (ML) toolkits, and NumPy users who want to take advantage of the speed of GPUs need to switch to new frameworks with a different syntax.

With MXNet 1.6.0, Apache MXNet is moving toward a NumPy-compatible programming experience. The new interface provides equivalent usability and expressiveness to practitioners familiar with NumPy syntax. Along with that, MXNet 1.6.0 also enables the existing NumPy ecosystem to utilize hardware accelerators like GPUs to speed up large-scale computations.

Integration with Apache TVM

Apache TVM, an open-source end-to-end deep learning compiler stack for hardware backends such as CPUs, GPUs, and specialized accelerators, aims to fill the gap between productivity-focused deep learning frameworks and performance-oriented hardware backends. With MXNet 1.6.0, users can leverage Apache (incubating) TVM to implement high-performance operator kernels in the Python programming language.
The two main advantages of this new feature are as follows −
It simplifies the former C++ based development process.
It enables sharing the same implementation across multiple hardware backends such as CPUs, GPUs, etc.

Improvements on existing features
Apart from the features listed above, MXNet 1.6.0 also provides some improvements over the existing features. The improvements are as follows −

Grouping element-wise operation for GPU
The performance of element-wise operations is bound by memory bandwidth, which is why chaining such operations may reduce overall performance. Apache MXNet 1.6.0 performs element-wise operation fusion, which generates just-in-time fused operations whenever possible. Such fusion also reduces storage needs and improves overall performance.

Simplifying common expressions
MXNet 1.6.0 eliminates redundant expressions and simplifies common expressions. This enhancement also improves memory usage and total execution time.

Optimizations
MXNet 1.6.0 also provides various optimizations to existing features and operators, which are as follows: Automatic
Discuss Apache MXNet

Apache MXNet is a powerful open-source deep learning framework that helps developers build, train, and deploy deep learning models. Over the past few years, the impact of deep learning has been widespread, from healthcare to transportation to manufacturing and, in fact, almost every aspect of daily life. Companies now turn to deep learning to solve hard problems such as face recognition, object detection, Optical Character Recognition (OCR), speech recognition, and machine translation.
Apache MXNet Tutorial

Audience
This tutorial will be useful for graduates, post-graduates, and research students who either have an interest in the field of AI, Machine Learning, and Deep Learning or have it as a part of their curriculum. The reader can be a beginner or an advanced learner.

Prerequisites
The reader must have basic knowledge of Artificial Intelligence and should also be familiar with the Python language and its functions. If you are new to any of these concepts, we recommend you take up tutorials on these topics before you dig further into this tutorial.
Apache MXNet – Unified Operator API

This chapter provides information about the unified operator application programming interface (API) in Apache MXNet.

SimpleOp
SimpleOp is a new unified operator API which unifies different invoking processes. Once invoked, it returns to the fundamental elements of operators. The unified operator is specially designed for unary and binary operations, because most mathematical operators take one or two operands, and with more operands dependency-related optimization becomes useful.

We will understand how the SimpleOp unified operator works with the help of an example. In this example, we will create an operator functioning as a smooth l1 loss, which is a mixture of l1 and l2 loss. We can define and write the loss as given below −

    loss = outside_weight .* f(inside_weight .* (data - label))
    grad = outside_weight .* inside_weight .* f'(inside_weight .* (data - label))

Here, .* stands for element-wise multiplication, and f and f' are the smooth l1 loss function and its derivative, which we assume are available in mshadow.

It looks impossible to implement this particular loss as a unary or binary operator, but MXNet provides its users automatic differentiation in symbolic execution, which simplifies the loss to f and f' directly. That is why we can implement this particular loss as a unary operator.

Defining Shapes
MXNet's mshadow library requires explicit memory allocation, hence we need to provide all data shapes before any calculation occurs. Before defining functions and gradients, we need to check input shape consistency and provide the output shape as follows:

    typedef mxnet::TShape (*UnaryShapeFunction)(const mxnet::TShape& src,
                                                const EnvArguments& env);
    typedef mxnet::TShape (*BinaryShapeFunction)(const mxnet::TShape& lhs,
                                                 const mxnet::TShape& rhs,
                                                 const EnvArguments& env);

We can use mxnet::TShape to check the input data shape and designate the output data shape.
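To make the role of the shape functions concrete, here is a minimal sketch in plain Python rather than MXNet code. The names unary_shape and binary_shape are hypothetical and exist only for this illustration. The sketch assumes the default behaviour used in this chapter: a unary operator passes the input shape through, and a binary operator requires both operands to have the same shape.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def unary_shape(src: Shape) -> Shape:
    """Default unary rule: the output has the same shape as the input."""
    return tuple(src)


def binary_shape(lhs: Shape, rhs: Shape) -> Shape:
    """Default binary rule: both operands must have identical shapes."""
    if tuple(lhs) != tuple(rhs):
        raise ValueError(f"shape mismatch: {lhs} vs {rhs}")
    return tuple(lhs)


if __name__ == "__main__":
    print(unary_shape((3, 3)))
    print(binary_shape((2, 4), (2, 4)))
```

Running the shape check before any computation mirrors why mshadow needs the shapes up front: the output buffer has to be allocated before the operator writes into it.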
If you do not define this function, the default output shape is the same as the input shape. For a binary operator, the shapes of lhs and rhs are checked to be the same by default.

Now let's move on to our smooth l1 loss example. For this, we define XPU as cpu or gpu in the header implementation smooth_l1_unary-inl.h. The reason is to reuse the same code in smooth_l1_unary.cc and smooth_l1_unary.cu.

    #include <mxnet/operator_util.h>
    #if defined(__CUDACC__)
    #define XPU gpu
    #else
    #define XPU cpu
    #endif

In our smooth l1 loss example, the output has the same shape as the source, so we can use the default behavior. It can be written as follows −

    inline mxnet::TShape SmoothL1Shape_(const mxnet::TShape& src,
                                        const EnvArguments& env) {
      return mxnet::TShape(src);
    }

Defining Functions
We can create a unary or binary function with one output as follows −

    typedef void (*UnaryFunction)(const TBlob& src,
                                  const EnvArguments& env,
                                  TBlob* ret,
                                  OpReqType req,
                                  RunContext ctx);
    typedef void (*BinaryFunction)(const TBlob& lhs,
                                   const TBlob& rhs,
                                   const EnvArguments& env,
                                   TBlob* ret,
                                   OpReqType req,
                                   RunContext ctx);

Following is the RunContext ctx struct, which contains the information needed during runtime for execution −

    struct RunContext {
      void *stream;  // the stream of the device, can be NULL or Stream<gpu>* in GPU mode
      template<typename xpu>
      inline mshadow::Stream<xpu>* get_stream();  // get mshadow stream from Context
    };

OpReqType specifies how the computation result is written to ret −

    enum OpReqType {
      kNullOp,        // no operation, do not write anything
      kWriteTo,       // write gradient to provided space
      kWriteInplace,  // perform an in-place write
      kAddTo          // add to the provided space
    };

Now, let's move on to our smooth l1 loss example.
For this, we will use UnaryFunction to define the function of this operator as follows:

    template<typename xpu>
    void SmoothL1Forward_(const TBlob& src,
                          const EnvArguments& env,
                          TBlob *ret,
                          OpReqType req,
                          RunContext ctx) {
      using namespace mshadow;
      using namespace mshadow::expr;
      mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
      real_t sigma2 = env.scalar * env.scalar;
      MSHADOW_TYPE_SWITCH(ret->type_flag_, DType, {
        mshadow::Tensor<xpu, 2, DType> out = ret->get<xpu, 2, DType>(s);
        mshadow::Tensor<xpu, 2, DType> in = src.get<xpu, 2, DType>(s);
        ASSIGN_DISPATCH(out, req,
                        F<mshadow_op::smooth_l1_loss>(in, ScalarExp<DType>(sigma2)));
      });
    }

Defining Gradients
Gradient functions of binary operators have a similar structure to those of unary operators, except that Input, TBlob, and OpReqType are doubled. Below, we create gradient functions with various types of input:

    // depending only on out_grad
    typedef void (*UnaryGradFunctionT0)(const OutputGrad& out_grad,
                                        const EnvArguments& env,
                                        TBlob* in_grad,
                                        OpReqType req,
                                        RunContext ctx);
    // depending only on out_value
    typedef void (*UnaryGradFunctionT1)(const OutputGrad& out_grad,
                                        const OutputValue& out_value,
                                        const EnvArguments& env,
                                        TBlob* in_grad,
                                        OpReqType req,
                                        RunContext ctx);
    // depending only on in_data
    typedef void (*UnaryGradFunctionT2)(const OutputGrad& out_grad,
                                        const Input0& in_data0,
                                        const EnvArguments& env,
                                        TBlob* in_grad,
                                        OpReqType req,
                                        RunContext ctx);

Input0, Input, OutputValue, and OutputGrad all share the structure of GradFunctionArgument, which is defined as follows −

    struct GradFunctionArgument {
      TBlob data;
    };

Now let's move on to our smooth l1 loss example. To enable the chain rule of gradients, we need to multiply out_grad, which arrives from the layer above, into the result written to in_grad.
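The forward function and the chain-rule step can be sketched numerically in plain Python. This is an illustration, not the mshadow implementation. It assumes the common smooth l1 definition with a parameter sigma (quadratic for |x| < 1/sigma^2, linear otherwise), and the names smooth_l1, smooth_l1_grad, forward, and backward are hypothetical.

```python
from typing import List


def smooth_l1(x: float, sigma2: float) -> float:
    """Assumed definition: quadratic near zero, linear elsewhere."""
    if abs(x) < 1.0 / sigma2:
        return 0.5 * sigma2 * x * x
    return abs(x) - 0.5 / sigma2


def smooth_l1_grad(x: float, sigma2: float) -> float:
    """Derivative of smooth_l1 with respect to x."""
    if abs(x) < 1.0 / sigma2:
        return sigma2 * x
    return 1.0 if x > 0 else -1.0


def forward(src: List[float], sigma: float) -> List[float]:
    sigma2 = sigma * sigma  # mirrors sigma2 = env.scalar * env.scalar
    return [smooth_l1(x, sigma2) for x in src]


def backward(out_grad: List[float], src: List[float], sigma: float) -> List[float]:
    sigma2 = sigma * sigma
    # Chain rule: in_grad = out_grad * f'(src), element by element.
    return [g * smooth_l1_grad(x, sigma2) for g, x in zip(out_grad, src)]


if __name__ == "__main__":
    data = [0.5, 2.0, -3.0]
    print(forward(data, 1.0))
    print(backward([1.0, 2.0, 0.5], data, 1.0))
```

The backward function depends on the input data rather than on the output value, which is why the C++ version uses the gradient signature that receives Input0.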
The gradient function can be written as follows −

    template<typename xpu>
    void SmoothL1BackwardUseIn_(const OutputGrad& out_grad,
                                const Input0& in_data0,
                                const EnvArguments& env,
                                TBlob *in_grad,
                                OpReqType req,
                                RunContext ctx) {
      using namespace mshadow;
      using namespace mshadow::expr;
      mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
      real_t sigma2 = env.scalar * env.scalar;
      MSHADOW_TYPE_SWITCH(in_grad->type_flag_, DType, {
        mshadow::Tensor<xpu, 2, DType> src = in_data0.data.get<xpu, 2, DType>(s);
        mshadow::Tensor<xpu, 2, DType> ograd = out_grad.data.get<xpu, 2, DType>(s);
        mshadow::Tensor<xpu, 2, DType> igrad = in_grad->get<xpu, 2, DType>(s);
        ASSIGN_DISPATCH(igrad, req,
                        ograd * F<mshadow_op::smooth_l1_gradient>(src, ScalarExp<DType>(sigma2)));
      });
    }

Register SimpleOp to MXNet
Once we have created the shape, function, and gradient, we need to register them as both an NDArray operator and a symbolic operator. For this, we can use the registration macro as follows −

    MXNET_REGISTER_SIMPLE_OP(Name, DEV)
    .set_shape_function(Shape)
    .set_function(DEV::kDevMask, Function<XPU>, SimpleOpInplaceOption)
    .set_gradient(DEV::kDevMask, Gradient<XPU>, SimpleOpInplaceOption)
    .describe("description");

The SimpleOpInplaceOption is defined as follows −

    enum SimpleOpInplaceOption {
      kNoInplace,      // do not allow inplace in arguments
      kInplaceInOut,   // allow inplace in with out (unary)
      kInplaceOutIn,   // allow inplace out_grad with in_grad (unary)
      kInplaceLhsOut,  // allow inplace left operand with out (binary)
      kInplaceOutLhs   // allow inplace out_grad with lhs_grad (binary)
    };

Now let's move on to our smooth l1 loss example. For this, we have a gradient function that relies on input data so
Apache MXNet – Installing MXNet

To get started with MXNet, the first thing we need to do is install it on our computer. Apache MXNet works on pretty much all the platforms available, including Windows, Mac, and Linux.

Linux OS
We can install MXNet on Linux OS in the following ways −

Graphical Processing Unit (GPU)
Here, we will use the Pip, Docker, and Source methods to install MXNet when we are using a GPU for processing −

By using Pip method
You can use the following command to install MXNet on your Linux OS −

    pip install mxnet

Note that the plain mxnet package is the CPU build; GPU builds carry a CUDA suffix in the package name. Apache MXNet also offers MKL pip packages, which are much faster when running on Intel hardware. For example, mxnet-cu101mkl means that −
The package is built with CUDA/cuDNN
The package is MKL-DNN enabled
The CUDA version is 10.1

By using Docker
You can find the Docker images with MXNet on DockerHub. Let us check out the steps below to install MXNet by using Docker with GPU −

Step 1− First, we need to install Docker on our machine by following the Docker installation instructions.

Step 2− To enable the usage of GPUs from the Docker containers, we next need to install nvidia-docker-plugin by following its installation instructions.

Step 3− By using the following command, you can pull the MXNet Docker image −

    $ sudo docker pull mxnet/python:gpu

To see whether the mxnet/python Docker image pull was successful, we can list the Docker images as follows −

    $ sudo docker images

For the fastest inference speeds with MXNet, it is recommended to use the latest MXNet with Intel MKL-DNN. Check the commands below −

    $ sudo docker pull mxnet/python:1.3.0_cpu_mkl
    $ sudo docker images

From source
To build the MXNet shared library from source with GPU, we first need to set up the environment for CUDA and cuDNN as follows −
Download and install the CUDA toolkit; CUDA 9.2 is recommended here.
Next, download cuDNN 7.1.4.
Now we need to unzip the file, change to the cuDNN root directory, and copy the header and libraries to the local CUDA Toolkit folder as follows −

    tar xvzf cudnn-9.2-linux-x64-v7.1
    sudo cp -P cuda/include/cudnn.h /usr/local/cuda/include
    sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
    sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
    sudo ldconfig

After setting up the environment for CUDA and cuDNN, follow the steps below to build the MXNet shared library from source −

Step 1− First, we need to install the prerequisite packages. These dependencies are required on Ubuntu version 16.04 or later.

    sudo apt-get update
    sudo apt-get install -y build-essential git ninja-build ccache libopenblas-dev libopencv-dev cmake

Step 2− In this step, we will download the MXNet source and configure it. First, let us clone the repository by using the following commands −

    git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet
    cd mxnet
    cp config/linux_gpu.cmake config.cmake  # for build with CUDA

Step 3− By using the following commands, you can build the MXNet core shared library −

    rm -rf build
    mkdir -p build && cd build
    cmake -GNinja ..
    cmake --build .

Two important points regarding the above step are as follows −

If you want to build the Debug version, specify the build type as follows −

    cmake -DCMAKE_BUILD_TYPE=Debug -GNinja ..

In order to set the number of parallel compilation jobs, specify the following −

    cmake --build . --parallel N

Once you have successfully built the MXNet core shared library, you will find libmxnet.so in the build folder in your MXNet project root. This library is required to install the (optional) language bindings.
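Before installing any language bindings, it can help to confirm that the build actually produced the shared library. The following is a minimal sketch using only the Python standard library. The helper name find_libmxnet is hypothetical, and the sketch assumes the library sits at build/libmxnet.so under the project root, as described above.

```python
from pathlib import Path
from typing import Optional


def find_libmxnet(project_root: str) -> Optional[Path]:
    """Return the path to build/libmxnet.so under project_root, or None."""
    candidate = Path(project_root) / "build" / "libmxnet.so"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # "mxnet" is the directory name used in the git clone command.
    lib = find_libmxnet("mxnet")
    if lib is None:
        print("libmxnet.so not found; the build may not have completed")
    else:
        print(f"found {lib}")
```

The script is meant to be run from the directory that contains the mxnet clone; pass a different root if the repository was cloned elsewhere.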
Central Processing Unit (CPU)
Here, we will use the Pip, Docker, and Source methods to install MXNet when we are using a CPU for processing −

By using Pip method
You can use the following command to install MXNet on your Linux OS −

    pip install mxnet

Apache MXNet also offers MKL-DNN enabled pip packages, which are much faster when running on Intel hardware.

    pip install mxnet-mkl

By using Docker
You can find the Docker images with MXNet on DockerHub. Let us check out the steps below to install MXNet by using Docker with CPU −

Step 1− First, we need to install Docker on our machine by following the Docker installation instructions.

Step 2− By using the following command, you can pull the MXNet Docker image:

    $ sudo docker pull mxnet/python

To see whether the mxnet/python Docker image pull was successful, we can list the Docker images as follows −

    $ sudo docker images

For the fastest inference speeds with MXNet, it is recommended to use the latest MXNet with Intel MKL-DNN. Check the commands below −

    $ sudo docker pull mxnet/python:1.3.0_cpu_mkl
    $ sudo docker images

From source
To build the MXNet shared library from source with CPU, follow the steps below −

Step 1− First, we need to install the prerequisite packages. These dependencies are required on Ubuntu version 16.04 or later.

    sudo apt-get update
    sudo apt-get install -y build-essential git ninja-build ccache libopenblas-dev libopencv-dev cmake

Step 2− In this step, we will download the MXNet source and configure it. First, let us clone the repository by using the following commands:

    git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet
    cd mxnet
    cp config/linux.cmake config.cmake

Step 3− By using the following commands, you can build the MXNet core shared library:

    rm -rf build
    mkdir -p build && cd build
    cmake -GNinja ..
    cmake --build .
Two important points regarding the above step are as follows −

If you want to build the Debug version, specify the build type as follows:

    cmake -DCMAKE_BUILD_TYPE=Debug -GNinja ..

In order to set the number of parallel compilation jobs, specify the following −

    cmake --build . --parallel N

Once you have successfully built the MXNet core shared library, you will find libmxnet.so in the build folder in your MXNet project root. This library is required to install the (optional) language bindings.

MacOS
We can install MXNet on MacOS in the following ways −

Graphical Processing Unit (GPU)
If you plan to build MXNet on MacOS with GPU,