Discuss Apache MXNet

Apache MXNet is a powerful open-source deep learning framework that helps developers build, train, and deploy deep learning models. Over the past few years, the impact of deep learning has spread widely, from healthcare to transportation to manufacturing and, in fact, to every aspect of our daily life. Nowadays, companies turn to deep learning to solve hard problems like face recognition, object detection, Optical Character Recognition (OCR), speech recognition, and machine translation.
Apache MXNet – Useful Resources

The following resources contain additional information on Apache MXNet. Please use them to gain more in-depth knowledge of the framework.

Useful Links on Apache MXNet

Wikipedia reference for Apache MXNet.
Apache MXNet – KVStore and Visualization

This chapter deals with the Python packages KVStore and visualization.

KVStore package

KVStore stands for Key-Value store. It is a critical component used for multi-device training. It is important because the communication of parameters across devices, on a single machine as well as across multiple machines, is transmitted through one or more servers with a KVStore for the parameters.

Let us understand the working of KVStore with the help of the following points:

Each value in KVStore is represented by a key and a value.

Each parameter array in the network is assigned a key, and the weights of that parameter array are referred to by the value.

After that, the worker nodes push gradients after processing a batch. They also pull updated weights before processing a new batch.

In simple words, we can say that KVStore is a place for data sharing where each device can push data in and pull data out.

Data Push-In and Pull-Out

KVStore can be thought of as a single object shared across different devices such as GPUs and computers, where each device is able to push data in and pull data out.

Following are the implementation steps that need to be followed by devices to push data in and pull data out:

Implementation steps

Initialisation − The first step is to initialise the values. Here, for our example, we will initialise an (int, NDArray) pair into KVStore and then pull the value out −

import mxnet as mx
kv = mx.kv.create("local") # create a local KVStore.
shape = (3,3)
kv.init(3, mx.nd.ones(shape)*2)
a = mx.nd.zeros(shape)
kv.pull(3, out = a)
print(a.asnumpy())

Output

This produces the following output −

[[2. 2. 2.]
 [2. 2. 2.]
 [2. 2. 2.]]

Push, Aggregate, and Update − Once initialised, we can push a new value with the same shape into KVStore under the same key −

kv.push(3, mx.nd.ones(shape)*8)
kv.pull(3, out = a)
print(a.asnumpy())

Output

The output is given below −

[[8. 8. 8.]
 [8. 8. 8.]
 [8. 8. 8.]]

The data used for pushing can be stored on any device such as GPUs or computers. We can also push multiple values into the same key. In this case, the KVStore will first sum all of these values and then push the aggregated value, as follows −

contexts = [mx.cpu(i) for i in range(4)]
b = [mx.nd.ones(shape, ctx) for ctx in contexts]
kv.push(3, b)
kv.pull(3, out = a)
print(a.asnumpy())

Output

You will see the following output −

[[4. 4. 4.]
 [4. 4. 4.]
 [4. 4. 4.]]

For each push you apply, KVStore combines the pushed value with the value already stored. It does so with the help of an updater. Here, the default updater is ASSIGN. We can replace the default updater with a user-defined one −

def update(key, input, stored):
   print("update on key: %d" % key)
   stored += input * 2

kv._set_updater(update)
kv.pull(3, out=a)
print(a.asnumpy())

Output

When you execute the above code, you should see the following output −

[[4. 4. 4.]
 [4. 4. 4.]
 [4. 4. 4.]]

Example

kv.push(3, mx.nd.ones(shape))
kv.pull(3, out=a)
print(a.asnumpy())

Output

Given below is the output of the code −

update on key: 3
[[6. 6. 6.]
 [6. 6. 6.]
 [6. 6. 6.]]

Pull − Like Push, we can also pull the value onto several devices with a single call, as follows −

b = [mx.nd.ones(shape, ctx) for ctx in contexts]
kv.pull(3, out = b)
print(b[1].asnumpy())

Output

The output is stated below −

[[6. 6. 6.]
 [6. 6. 6.]
 [6. 6. 6.]]
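Note that all of the examples in this chapter use a local KVStore. Other store types are selected by the string passed to mx.kv.create; the sketch below lists the standard types (the distributed types only work once scheduler and server processes have been launched, for example with MXNet's launch tooling) −

import mxnet as mx

kv_local = mx.kv.create("local")    # single machine, aggregation runs on the CPU
kv_device = mx.kv.create("device")  # single machine, aggregation runs on the GPUs

# Distributed stores; these require a running cluster:
# kv_sync = mx.kv.create("dist_sync")    # synchronous distributed training
# kv_async = mx.kv.create("dist_async")  # asynchronous distributed training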
Complete Implementation Example

Given below is the complete implementation example −

import mxnet as mx

kv = mx.kv.create("local")
shape = (3,3)
kv.init(3, mx.nd.ones(shape)*2)
a = mx.nd.zeros(shape)
kv.pull(3, out = a)
print(a.asnumpy())

kv.push(3, mx.nd.ones(shape)*8)
kv.pull(3, out = a) # pull out the value
print(a.asnumpy())

contexts = [mx.cpu(i) for i in range(4)]
b = [mx.nd.ones(shape, ctx) for ctx in contexts]
kv.push(3, b)
kv.pull(3, out = a)
print(a.asnumpy())

def update(key, input, stored):
   print("update on key: %d" % key)
   stored += input * 2

kv._set_updater(update)
kv.pull(3, out=a)
print(a.asnumpy())

kv.push(3, mx.nd.ones(shape))
kv.pull(3, out=a)
print(a.asnumpy())

b = [mx.nd.ones(shape, ctx) for ctx in contexts]
kv.pull(3, out = b)
print(b[1].asnumpy())

Handling Key-Value Pairs

All the operations we have implemented above involve a single key, but KVStore also provides an interface for a list of key-value pairs −

For a single device

Following is an example showing the KVStore interface for a list of key-value pairs for a single device −

keys = [5, 7, 9]
kv.init(keys, [mx.nd.ones(shape)]*len(keys))
kv.push(keys, [mx.nd.ones(shape)]*len(keys))
b = [mx.nd.zeros(shape)]*len(keys)
kv.pull(keys, out = b)
print(b[1].asnumpy())

Output

You will receive the following output −

update on key: 5
update on key: 7
update on key: 9
[[3. 3. 3.]
 [3. 3. 3.]
 [3. 3. 3.]]

For multiple devices

Following is an example showing the KVStore interface for a list of key-value pairs for multiple devices −

b = [[mx.nd.ones(shape, ctx) for ctx in contexts]] * len(keys)
kv.push(keys, b)
kv.pull(keys, out = b)
print(b[1][1].asnumpy())

Output

You will see the following output −

update on key: 5
update on key: 7
update on key: 9
[[11. 11. 11.]
 [11. 11. 11.]
 [11. 11. 11.]]

Visualization package

The visualization package is the Apache MXNet package used to represent a neural network (NN) as a computation graph consisting of nodes and edges.

Visualize neural network

In the example below, we will use mx.viz.plot_network to visualize a neural network. Following are the prerequisites for this −

Prerequisites

Jupyter notebook

Graphviz library

Implementation Example

In the example below, we will visualize a sample NN for linear matrix factorisation −

import mxnet as mx
user = mx.symbol.Variable("user")
item = mx.symbol.Variable("item")
score = mx.symbol.Variable("score")

# Set the dummy dimensions
k = 64
max_user = 100
max_item = 50

# The user feature lookup
user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k)

# The item feature lookup
item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)

# predict by the inner product and then do sum
N_net = user * item
N_net = mx.symbol.sum_axis(data = N_net, axis = 1)
N_net = mx.symbol.Flatten(data = N_net)
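The chapter's code breaks off at the loss layer. In the standard matrix-factorisation example from the MXNet documentation, which this section appears to follow (so treat this completion as an assumption), the final steps define a regression loss over score and render the graph −

# Defining the loss layer (assumed completion of the truncated example)
N_net = mx.symbol.LinearRegressionOutput(data = N_net, label = score)

# Render the network as a computation graph (requires the Graphviz library)
mx.viz.plot_network(N_net)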
Apache MXNet – Python API gluon

As we have already discussed in previous chapters, MXNet Gluon provides a clear, concise, and simple API for DL projects. It enables developers to prototype, build, and train DL models in Apache MXNet without forfeiting training speed.

Core Modules

Let us learn the core modules of the Apache MXNet Python application programming interface (API) gluon.

gluon.nn

Gluon provides a large number of built-in NN layers in the gluon.nn module. That is the reason it is called a core module.

Methods and their parameters

Following are some of the important methods and their parameters covered by the mxnet.gluon.nn core module −

Activation(activation, **kwargs) − As the name implies, this method applies an activation function to the input.

AvgPool1D([pool_size, strides, padding, …]) − This is the average pooling operation for temporal data.

AvgPool2D([pool_size, strides, padding, …]) − This is the average pooling operation for spatial data.

AvgPool3D([pool_size, strides, padding, …]) − This is the average pooling operation for 3D data. The data can be spatial or spatio-temporal.

BatchNorm([axis, momentum, epsilon, center, …]) − It represents the batch normalisation layer.

BatchNormReLU([axis, momentum, epsilon, …]) − It also represents the batch normalisation layer, but with the ReLU activation function.

Block([prefix, params]) − It gives the base class for all neural network layers and models.

Conv1D(channels, kernel_size[, strides, …]) − This method is used for a 1-D convolution layer; for example, temporal convolution.

Conv1DTranspose(channels, kernel_size[, …]) − This method is used for a transposed 1D convolution layer.

Conv2D(channels, kernel_size[, strides, …]) − This method is used for a 2D convolution layer; for example, spatial convolution over images.

Conv2DTranspose(channels, kernel_size[, …]) − This method is used for a transposed 2D convolution layer.

Conv3D(channels, kernel_size[, strides, …]) − This method is used for a 3D convolution layer; for example, spatial convolution over volumes.

Conv3DTranspose(channels, kernel_size[, …]) − This method is used for a transposed 3D convolution layer.

Dense(units[, activation, use_bias, …]) − This method represents your regular densely-connected NN layer.

Dropout(rate[, axes]) − As the name implies, this method applies Dropout to the input.

ELU([alpha]) − This method is used for the Exponential Linear Unit (ELU).

Embedding(input_dim, output_dim[, dtype, …]) − It turns non-negative integers into dense vectors of fixed size.

Flatten(**kwargs) − This method flattens the input to 2-D.

GELU(**kwargs) − This method is used for the Gaussian Error Linear Unit (GELU).

GlobalAvgPool1D([layout]) − With the help of this method, we can do the global average pooling operation for temporal data.

GlobalAvgPool2D([layout]) − With the help of this method, we can do the global average pooling operation for spatial data.

GlobalAvgPool3D([layout]) − With the help of this method, we can do the global average pooling operation for 3-D data.

GlobalMaxPool1D([layout]) − With the help of this method, we can do the global max pooling operation for 1-D data.

GlobalMaxPool2D([layout]) − With the help of this method, we can do the global max pooling operation for 2-D data.

GlobalMaxPool3D([layout]) − With the help of this method, we can do the global max pooling operation for 3-D data.

GroupNorm([num_groups, epsilon, center, …]) − This method applies group normalization to the n-D input array.

HybridBlock([prefix, params]) − This method supports forwarding with both Symbol and NDArray.
HybridLambda(function[, prefix]) − With the help of this method, we can wrap an operator or an expression as a HybridBlock object.

HybridSequential([prefix, params]) − It stacks HybridBlocks sequentially.

InstanceNorm([axis, epsilon, center, scale, …]) − This method applies instance normalisation to the n-D input array.

Implementation Examples

In the example below, we are going to use Block(), which gives the base class for all neural network layers and models.

import mxnet as mx
from mxnet.gluon import Block, nn

class Model(Block):
   def __init__(self, **kwargs):
      super(Model, self).__init__(**kwargs)
      # use name_scope to give child Blocks appropriate names.
      with self.name_scope():
         self.dense0 = nn.Dense(20)
         self.dense1 = nn.Dense(20)
   def forward(self, x):
      x = mx.nd.relu(self.dense0(x))
      return mx.nd.relu(self.dense1(x))

model = Model()
model.initialize(ctx=mx.cpu(0))
model(mx.nd.zeros((5, 5), ctx=mx.cpu(0)))

Output

You will see the following output −

[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
<NDArray 5x20 @cpu(0)>

In the example below, we are going to use HybridBlock(), which supports forwarding with both Symbol and NDArray. Note that a HybridBlock implements hybrid_forward, whose F argument is mx.nd when running imperatively and mx.sym after hybridize().

import mxnet as mx
from mxnet.gluon import HybridBlock, nn

class Model(HybridBlock):
   def __init__(self, **kwargs):
      super(Model, self).__init__(**kwargs)
      # use name_scope to give child Blocks appropriate names.
      with self.name_scope():
         self.dense0 = nn.Dense(20)
         self.dense1 = nn.Dense(20)
   def hybrid_forward(self, F, x):
      x = F.relu(self.dense0(x))
      return F.relu(self.dense1(x))

model = Model()
model.initialize(ctx=mx.cpu(0))
model.hybridize()
model(mx.nd.zeros((5, 5), ctx=mx.cpu(0)))

Output

The output is mentioned below −

[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
<NDArray 5x20 @cpu(0)>

gluon.rnn

Gluon provides a large number of built-in recurrent neural network (RNN) layers in the gluon.rnn module. That is the reason it is called a core module.

Methods and their parameters

Following are some of the important methods and their parameters covered by the mxnet.gluon.rnn core module −

BidirectionalCell(l_cell, r_cell[, …]) − It is used for a Bidirectional Recurrent Neural Network (RNN) cell.

DropoutCell(rate[, axes, prefix, params]) − This method will apply dropout to the given input.

GRU(hidden_size[, num_layers, layout, …]) − It applies a multi-layer gated recurrent unit (GRU) RNN to a given input sequence.

GRUCell(hidden_size[, …]) − It is used for a Gated Recurrent Unit (GRU) network cell.

HybridRecurrentCell([prefix, params]) − This method supports hybridization.

HybridSequentialRNNCell([prefix, params]) − With the help of this method, multiple HybridRNN cells can be stacked sequentially.
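As a quick implementation example for this module (modeled on the gluon.rnn documentation; the shapes here are arbitrary), a multi-layer GRU can be applied to a batch of sequences as follows −

import mxnet as mx
from mxnet.gluon import rnn

# 2-layer GRU with 100 hidden units; the default layout is TNC (time, batch, channels)
layer = rnn.GRU(hidden_size=100, num_layers=2)
layer.initialize()

x = mx.nd.random.uniform(shape=(5, 3, 10))  # 5 time steps, batch of 3, 10 features
output = layer(x)
print(output.shape)                         # (5, 3, 100)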
Apache MXNet – Python API Symbol

In this chapter, we will learn about an interface in MXNet which is termed Symbol.

Mxnet.Symbol

Apache MXNet's Symbol API is an interface for symbolic programming. The Symbol API features the use of the following −

Computational graphs

Reduced memory usage

Pre-use function optimization

The example given below shows how one can create a simple expression by using MXNet's Symbol API −

import mxnet as mx
# Two placeholders namely x and y will be created with mx.sym.Variable
x = mx.sym.Variable("x")
y = mx.sym.Variable("y")
# The symbol here is constructed using the plus '+' operator.
z = x + y

Output

You will see the following output −

<Symbol _plus0>

Example

(x, y, z)

Output

The output is given below −

(<Symbol x>, <Symbol y>, <Symbol _plus0>)

Now let us discuss in detail the classes, functions, and parameters of the Symbol API of MXNet.

Classes

The following table consists of the classes of the Symbol API of MXNet −

Symbol(handle) − This class, namely Symbol, is the symbolic graph of Apache MXNet.

Functions and their parameters

Following are some of the important functions and their parameters covered by the mxnet.Symbol API −

Activation([data, act_type, out, name]) − It applies an activation function element-wise to the input. It supports relu, sigmoid, tanh, softrelu, and softsign activation functions.

BatchNorm([data, gamma, beta, moving_mean, …]) − It is used for batch normalization. This function normalizes a data batch by mean and variance. It applies a scale gamma and offset beta.

BilinearSampler([data, grid, cudnn_off, …]) − This function applies bilinear sampling to the input feature map. Actually, it is the key of "Spatial Transformer Networks". If you are familiar with the remap function in OpenCV, the usage of this function is quite similar to that. The only difference is that it has the backward pass.

BlockGrad([data, out, name]) − As the name specifies, this function stops gradient computation. It basically stops the accumulated gradient of the inputs from flowing through this operator in the backward direction.

cast([data, dtype, out, name]) − This function will cast all elements of the input to a new type.

zeros(shape[, dtype]) − This function, as the name specifies, returns a new symbol of given shape and type, filled with zeros.

ones(shape[, dtype]) − This function, as the name specifies, returns a new symbol of given shape and type, filled with ones.

full(shape, val[, dtype]) − This function, as the name specifies, returns a new array of given shape and type, filled with the given value val.

arange(start[, stop, step, repeat, …]) − It will return evenly spaced values within a given interval. The values are generated within the half-open interval [start, stop), which means that the interval includes start but excludes stop.

linspace(start, stop, num[, endpoint, name, …]) − It will return evenly spaced numbers within a specified interval. Similar to the function arange(), the values are generated within the half-open interval [start, stop), which means that the interval includes start but excludes stop.

histogram(a[, bins, range]) − As the name implies, this function will compute the histogram of the input data.

power(base, exp) − As the name implies, this function will return the element-wise result of the base element raised to powers from the exp element. Both inputs, i.e. base and exp, can be either Symbol or scalar. Note that broadcasting is not allowed here.
You can use broadcast_pow if you want to use the broadcasting feature.

SoftmaxActivation([data, mode, name, attr, out]) − This function applies softmax activation to the input. It is intended for internal layers. It is actually deprecated; we can use softmax() instead.

Implementation Examples

In the example below, we will be using the function power(), which will return the element-wise result of the base element raised to the powers from the exp element −

import mxnet as mx
mx.sym.power(3, 5)

Output

You will see the following output −

243

Example

x = mx.sym.Variable("x")
y = mx.sym.Variable("y")
z = mx.sym.power(x, 3)
z.eval(x=mx.nd.array([1,2]))[0].asnumpy()

Output

This produces the following output −

array([1., 8.], dtype=float32)

Example

z = mx.sym.power(4, y)
z.eval(y=mx.nd.array([2,3]))[0].asnumpy()

Output

When you execute the above code, you should see the following output −

array([16., 64.], dtype=float32)

Example

z = mx.sym.power(x, y)
z.eval(x=mx.nd.array([4,5]), y=mx.nd.array([2,3]))[0].asnumpy()

Output

The output is mentioned below −

array([ 16., 125.], dtype=float32)

In the example given below, we will be using the function SoftmaxActivation() (or softmax()), which is applied to the input and is intended for internal layers.

input_data = mx.nd.array([[2., 0.9, -0.5, 4., 8.], [4., -.7, 9., 2., 0.9]])
soft_max_act = mx.nd.softmax(input_data)
print (soft_max_act.asnumpy())

Output

You will see the following output −

[[2.4258138e-03 8.0748333e-04 1.9912292e-04 1.7924475e-02 9.7864312e-01]
 [6.6843745e-03 6.0796250e-05 9.9204916e-01 9.0463174e-04 3.0112563e-04]]

symbol.contrib

The Contrib Symbol API is defined in the symbol.contrib package. It typically provides many useful experimental APIs for new features. This API works as a place for the community where they can try out new features. The feature contributors will get feedback as well.

Functions and their parameters

Following are some of the important functions and their parameters covered by the mxnet.symbol.contrib API −

rand_zipfian(true_classes, num_sampled, …) − This function draws random samples from an approximately Zipfian distribution. The base distribution of this function is the Zipfian distribution. This function randomly samples num_sampled candidates, and the elements of sampled_candidates are drawn from the base distribution given above.

foreach(body, data, init_states) − As the name implies, this function runs a for loop with user-defined computation over NDArrays on dimension 0. This function simulates a for loop, and body has the computation for an iteration of the for loop.

while_loop(cond, func, loop_vars[, …]) − As the name implies, this function runs a while loop with user-defined computation and loop condition. This function simulates a while loop that iteratively does customized computation as long as the condition is satisfied.

cond(pred, then_func, else_func) − As the name implies, this function runs an if-then-else using user-defined condition and computation. This function simulates an if-like branch which chooses to do one of the two customized computations according to the specified condition.

getnnz([data, axis, out, name]) − This function gives us the number of stored values for a sparse tensor, including explicit zeros. It only supports the CSR matrix on the CPU.
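As a brief illustration of the control-flow operators listed above, the following sketch (modeled on the foreach example in the MXNet documentation) scans over dimension 0 of data, adding each slice to a running state that is doubled at every step −

import mxnet as mx

# body of the loop: returns (output of this iteration, new states)
step = lambda data, states: (data + states[0], [states[0] * 2])

data = mx.sym.var("data")
states = [mx.sym.var("state")]
outs, final_states = mx.sym.contrib.foreach(step, data, states)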
Apache MXNet – Python API Module

Apache MXNet's module API is like a FeedForward model, and it is easier to compose, similar to the Torch module. It consists of the following classes −

BaseModule([logger])

It represents the base class of a module. A module can be thought of as a computation component or computation machine. The job of a module is to execute forward and backward passes. It also updates the parameters in a model.

Methods

The following table shows the methods of the BaseModule class −

backward([out_grads]) − As the name implies, this method implements the backward computation.

bind(data_shapes[, label_shapes, …]) − It binds the symbols to construct executors, and it is necessary before one can perform computation with the module.

fit(train_data[, eval_data, eval_metric, …]) − This method trains the module parameters.

forward(data_batch[, is_train]) − As the name implies, this method implements the forward computation. This method supports data batches with various shapes, like different batch sizes or different image sizes.

forward_backward(data_batch) − It is a convenient function, as the name implies, that calls both forward and backward.

get_input_grads([merge_multi_context]) − This method will get the gradients to the inputs, which were computed in the previous backward computation.

get_outputs([merge_multi_context]) − As the name implies, this method will get the outputs of the previous forward computation.

get_params() − It gets the parameters, especially those which are potentially copies of the actual parameters used to do computation on the device.

get_states([merge_multi_context]) − This method will get states from all devices.

init_optimizer([kvstore, optimizer, …]) − This method installs and initializes the optimizers. It also initializes kvstore for distributed training.

init_params([initializer, arg_params, …]) − As the name implies, this method will initialize the parameters and auxiliary states.

install_monitor(mon) − This method will install a monitor on all executors.

iter_predict(eval_data[, num_batch, reset, …]) − This method will iterate over predictions.

load_params(fname) − It will, as the name specifies, load model parameters from a file.

predict(eval_data[, num_batch, …]) − It will run the prediction and collect the outputs as well.

prepare(data_batch[, sparse_row_id_fn]) − The operator prepares the module for processing a given data batch.

save_params(fname) − As the name specifies, this function will save the model parameters to a file.

score(eval_data, eval_metric[, num_batch, …]) − It runs the prediction on eval_data and also evaluates the performance according to the given eval_metric.

set_params(arg_params, aux_params[, …]) − This method will assign the parameter and aux state values.

set_states([states, value]) − This method, as the name implies, sets the value for states.

update() − This method updates the given parameters according to the installed optimizer. It also updates the gradients computed in the previous forward-backward batch.

update_metric(eval_metric, labels[, pre_sliced]) − This method, as the name implies, evaluates and accumulates the evaluation metric on the outputs of the last forward computation.

Attributes

The following table shows the attributes of the BaseModule class −

data_names − It consists of the list of names for data required by this module.

data_shapes − It consists of the list of (name, shape) pairs specifying the data inputs to this module.

label_shapes − It shows the list of (name, shape) pairs specifying the label inputs to this module.

output_names − It consists of the list of names for the outputs of this module.

output_shapes − It consists of the list of (name, shape) pairs specifying the outputs of this module.

symbol − As the name specifies, this attribute gets the symbol associated with this module.

BucketingModule(sym_gen[, …])

It represents the BucketingModule class of a Module, which helps to deal efficiently with varying-length inputs.

Methods

The following table shows the methods of the BucketingModule class −

backward([out_grads]) − As the name implies, this method implements the backward computation.

bind(data_shapes[, label_shapes, …]) − It sets up the buckets and binds the executor for the default bucket key. This method represents the binding for a BucketingModule.

forward(data_batch[, is_train]) − As the name implies, this method implements the forward computation. This method supports data batches with various shapes, like different batch sizes or different image sizes.

get_input_grads([merge_multi_context]) − This method will get the gradients to the inputs, which were computed in the previous backward computation.

get_outputs([merge_multi_context]) − As the name implies, this method will get the outputs from the previous forward computation.

get_params() − It gets the current parameters, especially those which are potentially copies of the actual parameters used to do computation on the device.

get_states([merge_multi_context]) − This method will get states from all devices.

init_optimizer([kvstore, optimizer, …]) − This method installs and initializes the optimizers. It also initializes kvstore for distributed training.

init_params([initializer, arg_params, …]) − As the name implies, this method will initialize the parameters and auxiliary states.

install_monitor(mon) − This method will install a monitor on all executors.

load(prefix, epoch[, sym_gen, …]) − This method will create a model from a previously saved checkpoint.

load_dict([sym_dict, sym_gen, …]) − This method will create a model from a dictionary (dict) mapping bucket_key to symbols. It also shares arg_params and aux_params.

prepare(data_batch[, sparse_row_id_fn]) − The operator prepares the module for processing a given data batch.

save_checkpoint(prefix, epoch[, remove_amp_cast]) − This method, as the name implies, saves the current progress to a checkpoint for all buckets in the BucketingModule. It is recommended to use mx.callback.module_checkpoint as epoch_end_callback to save during training.

set_params(arg_params, aux_params[, …]) − As the name specifies, this function will assign parameter and aux state values.

set_states([states, value]) − This method, as the name implies, sets the value for states.

switch_bucket(bucket_key, data_shapes[, …]) − It will switch to a different bucket.

update() − This method updates the given parameters according to the installed optimizer. It also updates the gradients computed in the previous forward-backward batch.

update_metric(eval_metric, labels[, pre_sliced]) − This method, as the name implies, evaluates and accumulates the evaluation metric on the outputs of the last forward computation.

Attributes

The following table shows the attributes of the BucketingModule class, which it shares with BaseModule −

data_names − It consists of the list of names for data required by this module.

data_shapes − It consists of the list of (name, shape) pairs specifying the data inputs to this module.

label_shapes − It shows the list of (name, shape) pairs specifying the label inputs to this module.

output_names − It consists of the list of names for the outputs of this module.

output_shapes − It consists of the list of (name, shape) pairs specifying the outputs of this module.

symbol − As the name specifies, this attribute gets the symbol associated with this module.
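To make the method tables above concrete, here is a minimal sketch of the typical BaseModule workflow (the symbol, shapes, and names are illustrative assumptions): bind, initialize the parameters, then run a forward pass −

import mxnet as mx

# A toy network: a single fully-connected layer
data = mx.sym.Variable("data")
net = mx.sym.FullyConnected(data=data, num_hidden=10, name="fc")

mod = mx.mod.Module(symbol=net, data_names=["data"], label_names=None)
mod.bind(data_shapes=[("data", (4, 20))], for_training=False)  # allocate executors
mod.init_params()                                              # initialize weights

batch = mx.io.DataBatch(data=[mx.nd.ones((4, 20))])
mod.forward(batch)
print(mod.get_outputs()[0].shape)   # (4, 10)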
Apache MXNet – Quick Guide

Apache MXNet – Introduction

This chapter highlights the features of Apache MXNet and talks about the latest version of this deep learning software framework.

What is MXNet?

Apache MXNet is a powerful open-source deep learning framework that helps developers build, train, and deploy deep learning models. Over the past few years, the impact of deep learning has spread widely, from healthcare to transportation to manufacturing and, in fact, to every aspect of our daily life. Nowadays, companies turn to deep learning to solve hard problems like face recognition, object detection, Optical Character Recognition (OCR), speech recognition, and machine translation.

That is the reason Apache MXNet is supported by:

Some big companies like Intel, Baidu, Microsoft, Wolfram Research, etc.

Public cloud providers including Amazon Web Services (AWS) and Microsoft Azure.

Some big research institutes like Carnegie Mellon, MIT, the University of Washington, and the Hong Kong University of Science & Technology.

Why Apache MXNet?

With various deep learning platforms like Torch7, Caffe, Theano, TensorFlow, Keras, Microsoft Cognitive Toolkit, etc. already in existence, you might wonder why Apache MXNet. Let us check out some of the reasons behind it:

Apache MXNet solves one of the biggest issues of existing deep learning platforms: that in order to use them, one needs to learn yet another system with a different programming flavor.

With the help of Apache MXNet, developers can exploit the full capabilities of GPUs as well as cloud computing.

Apache MXNet can accelerate any numerical computation and places a special emphasis on speeding up the development and deployment of large-scale deep neural networks (DNNs).

It provides its users the capabilities of both imperative and symbolic programming.

Various Features

If you are looking for a flexible deep learning library to quickly develop cutting-edge deep learning research, or a robust platform to push production workloads, your search ends at Apache MXNet. That is because of the following features:

Distributed Training

Whether it is multi-GPU or multi-host training with near-linear scaling efficiency, Apache MXNet allows developers to make the most out of their hardware. MXNet also supports integration with Horovod, an open-source distributed deep learning framework created at Uber. For this integration, following are some of the common distributed APIs defined in Horovod:

horovod.broadcast()

horovod.allgather()

In this regard, MXNet offers us the following capabilities:

Device Placement − With the help of MXNet, we can easily specify where each data structure (DS) should live.

Automatic Differentiation − Apache MXNet automates the differentiation, i.e. derivative calculations.

Multi-GPU training − MXNet allows us to achieve scaling efficiency with the number of available GPUs.

Optimized Predefined Layers − We can code our own layers in MXNet, and it also provides predefined layers that are optimized for speed.

Hybridization

Apache MXNet provides its users a hybrid front-end. With the help of the Gluon Python API, it can bridge the gap between its imperative and symbolic capabilities. It can be done by calling its hybridize functionality.

Faster Computation

Linear operations, like tens or hundreds of matrix multiplications, are the computational bottleneck for deep neural nets.
To solve this bottleneck, MXNet provides −

Optimized numerical computation for GPUs

Optimized numerical computation for distributed ecosystems

Automation of common workflows, with the help of which the standard NN can be expressed briefly

Language Bindings

MXNet has deep integration into high-level languages like Python and R. It also provides support for other programming languages such as −

Scala

Julia

Clojure

Java

C/C++

Perl

We do not need to learn any new programming language; instead, MXNet, combined with its hybridization feature, allows an exceptionally smooth transition from Python to deployment in the programming language of our choice.

Latest version MXNet 1.6.0

The Apache Software Foundation (ASF) released the stable version 1.6.0 of Apache MXNet on 21st February 2020 under Apache License 2.0. This is the last MXNet release to support Python 2, as the MXNet community voted to no longer support Python 2 in further releases. Let us check out some of the new features this release brings for its users.

NumPy-Compatible interface

Due to its flexibility and generality, NumPy has been widely used by machine learning practitioners, scientists, and students. But as hardware accelerators like Graphical Processing Units (GPUs) have become increasingly assimilated into various machine learning (ML) toolkits, NumPy users, in order to take advantage of the speed of GPUs, have had to switch to new frameworks with different syntax.

With MXNet 1.6.0, Apache MXNet is moving toward a NumPy-compatible programming experience. The new interface provides equivalent usability as well as expressiveness to practitioners familiar with NumPy syntax. Along with that, MXNet 1.6.0 also enables existing NumPy code to utilize hardware accelerators like GPUs to speed up large-scale computations.

Integration with Apache TVM

Apache TVM, an open-source end-to-end deep learning compiler stack for hardware backends such as CPUs, GPUs, and specialized accelerators, aims to fill the gap between productivity-focused deep learning frameworks and performance-oriented hardware backends. With the latest release, MXNet 1.6.0, users can leverage Apache (incubating) TVM to implement high-performance operator kernels in the Python programming language. Two main advantages of this new feature are as follows −

It simplifies the former C++ based development process.

It enables sharing the same implementation across multiple hardware backends such as CPUs, GPUs, etc.

Improvements on existing features

Apart from the above listed features, MXNet 1.6.0 also provides some improvements over existing features. The improvements are as follows −

Grouping element-wise operations for GPU

As we know, the performance of element-wise operations is memory-bandwidth bound, and that is the reason chaining such operations may reduce overall performance. Apache MXNet 1.6.0 performs element-wise operation fusion, which actually generates just-in-time fused operations as and when possible. Such element-wise operation fusion also reduces storage needs and improves overall performance.

Simplifying common expressions

MXNet 1.6.0 eliminates redundant expressions and simplifies common expressions. Such enhancement also improves memory usage and total execution time.

Optimizations

MXNet 1.6.0 also provides various optimizations to existing features and operators.
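As a brief illustration of the NumPy-compatible interface mentioned above (a minimal sketch; it assumes MXNet 1.6.0 or later is installed), the mxnet.np module mirrors familiar NumPy idioms −

from mxnet import np, npx
npx.set_np()   # activate NumPy-compatible semantics in MXNet

a = np.ones((2, 3))                 # same call shape as numpy.ones
b = np.arange(6).reshape(2, 3)      # familiar NumPy-style chaining
print((a + b).sum(axis=1))          # row sums, computed by MXNet and GPU-capable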
Apache MXNet – System Components

Here, the system components of Apache MXNet are explained in detail. First, we will study the execution engine in MXNet.

Execution Engine

Apache MXNet's execution engine is very versatile. We can use it for deep learning as well as for any domain-specific problem: execute a bunch of functions following their dependencies. It is designed in such a way that the functions with dependencies are serialized, whereas the functions with no dependencies can be executed in parallel.

Core Interface

The API given below is the core interface for Apache MXNet's execution engine −

virtual void PushSync(Fn exec_fun, Context exec_ctx,
                      std::vector<VarHandle> const& const_vars,
                      std::vector<VarHandle> const& mutate_vars) = 0;

The above API has the following parameters −

exec_fun − The core interface API of MXNet allows us to push the function named exec_fun, along with its context information and dependencies, to the execution engine.

exec_ctx − The context information in which the above-mentioned function exec_fun should be executed.

const_vars − These are the variables that the function reads from.

mutate_vars − These are the variables that are to be modified.

The execution engine provides its user the guarantee that the execution of any two functions that modify a common variable is serialized in their push order.

Function

Following is the function type of the execution engine of Apache MXNet −

using Fn = std::function<void(RunContext)>;

In the above function, RunContext contains the runtime information. The runtime information should be determined by the execution engine. The syntax of RunContext is as follows −

struct RunContext {
   // stream pointer which could be safely cast to
   // cudaStream_t* type
   void *stream;
};

Below are some important points about the execution engine's functions −

All the functions are executed by the internal threads of MXNet's execution engine.

It is not good to push blocking functions to the execution engine, because then the function will occupy an execution thread and also reduce the total throughput. For this, MXNet provides another, asynchronous, function type −

using Callback = std::function<void()>;
using AsyncFn = std::function<void(RunContext, Callback)>;

In an AsyncFn function, we can pass the heavy part of the work to our own threads, but the execution engine does not consider the function finished until we call the callback function.

Context

In Context, we can specify the context in which the function is to be executed. This usually includes the following −

Whether the function should be run on a CPU or a GPU.

If we specify a GPU in the Context, which GPU to use.

There is a huge difference between Context and RunContext. Context has the device type and device id, whereas RunContext has the information that can be decided only during runtime.

VarHandle

VarHandle, used to specify the dependencies of functions, is like a token (provided by the execution engine) that we can use to represent the external resources the function can modify or use. But the question arises: why do we need to use VarHandle? It is because the Apache MXNet engine is designed to be decoupled from other MXNet modules.

Following are some important points about VarHandle −

It is lightweight, so creating, deleting, or copying a variable incurs little operating cost.

We need to specify the immutable variables, i.e. the variables that will only be read, in const_vars.

We need to specify the mutable variables, i.e. the variables that will be modified, in mutate_vars.
The rule used by the execution engine to resolve the dependencies among functions is that the execution of any two functions, when one of them modifies at least one common variable, is serialized in their push order.

For creating a new variable, we can use the NewVar() API.

For deleting a variable, we can use the PushDelete API.

Let us understand its working with a simple example − Suppose we have two functions, F1 and F2, and they both mutate the variable V2. In that case, F2 is guaranteed to be executed after F1 if F2 is pushed after F1. On the other hand, if F1 and F2 both only read V2, then their actual execution order could be random.

Push and Wait

Push and Wait are two more useful APIs of the execution engine.

Following are two important features of the Push API:

All the Push APIs are asynchronous, which means that the API call immediately returns, regardless of whether the pushed function is finished or not.

The Push API is not thread safe, which means that only one thread should make engine API calls at a time.

Now if we talk about the Wait API, the following points describe it −

If a user wants to wait for a specific function to be finished, he/she should include a callback function in the closure, and call it at the end of the function.

On the other hand, if a user wants to wait for all functions that involve a certain variable to finish, he/she should use the WaitForVar(var) API.

If someone wants to wait for all the pushed functions to finish, then use the WaitForAll() API.

Operators

An operator in Apache MXNet is a class that contains the actual computation logic as well as auxiliary information, and aids the system in performing optimisation.

Operator Interface

Forward is the core operator interface, whose syntax is as follows:

virtual void Forward(const OpContext &ctx,
                     const std::vector<TBlob> &in_data,
                     const std::vector<OpReqType> &req,
                     const std::vector<TBlob> &out_data,
                     const std::vector<TBlob> &aux_states) = 0;

The structure of OpContext, used in Forward(), is as follows:

struct OpContext {
   int is_train;
   RunContext run_ctx;
   std::vector<Resource> requested;
};

The OpContext describes the state of the operator (whether it is in the train or test phase), which device the operator should be run on, and also the requested resources. From the above Forward core interface, we can understand the remaining arguments as follows −

in_data and out_data represent the input and output tensors.

req denotes how the result of the computation is written into out_data. The OpReqType can be defined as −

enum OpReqType {
   kNullOp,       // no operation, do not write anything
   kWriteTo,      // write the result directly to out_data
   kWriteInplace, // write in place, sharing memory with the input
   kAddTo         // add the result to the existing value of out_data
};
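The push-and-wait semantics described above also surface in the Python frontend: every NDArray operation is pushed to the engine asynchronously, and explicit waits block until the computation is done. A minimal sketch −

import mxnet as mx

a = mx.nd.ones((1000, 1000))
b = mx.nd.dot(a, a)   # returns immediately; the matrix multiply is queued on the engine

b.wait_to_read()      # block until b is computed (WaitForVar on b's variable)
mx.nd.waitall()       # block until every pushed function has finished (WaitForAll)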
Apache MXNet – Toolkits and Ecosystem

To support the research and development of deep learning applications across many fields, Apache MXNet provides us a rich ecosystem of toolkits, libraries, and more. Let us explore them −

ToolKits

Following are some of the most used and important toolkits provided by MXNet −

GluonCV

As the name implies, GluonCV is a Gluon toolkit for computer vision powered by MXNet. It provides implementations of state-of-the-art DL (deep learning) algorithms in computer vision (CV). With the help of the GluonCV toolkit, engineers, researchers, and students can validate new ideas and learn CV easily.

Given below are some of the features of GluonCV −

It provides training scripts for reproducing state-of-the-art results reported in the latest research.

More than 170 high-quality pretrained models.

It embraces a flexible development pattern.

GluonCV is easy to optimize. We can deploy it without retaining a heavyweight DL framework.

It provides carefully designed APIs that greatly lessen the implementation intricacy.

Community support.

Easy-to-understand implementations.

Following are the applications supported by the GluonCV toolkit:

Image Classification

Object Detection

Semantic Segmentation

Instance Segmentation

Pose Estimation

Video Action Recognition

We can install GluonCV by using pip as follows −

pip install --upgrade mxnet gluoncv

GluonNLP

As the name implies, GluonNLP is a Gluon toolkit for Natural Language Processing (NLP) powered by MXNet. It provides implementations of state-of-the-art DL (deep learning) models in NLP. With the help of the GluonNLP toolkit, engineers, researchers, and students can build blocks for text data pipelines and models. Based on these models, they can quickly prototype research ideas and products.

Given below are some of the features of GluonNLP:

It provides training scripts for reproducing state-of-the-art results reported in the latest research.

A set of pretrained models for common NLP tasks.

It provides carefully designed APIs that greatly lessen the implementation intricacy.

Community support.

It also provides tutorials to help you get started on new NLP tasks.

Following are the NLP tasks we can implement with the GluonNLP toolkit −

Word Embedding

Language Model

Machine Translation

Text Classification

Sentiment Analysis

Natural Language Inference

Text Generation

Dependency Parsing

Named Entity Recognition

Intent Classification and Slot Labeling

We can install GluonNLP by using pip as follows −

pip install --upgrade mxnet gluonnlp

GluonTS

As the name implies, GluonTS is a Gluon toolkit for probabilistic time series modeling powered by MXNet. It provides the following features −

State-of-the-art (SOTA) deep learning models ready to be trained.

Utilities for loading as well as iterating over time-series datasets.

Building blocks to define your own model.

With the help of the GluonTS toolkit, engineers, researchers, and students can train and evaluate any of the built-in models on their own data, quickly experiment with different solutions, and come up with a solution for their time series tasks. They can also use the provided abstractions and building blocks to create custom time series models, and rapidly benchmark them against baseline algorithms.

We can install GluonTS by using pip as follows −

pip install gluonts

GluonFR

As the name implies, it is an Apache MXNet Gluon toolkit for FR (Face Recognition). It provides the following features −

State-of-the-art (SOTA) deep learning models in face recognition.
Implementations of SoftmaxCrossEntropyLoss, ArcLoss, TripletLoss, RingLoss, CosLoss/AMsoftmax, L2-Softmax, A-Softmax, CenterLoss, ContrastiveLoss, LGM Loss, etc.

In order to install Gluon Face, we need Python 3.5 or later. We also need to install GluonCV and MXNet first, as follows −

pip install gluoncv --pre
pip install mxnet-mkl --pre --upgrade
pip install mxnet-cuXXmkl --pre --upgrade # if cuda XX is installed

Once you have installed the dependencies, you can use one of the following commands to install GluonFR −

From Source

pip install git+https://github.com/THUFutureLab/gluon-face.git@master

Pip

pip install gluonfr

Ecosystem

Now let us explore MXNet's rich libraries, packages, and frameworks −

Coach RL

Coach is a Python Reinforcement Learning (RL) framework created by the Intel AI Lab. It enables easy experimentation with state-of-the-art RL algorithms. Coach RL supports Apache MXNet as a back end and allows simple integration of new environments to solve. In order to extend and reuse existing components easily, Coach RL decouples the basic reinforcement learning components, such as algorithms, environments, NN architectures, and exploration policies, very well.

Following are the agents and supported algorithms for the Coach RL framework −

Value Optimization Agents

Deep Q Network (DQN)

Double Deep Q Network (DDQN)

Dueling Q Network

Mixed Monte Carlo (MMC)

Persistent Advantage Learning (PAL)

Categorical Deep Q Network (C51)

Quantile Regression Deep Q Network (QR-DQN)

N-Step Q Learning

Neural Episodic Control (NEC)

Normalized Advantage Functions (NAF)

Rainbow

Policy Optimization Agents

Policy Gradients (PG)

Asynchronous Advantage Actor-Critic (A3C)

Deep Deterministic Policy Gradients (DDPG)

Proximal Policy Optimization (PPO)

Clipped Proximal Policy Optimization (CPPO)

Generalized Advantage Estimation (GAE)

Sample Efficient Actor-Critic with Experience Replay (ACER)

Soft Actor-Critic (SAC)

Twin Delayed Deep Deterministic Policy Gradient (TD3)

General Agents

Direct Future Prediction (DFP)

Imitation Learning Agents

Behavioral Cloning (BC)

Conditional Imitation Learning

Hierarchical Reinforcement Learning Agents

Hierarchical Actor Critic (HAC)

Deep Graph Library

Deep Graph Library (DGL), developed by the NYU and AWS teams in Shanghai, is a Python package that provides easy implementations of Graph Neural Networks (GNNs) on top of MXNet. It also provides easy implementations of GNNs on top of other existing major deep learning libraries like PyTorch, Gluon, etc.

Deep Graph Library is free software. It is available on all Linux distributions later than Ubuntu 16.04, on macOS X, and on Windows 7 or later. It also requires Python version 3.5 or later.

Following are the features of DGL −

No migration cost − There is no migration cost for using DGL, as it is built on top of popular existing DL frameworks.

Message passing − DGL provides message passing, and it has versatile control over it. The message passing ranges from low-level operations, such as sending along selected edges, to high-level control, such as graph-wide feature updates.

Smooth learning curve − It is quite easy to learn and use DGL, as the powerful user-defined functions are flexible as well as easy to use.

Transparent speed optimization − DGL provides transparent speed optimization by doing automatic batching of computations.
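To give a flavor of how these toolkits are used in practice, here is a minimal GluonCV sketch (assuming gluoncv has been installed as shown above; resnet18_v1 is just one of the model zoo's classification models) that loads a pretrained network and runs a dummy batch through it −

import mxnet as mx
from gluoncv import model_zoo

# Load a pretrained ImageNet classifier from the GluonCV model zoo
net = model_zoo.get_model("resnet18_v1", pretrained=True)

x = mx.nd.random.uniform(shape=(1, 3, 224, 224))  # dummy image batch
print(net(x).shape)                               # (1, 1000) class scores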
Apache MXNet – Python API ndarray

This chapter explains the ndarray library which is available in Apache MXNet.

Mxnet.ndarray

Apache MXNet's NDArray library defines the core DS (data structures) for all the mathematical computations. Two fundamental jobs of NDArray are as follows −

It supports fast execution on a wide range of hardware configurations.

It automatically parallelises multiple operations across the available hardware.

The example given below shows how one can create an NDArray by using a 1-D and a 2-D 'array' from a regular Python list −

import mxnet as mx
from mxnet import nd

x = nd.array([1,2,3,4,5,6,7,8,9,10])
print(x)

Output

The output is given below −

[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
<NDArray 10 @cpu(0)>

Example

y = nd.array([[1,2,3,4,5,6,7,8,9,10], [1,2,3,4,5,6,7,8,9,10], [1,2,3,4,5,6,7,8,9,10]])
print(y)

Output

This produces the following output −

[[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
 [ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
 [ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]]
<NDArray 3x10 @cpu(0)>

Now let us discuss in detail the classes, functions, and parameters of the ndarray API of MXNet.

Classes

The following table consists of the classes of the ndarray API of MXNet −

CachedOp(sym[, flags]) − It is used for the cached operator handle.

NDArray(handle[, writable]) − It is used as an array object that represents a multi-dimensional, homogeneous array of fixed-size items.

Functions and their parameters

Following are some of the important functions and their parameters covered by the mxnet.ndarray API −

Activation([data, act_type, out, name]) − It applies an activation function element-wise to the input. It supports relu, sigmoid, tanh, softrelu, and softsign activation functions.

BatchNorm([data, gamma, beta, moving_mean, …]) − It is used for batch normalisation. This function normalises a data batch by mean and variance. It applies a scale gamma and offset beta.

BilinearSampler([data, grid, cudnn_off, …]) − This function applies bilinear sampling to the input feature map. Actually, it is the key of "Spatial Transformer Networks". If you are familiar with the remap function in OpenCV, the usage of this function is quite similar to that. The only difference is that it has the backward pass.

BlockGrad([data, out, name]) − As the name specifies, this function stops gradient computation. It basically stops the accumulated gradient of the inputs from flowing through this operator in the backward direction.

cast([data, dtype, out, name]) − This function will cast all elements of the input to a new type.

Implementation Examples

In the example below, we will be using the function BilinearSampler() for zooming out the data two times and shifting the data horizontally by -1 pixel −

import mxnet as mx
from mxnet import nd

data = nd.array([[[[2, 5, 3, 6],
   [1, 8, 7, 9],
   [0, 4, 1, 8],
   [2, 0, 3, 4]]]])

affine_matrix = nd.array([[2, 0, 0], [0, 2, 0]])
affine_matrix = nd.reshape(affine_matrix, shape=(1, 6))
grid = nd.GridGenerator(data=affine_matrix, transform_type="affine", target_shape=(4, 4))
output = nd.BilinearSampler(data, grid)

Output

When you execute the above code, you should see the following output:

[[[[0. 0. 0. 0. ]
   [0. 4.0000005 6.25 0. ]
   [0. 1.5 4. 0. ]
   [0. 0. 0. 0. ]]]]
<NDArray 1x1x4x4 @cpu(0)>

The above output shows the zooming out of the data two times.
The example of shifting the data by -1 pixel is as follows −

import mxnet as mx
from mxnet import nd

data = nd.array([[[[2, 5, 3, 6],
   [1, 8, 7, 9],
   [0, 4, 1, 8],
   [2, 0, 3, 4]]]])

warp_matrix = nd.array([[[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]],
   [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]]])
grid = nd.GridGenerator(data=warp_matrix, transform_type="warp")
output = nd.BilinearSampler(data, grid)

Output

The output is stated below −

[[[[5. 3. 6. 0.]
   [8. 7. 9. 0.]
   [4. 1. 8. 0.]
   [0. 3. 4. 0.]]]]
<NDArray 1x1x4x4 @cpu(0)>

Similarly, the following example shows the use of the cast() function −

nd.cast(nd.array([300, 10.1, 15.4, -1, -2]), dtype="uint8")

Output

Upon execution, you will receive the following output −

[ 44 10 15 255 254]
<NDArray 5 @cpu(0)>

ndarray.contrib

The Contrib NDArray API is defined in the ndarray.contrib package. It typically provides many useful experimental APIs for new features. This API works as a place for the community where they can try out new features. The feature contributors will get feedback as well.

Functions and their parameters

Following are some of the important functions and their parameters covered by the mxnet.ndarray.contrib API −

rand_zipfian(true_classes, num_sampled, …) − This function draws random samples from an approximately Zipfian distribution. The base distribution of this function is the Zipfian distribution. This function randomly samples num_sampled candidates, and the elements of sampled_candidates are drawn from the base distribution given above.

foreach(body, data, init_states) − As the name implies, this function runs a for loop with user-defined computation over NDArrays on dimension 0. This function simulates a for loop, and body has the computation for an iteration of the for loop.

while_loop(cond, func, loop_vars[, …]) − As the name implies, this function runs a while loop with user-defined computation and loop condition. This function simulates a while loop that iteratively does customized computation as long as the condition is satisfied.

cond(pred, then_func, else_func) − As the name implies, this function runs an if-then-else using user-defined condition and computation. This function simulates an if-like branch which chooses to do one of the two customized computations according to the specified condition.

isinf(data) − This function performs an element-wise check to determine if the NDArray contains an infinite element or not.

getnnz([data, axis, out, name]) − This function gives us the number of stored values for a sparse tensor, including explicit zeros. It only supports the CSR matrix on the CPU.

requantize([data, min_range, max_range, …]) − This function requantizes the given data, which is quantized in int32 with the corresponding thresholds, into int8 using min and max thresholds either calculated at runtime or from calibration.

Implementation Examples

In the example below, we will be using the function rand_zipfian for drawing random samples from an approximately Zipfian distribution −

import mxnet as mx
from mxnet import nd

trueclass = mx.nd.array([2])
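The chapter breaks off at this point; a plausible completion of the example (a sketch based on the rand_zipfian signature, with illustrative argument values) would be −

# Draw 3 candidates from a Zipfian distribution over the class range [0, 4)
samples, exp_count_true, exp_count_sampled = mx.nd.contrib.rand_zipfian(
   trueclass, num_sampled=3, range_max=4)

print(samples.asnumpy())            # the sampled candidate classes
print(exp_count_true.asnumpy())     # expected count of the true class
print(exp_count_sampled.asnumpy())  # expected counts of the sampled candidates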