Apache MXNet – Gluon

Another important MXNet Python package is Gluon, which we discuss in this chapter. Gluon provides a clear, concise, and simple API for deep learning projects. It enables Apache MXNet users to prototype, build, and train DL models without sacrificing training speed.

Blocks

Blocks form the basis of more complex network designs. In a neural network, as the complexity increases, we need to move from designing single neurons to entire layers of neurons. For example, an NN design like ResNet-152 has a fair degree of regularity, since it consists of blocks of repeated layers.

Example

In the example given below, we write a simple block, namely a block for a multilayer perceptron.

from mxnet import nd
from mxnet.gluon import nn

x = nd.random.uniform(shape=(2, 20))
N_net = nn.Sequential()
N_net.add(nn.Dense(256, activation="relu"))
N_net.add(nn.Dense(10))
N_net.initialize()
N_net(x)

Output

This produces the following output:

[[ 0.09543004 0.04614332 -0.00286655 -0.07790346 -0.05130241 0.02942038 0.08696645 -0.0190793 -0.04122177 0.05088576]
[ 0.0769287 0.03099706 0.00856576 -0.044672 -0.06926838 0.09132431 0.06786592 -0.06187843 -0.03436674 0.04234696]]
<NDArray 2x10 @cpu(0)>

Steps needed to go from defining layers to defining blocks of one or more layers:

Step 1: A block takes data as input.

Step 2: A block stores its state in the form of parameters. For example, in the coding example above the block contains two hidden layers, and we need a place to store the parameters for them.

Step 3: A block invokes its forward function to perform forward propagation. This is also called forward computation. As a part of the first forward call, blocks initialize their parameters in a lazy fashion.

Step 4: Finally, a block invokes the backward function and calculates the gradient with reference to its input. Typically, this step is performed automatically.

Sequential Block

A sequential block is a special kind of block in which the data flows through a sequence of blocks. Each block is applied to the output of the one before it, with the first block being applied to the input data itself. Let us see how the sequential class works:

from mxnet import nd
from mxnet.gluon import nn

class MySequential(nn.Block):
   def __init__(self, **kwargs):
      super(MySequential, self).__init__(**kwargs)

   def add(self, block):
      self._children[block.name] = block

   def forward(self, x):
      for block in self._children.values():
         x = block(x)
      return x

x = nd.random.uniform(shape=(2, 20))
N_net = MySequential()
N_net.add(nn.Dense(256, activation="relu"))
N_net.add(nn.Dense(10))
N_net.initialize()
N_net(x)

Output

The output is given herewith:

[[ 0.09543004 0.04614332 -0.00286655 -0.07790346 -0.05130241 0.02942038 0.08696645 -0.0190793 -0.04122177 0.05088576]
[ 0.0769287 0.03099706 0.00856576 -0.044672 -0.06926838 0.09132431 0.06786592 -0.06187843 -0.03436674 0.04234696]]
<NDArray 2x10 @cpu(0)>

Custom Block

We can easily go beyond concatenation with the sequential block as defined above. But if we would like to make customisations, the Block class also provides the required functionality. The Block class has a model constructor provided in the nn module. We can inherit that model constructor to define the model we want.

In the following example, the MLP class overrides the __init__ and forward functions of the Block class. Let us see how it works.
class MLP(nn.Block):
   def __init__(self, **kwargs):
      super(MLP, self).__init__(**kwargs)
      self.hidden = nn.Dense(256, activation="relu") # Hidden layer
      self.output = nn.Dense(10) # Output layer

   def forward(self, x):
      hidden_out = self.hidden(x)
      return self.output(hidden_out)

x = nd.random.uniform(shape=(2, 20))
N_net = MLP()
N_net.initialize()
N_net(x)

Output

When you run the code, you will see the following output:

[[ 0.07787763 0.00216403 0.01682201 0.03059879 -0.00702019 0.01668715 0.04822846 0.0039432 -0.09300035 -0.04494302]
[ 0.08891078 -0.00625484 -0.01619131 0.0380718 -0.01451489 0.02006172 0.0303478 0.02463485 -0.07605448 -0.04389168]]
<NDArray 2x10 @cpu(0)>

Custom Layers

Apache MXNet's Gluon API comes with a modest number of pre-defined layers. At some point, however, we may find that a new layer is needed. We can easily add a new layer in the Gluon API. In this section, we will see how we can create a new layer from scratch.

The Simplest Custom Layer

To create a new layer in the Gluon API, we have to create a class that inherits from the Block class, which provides the most basic functionality. All the pre-defined layers inherit from it, either directly or via its other subclasses. To create the new layer, the only instance method that needs to be implemented is forward(self, x). This method defines what exactly our layer is going to do during forward propagation. As discussed earlier, the back-propagation pass for blocks is handled by Apache MXNet automatically.

Example

In the example below, we define a new layer. We also implement the forward() method to normalise the input data by fitting it into the range [0, 1].

from __future__ import print_function
import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.gluon.nn import Dense

mx.random.seed(1)

class NormalizationLayer(gluon.Block):
   def __init__(self):
      super(NormalizationLayer, self).__init__()

   def forward(self, x):
      return (x - nd.min(x)) / (nd.max(x) - nd.min(x))

x = nd.random.uniform(shape=(2, 20))
N_net = NormalizationLayer()
N_net.initialize()
N_net(x)

Output

On executing the above program, you will get the following result:

[[0.5216355 0.03835821 0.02284337 0.5945146 0.17334817 0.69329053 0.7782702 1. 0.5508242 0. 0.07058554 0.3677264 0.4366546 0.44362497 0.7192635 0.37616986 0.6728799 0.7032008 0.46907538 0.63514024]
[0.9157533 0.7667402 0.08980197 0.03593295 0.16176797 0.27679572 0.07331014 0.3905285 0.6513384 0.02713427 0.05523694 0.12147208 0.45582628 0.8139887 0.91629887 0.36665893 0.07873632 0.78268915 0.63404864 0.46638715]]
<NDArray 2x20 @cpu(0)>

Hybridisation

Hybridisation may be defined as a process used by Apache MXNet to create a symbolic graph of a forward computation. It allows MXNet to boost computation performance by optimising that symbolic graph. Rather than directly inheriting from Block, in fact, we may find that while implementing existing layers a block inherits from HybridBlock. Following are the reasons for this:

Allows us to write custom layers: HybridBlock allows us to write custom layers that can be used in both imperative and symbolic programming.

Increases computation performance: HybridBlock optimises the computational symbolic graph, which allows MXNet to increase computation performance.
Example

In this example, we rewrite the layer created above by using HybridBlock:

class NormalizationHybridLayer(gluon.HybridBlock):
   def __init__(self):
      super(NormalizationHybridLayer, self).__init__()

   def hybrid_forward(self, F, x):
      return F.broadcast_div(F.broadcast_sub(x, F.min(x)), (F.broadcast_sub(F.max(x), F.min(x))))

layer_hybd = NormalizationHybridLayer()
layer_hybd(nd.array([1, 2, 3, 4, 5, 6], ctx=mx.cpu()))

Output
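Since Dense itself is a HybridBlock, a whole network built from such layers can be compiled into a symbolic graph by calling hybridize(). The following is a minimal sketch of that step (the layer sizes and input shape are illustrative assumptions, not taken from the example above):

from mxnet import nd
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Dense(256, activation="relu"))
net.add(nn.Dense(10))
net.initialize()

net.hybridize()                               # ask MXNet to build and optimise a symbolic graph
out = net(nd.random.uniform(shape=(2, 20)))   # the first call after hybridize() triggers graph construction
print(out.shape)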
Python API Autograd and Initializer

This chapter deals with the autograd and initializer APIs in MXNet.

mxnet.autograd

This is MXNet's autograd API for NDArray. It has the following class:

Class: Function()

It is used for customised differentiation in autograd. It can be written as mxnet.autograd.Function. If, for any reason, the user does not want to use the gradients that are computed by the default chain rule, then he/she can use the Function class of mxnet.autograd to customise differentiation for a computation. It has two methods, namely forward() and backward(). Let us understand the working of this class with the help of the following points:

First, we need to define our computation in the forward method.

Then, we need to provide the customised differentiation in the backward method.

Now, during gradient computation, instead of the default backward computation, mxnet.autograd will use the backward function defined by the user. We can also cast to a numpy array and back for some operations in forward as well as backward.

Example

Before using the mxnet.autograd.Function class, let's define a stable sigmoid function with backward as well as forward methods as follows:

class sigmoid(mx.autograd.Function):
   def forward(self, x):
      y = 1 / (1 + mx.nd.exp(-x))
      self.save_for_backward(y)
      return y

   def backward(self, dy):
      y, = self.saved_tensors
      return dy * y * (1 - y)

Now, the function class can be used as follows:

func = sigmoid()
x = mx.nd.random.uniform(shape=(10,))
x.attach_grad()
with mx.autograd.record():
   m = func(x)
m.backward()
dx_grad = x.grad.asnumpy()
dx_grad

Output

When you run the code, you will see the following output:

array([0.21458015, 0.21291625, 0.23330082, 0.2361367 , 0.23086983,
0.24060014, 0.20326573, 0.21093895, 0.24968489, 0.24301809],
dtype=float32)

Methods and their parameters

Following are important methods of the mxnet.autograd API and their parameters:

forward(heads[, head_grads, retain_graph, …]): This method is used for forward computation.

backward(heads[, head_grads, retain_graph, …]): This method is used for backward computation. It computes the gradients of heads with respect to previously marked variables. This method takes as many inputs as forward's outputs. It also returns as many NDArrays as forward's inputs.

get_symbol(x): This method is used to retrieve the recorded computation history as a Symbol.

grad(heads, variables[, head_grads, …]): This method computes the gradients of heads with respect to variables. Once computed, instead of being stored into variable.grad, the gradients are returned as new NDArrays.

is_recording(): With the help of this method we can get the status of recording or not recording.

is_training(): With the help of this method we can get the status of training or predicting.

mark_variables(variables, gradients[, grad_reqs]): This method marks NDArrays as variables to compute gradients for in autograd. It is the same as calling .attach_grad() on a variable, the only difference being that with this call we can set the gradient to any value.

pause([train_mode]): This method returns a scope context to be used in a 'with' statement for code which does not need gradients to be calculated.

predict_mode(): This method returns a scope context to be used in a 'with' statement in which forward pass behaviour is set to inference mode, without changing the recording states.

record([train_mode]): It returns an autograd recording scope context to be used in a 'with' statement and captures code which needs gradients to be calculated.
set_recording(is_recording): Similar to is_recording(), but with the help of this method we can set the status to recording or not recording.

set_training(is_training): Similar to is_training(), with the help of this method we can set the status to training or predicting.

train_mode(): This method returns a scope context to be used in a 'with' statement in which forward pass behaviour is set to training mode, without changing the recording states.

Implementation Example

In the example below, we use the mxnet.autograd.grad() method to compute the gradient of the head with respect to variables:

x = mx.nd.ones((2,))
x.attach_grad()
with mx.autograd.record():
   z = mx.nd.elemwise_add(mx.nd.exp(x), x)
dx_grad = mx.autograd.grad(z, [x], create_graph=True)
dx_grad

Output

The output is mentioned below:

[
[3.7182817 3.7182817]
<NDArray 2 @cpu(0)>]

We can use the mxnet.autograd.predict_mode() method to return a scope to be used in a 'with' statement:

with mx.autograd.record():
   y = model(x)
   with mx.autograd.predict_mode():
      y = sampling(y)
      backward([y])

mxnet.initializer

This is MXNet's API for weight initializers. It has the following classes:

Classes and their parameters

Following are the classes of the mxnet.initializer API and their parameters:

Bilinear(): With the help of this class we can initialise the weights of up-sampling layers.

Constant(value): This class initialises the weights to a given value. The value can be a scalar as well as an NDArray that matches the shape of the parameter to be set.

FusedRNN(init, num_hidden, num_layers, mode): As the name implies, this class initialises parameters for fused Recurrent Neural Network (RNN) layers.

InitDesc: It acts as the descriptor for the initialisation pattern.

Initializer(**kwargs): This is the base class of an initializer.

LSTMBias([forget_bias]): This class initialises all biases of an LSTMCell to 0.0, except for the forget gate, whose bias is set to a custom value.

Load(param[, default_init, verbose]): This class initialises the variables by loading data from a file or dictionary.

MSRAPrelu([factor_type, slope]): As the name implies, this class initialises the weights according to an MSRA paper.

Mixed(patterns, initializers): It initialises the parameters using multiple initializers.

Normal([sigma]): The Normal() class initialises weights with random values sampled from a normal distribution with a mean of zero and a standard deviation (SD) of sigma.

One(): It initialises the weights of the parameter to one.

Orthogonal([scale, rand_type]): As the name implies, this class initialises the weights as an orthogonal matrix.

Uniform([scale]): It initialises weights with random values uniformly sampled from a given range.

Xavier([rnd_type, factor_type, magnitude]): It returns an initializer that performs "Xavier" initialisation for weights.

Zero(): It initialises the weights of the parameter to zero.

Implementation Example

In the example below, we use the mxnet.init.Normal() class to create an initializer and retrieve its parameters:

init = mx.init.Normal(0.8)
init.dumps()

Output

The output is given below:

'["normal", {"sigma": 0.8}]'

Example

init = mx.init.Xavier(factor_type="in", magnitude=2.45)
init.dumps()

Output

The output is shown below:

'["xavier", {"rnd_type": "uniform", "factor_type": "in", "magnitude": 2.45}]'
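In practice, an initializer object is usually passed straight to a Gluon block's initialize() call rather than inspected with dumps(). A minimal sketch (the Dense layer and input shape are assumptions made for illustration):

import mxnet as mx
from mxnet.gluon import nn

net = nn.Dense(10)
net.initialize(mx.init.Xavier(rnd_type="uniform", factor_type="in", magnitude=2.45))

x = mx.nd.random.uniform(shape=(2, 20))
net(x)                      # the first forward pass resolves the deferred shape and applies the initializer
print(net.weight.data())    # weights drawn by the Xavier initializer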
Apache MXNet – NDArray

In this chapter, we will be discussing MXNet's multi-dimensional array format called ndarray.

Handling data with NDArray

First, we are going to see how we can handle data with NDArray. Following are the prerequisites for the same:

Prerequisites

To understand how we can handle data with this multi-dimensional array format, we need to fulfil the following prerequisites:

MXNet installed in a Python environment
Python 2.7.x or Python 3.x

Implementation Example

Let us understand the basic functionality with the help of the example given below.

First, we need to import MXNet and ndarray from MXNet as follows:

import mxnet as mx
from mxnet import nd

Once we import the necessary libraries, we will go through the following basic functionalities:

A simple 1-D array from a Python list

Example

x = nd.array([1,2,3,4,5,6,7,8,9,10])
print(x)

Output

The output is as mentioned below:

[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
<NDArray 10 @cpu(0)>

A 2-D array from a Python list

Example

y = nd.array([[1,2,3,4,5,6,7,8,9,10], [1,2,3,4,5,6,7,8,9,10], [1,2,3,4,5,6,7,8,9,10]])
print(y)

Output

The output is as stated below:

[[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]]
<NDArray 3x10 @cpu(0)>

Creating an NDArray without any initialisation

Here, we will create a matrix with 3 rows and 4 columns by using the .empty function. We will also use the .full function, which takes an additional operand for the value you want to fill in the array.

Example

x = nd.empty((3, 4))
print(x)
x = nd.full((3,4), 8)
print(x)

Output

The output is given below:

[[0.000e+00 0.000e+00 0.000e+00 0.000e+00]
[0.000e+00 0.000e+00 2.887e-42 0.000e+00]
[0.000e+00 0.000e+00 0.000e+00 0.000e+00]]
<NDArray 3x4 @cpu(0)>

[[8. 8. 8. 8.]
[8. 8. 8. 8.]
[8. 8. 8. 8.]]
<NDArray 3x4 @cpu(0)>

Matrix of all zeros with the .zeros function

Example

x = nd.zeros((3, 8))
print(x)

Output

The output is as follows:

[[0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0.]]
<NDArray 3x8 @cpu(0)>

Matrix of all ones with the .ones function

Example

x = nd.ones((3, 8))
print(x)

Output

The output is mentioned below:

[[1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1.]]
<NDArray 3x8 @cpu(0)>

Creating an array whose values are sampled randomly

Example

y = nd.random_normal(0, 1, shape=(3, 4))
print(y)

Output

The output is given below:

[[ 1.2673576 -2.0345826 -0.32537818 -1.4583491 ]
[-0.11176403 1.3606371 -0.7889914 -0.17639421]
[-0.2532185 -0.42614475 -0.12548696 1.4022992 ]]
<NDArray 3x4 @cpu(0)>

Finding the dimensions of an NDArray

Example

y.shape

Output

(3, 4)

Finding the size of an NDArray

Example

y.size

Output

12

Finding the datatype of an NDArray

Example

y.dtype

Output

numpy.float32

NDArray Operations

In this section, we will introduce you to MXNet's array operations. NDArray supports a large number of standard mathematical as well as in-place operations.

Standard Mathematical Operations

Following are the standard mathematical operations supported by NDArray:

Element-wise addition

First, we need to import MXNet and ndarray from MXNet as follows:

import mxnet as mx
from mxnet import nd

x = nd.ones((3, 5))
y = nd.random_normal(0, 1, shape=(3, 5))
print("x=", x)
print("y=", y)
x = x + y
print("x = x + y, x=", x)

Output

The output is given herewith:

x=
[[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]]
<NDArray 3x5 @cpu(0)>
y=
[[-1.0554522 -1.3118273 -0.14674698 0.641493 -0.73820823]
[ 2.031364 0.5932667 0.10228804 1.179526 -0.5444829 ]
[-0.34249446 1.1086396 1.2756858 -1.8332436 -0.5289873 ]]
<NDArray 3x5 @cpu(0)>
x = x + y, x=
[[-0.05545223 -0.3118273 0.853253 1.6414931 0.26179177]
[ 3.031364 1.5932667 1.102288 2.1795259 0.4555171 ]
[ 0.6575055 2.1086397 2.2756858 -0.8332436 0.4710127 ]]
<NDArray 3x5 @cpu(0)>

Element-wise multiplication

Example

x = nd.array([1, 2, 3, 4])
y = nd.array([2, 2, 2, 1])
x * y

Output

You will see the following output:

[2. 4. 6. 4.]
<NDArray 4 @cpu(0)>

Exponentiation

Example

nd.exp(x)

Output

When you run the code, you will see the following output:

[ 2.7182817 7.389056 20.085537 54.59815 ]
<NDArray 4 @cpu(0)>

Matrix transpose to compute a matrix-matrix product

Example

nd.dot(x, y.T)

Output

Given below is the output of the code:

[16.]
<NDArray 1 @cpu(0)>

In-place Operations

Every time we ran an operation in the examples above, we allocated new memory to host its result. For example, if we write A = A + B, we dereference the matrix that A used to point to and instead point it at the newly allocated memory. Let us understand this with the example given below, using Python's id() function:

print("y=", y)
print("id(y):", id(y))
y = y + x
print("after y=y+x, y=", y)
print("id(y):", id(y))

Output

Upon execution, you will receive the following output:

y=
[2. 2. 2. 1.]
<NDArray 4 @cpu(0)>
id(y): 2438905634376
after y=y+x, y=
[3. 4. 5. 5.]
<NDArray 4 @cpu(0)>
id(y): 2438905685664

In fact, we can also assign the result to a previously allocated array as follows:

print("x=", x)
z = nd.zeros_like(x)
print("z is zeros_like x, z=", z)
print("id(z):", id(z))
print("y=", y)
z[:] = x + y
print("z[:] = x + y, z=", z)
print("id(z) is the same as before:", id(z))

Output

The output is shown below:

x=
[1. 2. 3. 4.]
<NDArray 4 @cpu(0)>
z is zeros_like x, z=
[0. 0. 0. 0.]
<NDArray 4 @cpu(0)>
id(z): 2438905790760
y=
[3. 4. 5. 5.]
<NDArray 4 @cpu(0)>
z[:] = x + y, z=
[4. 6. 8. 9.]
<NDArray 4 @cpu(0)>
id(z) is the same as before: 2438905790760

From the above output, we can see that x + y still allocates a temporary buffer to store the result before copying it to z. We can instead perform the operation fully in place, to make better use of memory and to avoid the temporary buffer. To do this, we specify the out keyword argument that every operator supports.
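A minimal sketch of this pattern, reusing x, y, and z from above (elemwise_add is one operator that accepts the out argument):

print("x=", x)
print("y=", y)
nd.elemwise_add(x, y, out=z)                 # the result is written directly into z, no temporary buffer
print("z=", z)
print("id(z) is still the same:", id(z))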
Apache MXNet – System Components

Here, the system components of Apache MXNet are explained in detail. First, we will study the execution engine in MXNet.

Execution Engine

Apache MXNet's execution engine is very versatile. We can use it for deep learning as well as any domain-specific problem: executing a bunch of functions following their dependencies. It is designed in such a way that functions with dependencies are serialized, whereas functions with no dependencies can be executed in parallel.

Core Interface

The API given below is the core interface for Apache MXNet's execution engine:

virtual void PushSync(Fn exec_fun, Context exec_ctx,
                      std::vector<VarHandle> const& const_vars,
                      std::vector<VarHandle> const& mutate_vars) = 0;

The above API has the following parts:

exec_fun: The core interface API of MXNet allows us to push the function named exec_fun, along with its context information and dependencies, to the execution engine.

exec_ctx: The context information in which the above-mentioned function exec_fun should be executed.

const_vars: These are the variables that the function reads from.

mutate_vars: These are the variables that are to be modified.

The execution engine provides its user the guarantee that the execution of any two functions that modify a common variable is serialized in their push order.

Function

Following is the function type of the execution engine of Apache MXNet:

using Fn = std::function<void(RunContext)>;

In the above function, RunContext contains the runtime information. The runtime information should be determined by the execution engine. The syntax of RunContext is as follows:

struct RunContext {
   // stream pointer which could be safely cast to
   // cudaStream_t* type
   void *stream;
};

Given below are some important points about the execution engine's functions:

All the functions are executed by the MXNet execution engine's internal threads.

It is not good to push a blocking function to the execution engine, because then the function will occupy an execution thread and also reduce the total throughput. For this, MXNet provides another, asynchronous, function type:

using Callback = std::function<void()>;
using AsyncFn = std::function<void(RunContext, Callback)>;

With the AsyncFn function we can offload the heavy part of the computation, but the execution engine does not consider the function finished until we call the callback function.

Context

In Context, we can specify the context in which the function is to be executed. This usually includes the following:

Whether the function should be run on a CPU or a GPU.

If we specify GPU in the Context, which GPU to use.

There is a big difference between Context and RunContext. Context has the device type and device id, whereas RunContext has the information that can be decided only during runtime.

VarHandle

VarHandle, used to specify the dependencies of functions, is like a token (provided by the execution engine) that we can use to represent the external resources the function can modify or use. But the question arises: why do we need to use VarHandle? It is because the Apache MXNet engine is designed to be decoupled from the other MXNet modules.

Following are some important points about VarHandle:

It is lightweight, so creating, deleting, or copying a variable incurs little operating cost.

We need to specify the immutable variables, i.e. the variables that will only be read, in the const_vars.

We need to specify the mutable variables, i.e. the variables that will be modified, in the mutate_vars.
The rule used by the execution engine to resolve dependencies among functions is that the execution of any two functions, where one of them modifies at least one common variable, is serialized in their push order.

For creating a new variable, we can use the NewVar() API.

For deleting a variable, we can use the PushDelete API.

Let us understand its working with a simple example. Suppose we have two functions, namely F1 and F2, and they both mutate a variable namely V2. In that case, F2 is guaranteed to be executed after F1 if F2 is pushed after F1. On the other hand, if F1 and F2 both only read V2, then their actual execution order could be random.

Push and Wait

Push and Wait are two more useful APIs of the execution engine.

Following are two important features of the Push API:

All the Push APIs are asynchronous, which means that the API call returns immediately regardless of whether the pushed function has finished or not.

The Push API is not thread safe, which means that only one thread should make engine API calls at a time.

Now, if we talk about the Wait API, the following points describe it:

If a user wants to wait for a specific function to be finished, he/she should include a callback function in the closure and call it at the end of the function.

On the other hand, if a user wants to wait for all functions that involve a certain variable to finish, he/she should use the WaitForVar(var) API.

If someone wants to wait for all the pushed functions to finish, then use the WaitForAll() API.

Operators

An operator in Apache MXNet is a class that contains the actual computation logic as well as auxiliary information, and aids the system in performing optimisation.

Operator Interface

Forward is the core operator interface, whose syntax is as follows:

virtual void Forward(const OpContext &ctx,
                     const std::vector<TBlob> &in_data,
                     const std::vector<OpReqType> &req,
                     const std::vector<TBlob> &out_data,
                     const std::vector<TBlob> &aux_states) = 0;

The structure of OpContext, used in Forward(), is as follows:

struct OpContext {
   int is_train;
   RunContext run_ctx;
   std::vector<Resource> requested;
}

The OpContext describes the state of the operator (whether it is in the train or test phase), which device the operator should be run on, and also the requested resources. From the above Forward core interface, we can understand the remaining arguments as follows:

in_data and out_data represent the input and output tensors.

req denotes how the result of the computation is written into out_data. The OpReqType can be defined as:

enum OpReqType
Apache MXNet – Toolkits and Ecosystem

To support the research and development of deep learning applications across many fields, Apache MXNet provides us a rich ecosystem of toolkits, libraries, and more. Let us explore them.

Toolkits

Following are some of the most used and important toolkits provided by MXNet:

GluonCV

As the name implies, GluonCV is a Gluon toolkit for computer vision powered by MXNet. It provides implementations of state-of-the-art DL (Deep Learning) algorithms in computer vision (CV). With the help of the GluonCV toolkit, engineers, researchers, and students can validate new ideas and learn CV easily.

Given below are some of the features of GluonCV:

It provides training scripts for reproducing state-of-the-art results reported in the latest research.

More than 170+ high-quality pretrained models.

It embraces a flexible development pattern.

GluonCV is easy to optimise, and we can deploy it without retaining a heavyweight DL framework.

It provides carefully designed APIs that greatly lessen implementation intricacy.

Community support.

Easy-to-understand implementations.

Following are the applications supported by the GluonCV toolkit:

Image Classification
Object Detection
Semantic Segmentation
Instance Segmentation
Pose Estimation
Video Action Recognition

We can install GluonCV by using pip as follows:

pip install --upgrade mxnet gluoncv

GluonNLP

As the name implies, GluonNLP is a Gluon toolkit for Natural Language Processing (NLP) powered by MXNet. It provides implementations of state-of-the-art DL (Deep Learning) models in NLP. With the help of the GluonNLP toolkit, engineers, researchers, and students can build blocks for text data pipelines and models. Based on these models, they can quickly prototype research ideas and products.

Given below are some of the features of GluonNLP:

It provides training scripts for reproducing state-of-the-art results reported in the latest research.

A set of pretrained models for common NLP tasks.

It provides carefully designed APIs that greatly lessen implementation intricacy.

Community support.

It also provides tutorials to help you get started on new NLP tasks.

Following are the NLP tasks we can implement with the GluonNLP toolkit:

Word Embedding
Language Model
Machine Translation
Text Classification
Sentiment Analysis
Natural Language Inference
Text Generation
Dependency Parsing
Named Entity Recognition
Intent Classification and Slot Labeling

We can install GluonNLP by using pip as follows:

pip install --upgrade mxnet gluonnlp

GluonTS

As the name implies, GluonTS is a Gluon toolkit for probabilistic time series modeling powered by MXNet. It provides the following features:

State-of-the-art (SOTA) deep learning models ready to be trained.

Utilities for loading as well as iterating over time-series datasets.

Building blocks to define your own model.

With the help of the GluonTS toolkit, engineers, researchers, and students can train and evaluate any of the built-in models on their own data, quickly experiment with different solutions, and come up with a solution for their time series tasks. They can also use the provided abstractions and building blocks to create custom time series models, and rapidly benchmark them against baseline algorithms.

We can install GluonTS by using pip as follows:

pip install gluonts

GluonFR

As the name implies, it is an Apache MXNet Gluon toolkit for FR (Face Recognition). It provides the following features:

State-of-the-art (SOTA) deep learning models in face recognition.
The implementation of SoftmaxCrossEntropyLoss, ArcLoss, TripletLoss, RingLoss, CosLoss/AMsoftmax, L2-Softmax, A-Softmax, CenterLoss, ContrastiveLoss, LGM Loss, etc.

In order to install GluonFR, we need Python 3.5 or later. We also need to install GluonCV and MXNet first, as follows:

pip install gluoncv --pre
pip install mxnet-mkl --pre --upgrade
pip install mxnet-cuXXmkl --pre --upgrade # if cuda XX is installed

Once you have installed the dependencies, you can use one of the following commands to install GluonFR:

From Source

pip install git+https://github.com/THUFutureLab/gluon-face.git@master

Pip

pip install gluonfr

Ecosystem

Now let us explore MXNet's rich libraries, packages, and frameworks:

Coach RL

Coach is a Python Reinforcement Learning (RL) framework created by the Intel AI lab. It enables easy experimentation with state-of-the-art RL algorithms. Coach RL supports Apache MXNet as a back end and allows simple integration of new environments to solve.

In order to extend and reuse existing components easily, Coach RL cleanly decouples the basic reinforcement learning components: algorithms, environments, NN architectures, and exploration policies.

Following are the agents and supported algorithms for the Coach RL framework:

Value Optimization Agents

Deep Q Network (DQN)
Double Deep Q Network (DDQN)
Dueling Q Network
Mixed Monte Carlo (MMC)
Persistent Advantage Learning (PAL)
Categorical Deep Q Network (C51)
Quantile Regression Deep Q Network (QR-DQN)
N-Step Q Learning
Neural Episodic Control (NEC)
Normalized Advantage Functions (NAF)
Rainbow

Policy Optimization Agents

Policy Gradients (PG)
Asynchronous Advantage Actor-Critic (A3C)
Deep Deterministic Policy Gradients (DDPG)
Proximal Policy Optimization (PPO)
Clipped Proximal Policy Optimization (CPPO)
Generalized Advantage Estimation (GAE)
Sample Efficient Actor-Critic with Experience Replay (ACER)
Soft Actor-Critic (SAC)
Twin Delayed Deep Deterministic Policy Gradient (TD3)

General Agents

Direct Future Prediction (DFP)

Imitation Learning Agents

Behavioral Cloning (BC)
Conditional Imitation Learning

Hierarchical Reinforcement Learning Agents

Hierarchical Actor Critic (HAC)

Deep Graph Library

Deep Graph Library (DGL), developed by the NYU and AWS teams in Shanghai, is a Python package that provides easy implementations of Graph Neural Networks (GNNs) on top of MXNet. It also provides easy implementations of GNNs on top of other existing major deep learning libraries like PyTorch, Gluon, etc.

Deep Graph Library is free software. It is available on all Linux distributions later than Ubuntu 16.04, macOS X, and Windows 7 or later. It also requires Python 3.5 or later.

Following are the features of DGL:

No migration cost: There is no migration cost for using DGL, as it is built on top of popular existing DL frameworks.

Message passing: DGL provides message passing and has versatile control over it. Message passing ranges from low-level operations, such as sending along selected edges, to high-level control, such as graph-wide feature updates.

Smooth learning curve: It is quite easy to learn and use DGL, as its powerful user-defined functions are flexible as well as easy to use.

Transparent speed optimization: DGL provides transparent speed optimization by doing automatic batching of computations
Apache MXNet – Python API ndarray

This chapter explains the ndarray library which is available in Apache MXNet.

mxnet.ndarray

Apache MXNet's NDArray library defines the core DS (data structures) for all mathematical computations. Two fundamental jobs of NDArray are as follows:

It supports fast execution on a wide range of hardware configurations.

It automatically parallelises multiple operations across the available hardware.

The example given below shows how one can create an NDArray by using a 1-D and a 2-D 'array' from a regular Python list:

import mxnet as mx
from mxnet import nd

x = nd.array([1,2,3,4,5,6,7,8,9,10])
print(x)

Output

The output is given below:

[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
<NDArray 10 @cpu(0)>

Example

y = nd.array([[1,2,3,4,5,6,7,8,9,10], [1,2,3,4,5,6,7,8,9,10], [1,2,3,4,5,6,7,8,9,10]])
print(y)

Output

This produces the following output:

[[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]]
<NDArray 3x10 @cpu(0)>

Now let us discuss in detail the classes, functions, and parameters of the ndarray API of MXNet.

Classes

Following are the classes of the ndarray API of MXNet:

CachedOp(sym[, flags]): It is used for the cached operator handle.

NDArray(handle[, writable]): It is used as an array object that represents a multi-dimensional, homogeneous array of fixed-size items.

Functions and their parameters

Following are some of the important functions and their parameters covered by the mxnet.ndarray API:

Activation([data, act_type, out, name]): It applies an activation function element-wise to the input. It supports the relu, sigmoid, tanh, softrelu, and softsign activation functions.

BatchNorm([data, gamma, beta, moving_mean, …]): It is used for batch normalisation. This function normalises a data batch by its mean and variance. It applies a scale gamma and an offset beta.

BilinearSampler([data, grid, cudnn_off, …]): This function applies bilinear sampling to the input feature map. It is the key operation of "Spatial Transformer Networks". If you are familiar with the remap function in OpenCV, the usage of this function is quite similar, the only difference being that it has a backward pass.

BlockGrad([data, out, name]): As the name specifies, this function stops gradient computation. It basically stops the accumulated gradient of the inputs from flowing through this operator in the backward direction.

cast([data, dtype, out, name]): This function casts all elements of the input to a new type.

Implementation Examples

In the examples below, we use the function BilinearSampler() for zooming out the data two times and for shifting the data horizontally by -1 pixel. First, zooming out:

import mxnet as mx
from mxnet import nd

data = nd.array([[[[2, 5, 3, 6],
                   [1, 8, 7, 9],
                   [0, 4, 1, 8],
                   [2, 0, 3, 4]]]])
affine_matrix = nd.array([[2, 0, 0], [0, 2, 0]])
affine_matrix = nd.reshape(affine_matrix, shape=(1, 6))
grid = nd.GridGenerator(data=affine_matrix, transform_type="affine", target_shape=(4, 4))
output = nd.BilinearSampler(data, grid)

Output

When you execute the above code, you should see the following output:

[[[[0. 0. 0. 0. ]
[0. 4.0000005 6.25 0. ]
[0. 1.5 4. 0. ]
[0. 0. 0. 0. ]]]]
<NDArray 1x1x4x4 @cpu(0)>

The above output shows the zooming out of the data two times.
The example of shifting the data by -1 pixel is as follows:

import mxnet as mx
from mxnet import nd

data = nd.array([[[[2, 5, 3, 6],
                   [1, 8, 7, 9],
                   [0, 4, 1, 8],
                   [2, 0, 3, 4]]]])
warp_matrix = nd.array([[[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]],
                         [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]]])
grid = nd.GridGenerator(data=warp_matrix, transform_type="warp")
output = nd.BilinearSampler(data, grid)

Output

The output is stated below:

[[[[5. 3. 6. 0.]
[8. 7. 9. 0.]
[4. 1. 8. 0.]
[0. 3. 4. 0.]]]]
<NDArray 1x1x4x4 @cpu(0)>

Similarly, the following example shows the use of the cast() function:

nd.cast(nd.array([300, 10.1, 15.4, -1, -2]), dtype="uint8")

Output

Upon execution, you will receive the following output:

[ 44 10 15 255 254]
<NDArray 5 @cpu(0)>

ndarray.contrib

The contrib NDArray API is defined in the ndarray.contrib package. It typically provides many useful experimental APIs for new features. This API works as a place for the community to try out new features, and the feature contributors get feedback as well.

Functions and their parameters

Following are some of the important functions and their parameters covered by the mxnet.ndarray.contrib API:

rand_zipfian(true_classes, num_sampled, …): This function draws random samples from an approximately Zipfian distribution. The base distribution of this function is the Zipfian distribution. It randomly samples num_sampled candidates, and the elements of sampled_candidates are drawn from that base distribution.

foreach(body, data, init_states): As the name implies, this function runs a for loop with user-defined computation over NDArrays on dimension 0. It simulates a for loop, and body holds the computation for one iteration of the loop.

while_loop(cond, func, loop_vars[, …]): As the name implies, this function runs a while loop with user-defined computation and a loop condition. It simulates a while loop that iteratively performs customised computation while the condition is satisfied.

cond(pred, then_func, else_func): As the name implies, this function runs an if-then-else using a user-defined condition and computation. It simulates an if-like branch which chooses to do one of two customised computations according to the specified condition.

isinf(data): This function performs an element-wise check to determine whether the NDArray contains any infinite element.

getnnz([data, axis, out, name]): This function gives us the number of stored values for a sparse tensor, including explicit zeros. It only supports a CSR matrix on CPU.

requantize([data, min_range, max_range, …]): This function requantises the given data, quantised in int32 with the corresponding thresholds, into int8 using min and max thresholds either calculated at runtime or from calibration.

Implementation Examples

In the example below, we use the function rand_zipfian to draw random samples from an approximately Zipfian distribution:

import mxnet as mx
from mxnet import nd

trueclass = mx.nd.array([2])
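As a separate, hedged sketch of the foreach control-flow operator described above, the step function below keeps a running sum over dimension 0 (the function name, data, and shapes are illustrative assumptions):

import mxnet as mx
from mxnet import nd

def step(data, states):
   # add the current slice along dimension 0 to the running total carried in states[0]
   total = states[0] + data
   return total, [total]

data = nd.arange(6).reshape((3, 2))      # three slices of shape (2,)
init_states = [nd.zeros((2,))]
outputs, final_states = nd.contrib.foreach(step, data, init_states)
print(outputs)                           # the running total after each slice
print(final_states)                      # the final accumulated sum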
Apache MXNet – Distributed Training

This chapter is about distributed training in Apache MXNet. Let us start by understanding the modes of computation in MXNet.

Modes of Computation

MXNet, a multi-language ML library, offers its users the following two modes of computation:

Imperative mode

This mode of computation exposes an interface like the NumPy API. For example, in MXNet, use the following imperative code to construct a tensor of zeros on both the CPU and a GPU:

import mxnet as mx

tensor_cpu = mx.nd.zeros((100,), ctx=mx.cpu())
tensor_gpu = mx.nd.zeros((100,), ctx=mx.gpu(0))

As we see in the above code, MXNet specifies the location where to hold the tensor, either a CPU or a GPU device; in the example above, it is GPU 0. MXNet achieves incredible utilisation of the device because all the computations happen lazily instead of instantaneously.

Symbolic mode

Although the imperative mode is quite useful, one of its drawbacks is its rigidity: all the computations need to be known beforehand along with pre-defined data structures.

On the other hand, symbolic mode exposes a computation graph like TensorFlow. It removes the drawback of the imperative API by allowing MXNet to work with symbols or variables instead of fixed/pre-defined data structures. Afterwards, the symbols can be interpreted as a set of operations as follows:

import mxnet as mx

x = mx.sym.Variable("X")
y = mx.sym.Variable("Y")
z = (x + y)
m = z / 100

Kinds of Parallelism

Apache MXNet supports distributed training. It enables us to leverage multiple machines for faster as well as more effective training. Following are the two ways in which we can distribute the workload of training an NN across multiple devices, CPU or GPU:

Data Parallelism

In this kind of parallelism, each device stores a complete copy of the model and works with a different part of the dataset. Devices also update a shared model collectively. We can locate all the devices on a single machine or across multiple machines.

Model Parallelism

This is another kind of parallelism, which comes in handy when models are so large that they do not fit into device memory. In model parallelism, different devices are assigned the task of learning different parts of the model. The important point to note here is that currently Apache MXNet supports model parallelism on a single machine only.

Working of distributed training

The concepts given below are the key to understanding the working of distributed training in Apache MXNet:

Types of processes

Processes communicate with each other to accomplish the training of a model. Apache MXNet has the following three processes:

Worker

The job of a worker node is to perform training on a batch of training samples. The worker nodes pull weights from the server before processing each batch, and send gradients to the server once the batch is processed.

Server

MXNet can have multiple servers for storing the model's parameters and communicating with the worker nodes.

Scheduler

The role of the scheduler is to set up the cluster, which includes waiting for messages that each node has come up and which port the node is listening to. After setting up the cluster, the scheduler lets all the processes know about every other node in the cluster, so that the processes can communicate with each other. There is only one scheduler.

KV Store

KV store stands for Key-Value store. It is a critical component used for multi-device training.
It is important because the communication of parameters across devices, on a single machine as well as across multiple machines, is transmitted through one or more servers with a KVStore for the parameters. Let's understand the working of KVStore with the help of the following points:

Each value in KVStore is represented by a key and a value.

Each parameter array in the network is assigned a key, and the weights of that parameter array are referred to by the value.

After that, the worker nodes push gradients after processing a batch. They also pull updated weights before processing a new batch.

The notion of a KVStore server exists only during distributed training, and its distributed mode is enabled by calling the mxnet.kvstore.create function with a string argument containing the word dist:

kv = mxnet.kvstore.create('dist_sync')

Distribution of Keys

It is not necessary that all the servers store all the parameter arrays or keys; they are distributed across different servers. Such distribution of keys across different servers is handled transparently by the KVStore, and the decision of which server stores a specific key is made at random.

KVStore, as discussed above, ensures that whenever a key is pulled, its request is sent to the server which has the corresponding value. What if the value of some key is large? In that case, it may be shared across different servers.

Split training data

As users, we want each machine to work on different parts of the dataset, especially when running distributed training in data parallel mode.

We know that, to split a batch of samples provided by the data iterator for data parallel training on a single worker, we can use mxnet.gluon.utils.split_and_load and then load each part of the batch on the device which will process it further.

On the other hand, in the case of distributed training, at the beginning we need to divide the dataset into n different parts so that every worker gets a different part. Each worker can then use split_and_load to again divide its part of the dataset across the different devices on a single machine. All this happens through the data iterator. mxnet.io.MNISTIter and mxnet.io.ImageRecordIter are two such iterators in MXNet that support this feature.

Weights updating

For updating the weights, KVStore supports the following two modes:

The first method aggregates the gradients and updates the weights by using those gradients.

In the second method the server only aggregates gradients.

If you are using Gluon, there is an option to choose between the above-stated methods by passing the update_on_kvstore variable.
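A minimal sketch of the push/pull cycle that underlies both modes, shown here with a single-machine "local" store (a distributed store such as "dist_sync" exposes the same calls; the key and shape are assumptions made for illustration):

import mxnet as mx

kv = mx.kv.create("local")            # "dist_sync" would be used for distributed training
shape = (2, 3)
kv.init(3, mx.nd.ones(shape))         # register key 3 with an initial value

kv.push(3, mx.nd.ones(shape) * 2)     # a worker pushes an update for key 3

out = mx.nd.zeros(shape)
kv.pull(3, out=out)                   # pull the value currently stored for key 3
print(out)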
Apache MXNet – Python Packages

In this chapter we will learn about the Python packages available in Apache MXNet.

Important MXNet Python packages

MXNet has the following important Python packages, which we will discuss one by one:

Autograd (Automatic Differentiation)
NDArray
KVStore
Gluon
Visualization

First, let us start with the Autograd Python package for Apache MXNet.

Autograd

Autograd stands for automatic differentiation, used to backpropagate the gradients from the loss metric back to each of the parameters. Along with backpropagation it uses a dynamic programming approach to efficiently calculate the gradients. It is also called reverse mode automatic differentiation. This technique is very efficient in 'fan-in' situations, where many parameters affect a single loss metric.

What are gradients?

Gradients are fundamental to the process of neural network training. They basically tell us how to change the parameters of the network to improve its performance.

As we know, neural networks (NN) are composed of operators such as sums, products, convolutions, etc. These operators, for their computations, use parameters such as the weights in convolution kernels. We have to find the optimal values for these parameters, and gradients show us the way and lead us to the solution as well.

We are interested in the effect of changing a parameter on the performance of the network, and gradients tell us how much a given variable increases or decreases when we change a variable it depends on. The performance is usually defined by using a loss metric that we try to minimise. For example, for regression we might try to minimise the L2 loss between our predictions and the exact values, whereas for classification we might minimise the cross-entropy loss.

Once we calculate the gradient of each parameter with reference to the loss, we can then use an optimiser, such as stochastic gradient descent.

How to calculate gradients?

We have the following options to calculate gradients:

Symbolic Differentiation: The very first option is symbolic differentiation, which calculates the formula for each gradient. The drawback of this method is that it quickly leads to incredibly long formulas as the network gets deeper and the operators get more complex.

Finite Differencing: Another option is to use finite differencing, which tries slight differences on each parameter and sees how the loss metric responds. The drawback of this method is that it is computationally expensive and may have poor numerical precision.

Automatic Differentiation: The solution to the drawbacks of the above methods is to use automatic differentiation to backpropagate the gradients from the loss metric back to each of the parameters. Combined with a dynamic programming approach, it efficiently calculates the gradients. This method is also called reverse mode automatic differentiation.

Automatic Differentiation (autograd)

Here, we will understand in detail the working of autograd. It basically works in the following two stages:

Stage 1: This stage is called the 'Forward Pass' of training. As the name implies, in this stage autograd creates a record of the operators used by the network to make predictions and calculate the loss metric.

Stage 2: This stage is called the 'Backward Pass' of training. As the name implies, in this stage it works backwards through this record. Going backwards, it evaluates the partial derivatives of each operator, all the way back to the network parameters.
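Before looking at a full network, the two stages can be seen on a single variable. A minimal sketch (here y = x * x, so the expected gradient is 2x):

import mxnet as mx
from mxnet import autograd

x = mx.nd.array([1.0, 2.0, 3.0])
x.attach_grad()                  # allocate space for the gradient of x

with autograd.record():          # Stage 1: record the forward pass
   y = x * x

y.backward()                     # Stage 2: work backwards through the record
print(x.grad)                    # [2. 4. 6.], i.e. 2x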
Advantages of autograd

Following are the advantages of using automatic differentiation (autograd):

Flexible: The flexibility it gives us when defining our network is one of the huge benefits of using autograd. We can change the operations on every iteration. These are called dynamic graphs, which are much more complex to implement in frameworks requiring static graphs. Autograd, even in such cases, will still be able to backpropagate the gradients correctly.

Automatic: Autograd is automatic, i.e. the complexities of the backpropagation procedure are taken care of for you. We just need to specify which gradients we are interested in calculating.

Efficient: Autograd calculates the gradients very efficiently.

Can use native Python control flow operators: We can use native Python control flow operators such as if conditions and while loops, and autograd will still be able to backpropagate the gradients efficiently and correctly.

Using autograd in MXNet Gluon

Here, with the help of an example, we will see how we can use autograd in MXNet Gluon.

Implementation Example

In the following example, we implement a regression model having two layers. After implementing it, we use autograd to automatically calculate the gradient of the loss with reference to each of the weight parameters.

First, import autograd and the other required packages as follows:

from mxnet import autograd
import mxnet as mx
from mxnet.gluon.nn import HybridSequential, Dense
from mxnet.gluon.loss import L2Loss

Now, we need to define the network as follows:

N_net = HybridSequential()
N_net.add(Dense(units=3))
N_net.add(Dense(units=1))
N_net.initialize()

Now we need to define the loss as follows:

loss_function = L2Loss()

Next, we need to create the dummy data as follows:

x = mx.nd.array([[0.5, 0.9]])
y = mx.nd.array([[1.5]])

Now, we are ready for our first forward pass through the network. We want autograd to record the computational graph so that we can calculate the gradients. For this, we need to run the network code in the scope of the autograd.record context as follows:

with autograd.record():
   y_hat = N_net(x)
   loss = loss_function(y_hat, y)

Now, we are ready for the backward pass, which we start by calling the backward method on the quantity of interest. The quantity of interest in our example is loss, because we are trying to calculate the gradient of loss with reference to the parameters:

loss.backward()

Now, we have gradients for each parameter of the network, which will be used by the optimiser to update the parameter values for improved performance. Let's check out the gradients of the 1st layer as follows:

N_net[0].weight.grad()

Output

The output is as follows:

[[-0.00470527 -0.00846948]
[-0.03640365 -0.06552657]
[ 0.00800354 0.01440637]]
<NDArray 3x2 @cpu(0)>

Complete implementation example
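Given below is the complete implementation, assembled from the snippets above into a single runnable sketch:

from mxnet import autograd
import mxnet as mx
from mxnet.gluon.nn import HybridSequential, Dense
from mxnet.gluon.loss import L2Loss

# define the two-layer regression network
N_net = HybridSequential()
N_net.add(Dense(units=3))
N_net.add(Dense(units=1))
N_net.initialize()

# define the loss
loss_function = L2Loss()

# create the dummy data
x = mx.nd.array([[0.5, 0.9]])
y = mx.nd.array([[1.5]])

# forward pass, recorded so that autograd can build the computational graph
with autograd.record():
   y_hat = N_net(x)
   loss = loss_function(y_hat, y)

# backward pass: compute gradients of the loss with respect to the parameters
loss.backward()

# gradients of the first layer's weights
print(N_net[0].weight.grad())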
Apache MXNet – Introduction

This chapter highlights the features of Apache MXNet and talks about the latest version of this deep learning software framework.

What is MXNet?

Apache MXNet is a powerful open-source deep learning software framework that helps developers build, train, and deploy deep learning models. Over the past few years, from healthcare to transportation to manufacturing and, in fact, in every aspect of our daily life, the impact of deep learning has been widespread. Nowadays, deep learning is sought by companies to solve hard problems like face recognition, object detection, Optical Character Recognition (OCR), speech recognition, and machine translation.

That is the reason Apache MXNet is supported by:

Some big companies like Intel, Baidu, Microsoft, Wolfram Research, etc.

Public cloud providers including Amazon Web Services (AWS) and Microsoft Azure.

Some big research institutes like Carnegie Mellon, MIT, the University of Washington, and the Hong Kong University of Science & Technology.

Why Apache MXNet?

With various deep learning platforms like Torch7, Caffe, Theano, TensorFlow, Keras, Microsoft Cognitive Toolkit, etc. already in existence, you might wonder why Apache MXNet? Let's check out some of the reasons behind it:

Apache MXNet solves one of the biggest issues of existing deep learning platforms: that in order to use them one must learn another system with a different programming flavor.

With the help of Apache MXNet, developers can exploit the full capabilities of GPUs as well as cloud computing.

Apache MXNet can accelerate any numerical computation and places a special emphasis on speeding up the development and deployment of large-scale DNNs (deep neural networks).

It provides users the capabilities of both imperative and symbolic programming.

Various Features

If you are looking for a flexible deep learning library to quickly develop cutting-edge deep learning research, or a robust platform to push production workloads, your search ends at Apache MXNet. That is because of the following features:

Distributed Training

Whether it is multi-GPU or multi-host training with near-linear scaling efficiency, Apache MXNet allows developers to make the most of their hardware. MXNet also supports integration with Horovod, which is an open-source distributed deep learning framework created at Uber. For this integration, the following are some of the common distributed APIs defined in Horovod:

horovod.broadcast()
horovod.allgather()
horovod.allreduce()

In this regard, MXNet offers us the following capabilities:

Device Placement: With the help of MXNet, we can easily specify where each data structure (DS) should live.

Automatic Differentiation: Apache MXNet automates the differentiation, i.e. the derivative calculations.

Multi-GPU training: MXNet allows us to achieve scaling efficiency with the number of available GPUs.

Optimized Predefined Layers: We can code our own layers in MXNet, and it also provides predefined layers optimized for speed.

Hybridization

Apache MXNet provides its users a hybrid front-end. With the help of the Gluon Python API, it can bridge the gap between its imperative and symbolic capabilities. This is done by calling its hybridize functionality.

Faster Computation

Linear operations, like tens or hundreds of matrix multiplications, are the computational bottleneck for deep neural nets.
To solve this bottleneck, MXNet provides:

Optimized numerical computation for GPUs

Optimized numerical computation for distributed ecosystems

Automation of common workflows, with the help of which standard NNs can be expressed briefly

Language Bindings

MXNet has deep integration with high-level languages like Python and R. It also provides support for other programming languages such as:

Scala
Julia
Clojure
Java
C/C++
Perl

We do not need to learn any new programming language; instead MXNet, combined with the hybridization feature, allows an exceptionally smooth transition from Python to deployment in the programming language of our choice.

Latest version: MXNet 1.6.0

The Apache Software Foundation (ASF) released the stable version 1.6.0 of Apache MXNet on 21st February 2020 under Apache License 2.0. This is the last MXNet release to support Python 2, as the MXNet community voted to no longer support Python 2 in further releases. Let us check out some of the new features this release brings for its users.

NumPy-compatible interface

Due to its flexibility and generality, NumPy has been widely used by machine learning practitioners, scientists, and students. But as hardware accelerators like Graphical Processing Units (GPUs) have become increasingly assimilated into various machine learning (ML) toolkits, NumPy users, to take advantage of the speed of GPUs, have had to switch to new frameworks with different syntax.

With MXNet 1.6.0, Apache MXNet is moving toward a NumPy-compatible programming experience. The new interface provides equivalent usability as well as expressiveness to practitioners familiar with NumPy syntax. Along with that, MXNet 1.6.0 also enables the existing NumPy-style code to utilise hardware accelerators like GPUs to speed up large-scale computations.

Integration with Apache TVM

Apache TVM, an open-source end-to-end deep learning compiler stack for hardware backends such as CPUs, GPUs, and specialised accelerators, aims to fill the gap between productivity-focused deep learning frameworks and performance-oriented hardware backends. With the latest release, MXNet 1.6.0, users can leverage Apache (incubating) TVM to implement high-performance operator kernels in the Python programming language. Two main advantages of this new feature are the following:

It simplifies the former C++ based development process.

It enables sharing the same implementation across multiple hardware backends such as CPUs, GPUs, etc.

Improvements on existing features

Apart from the above-listed features of MXNet 1.6.0, it also provides some improvements over the existing features. The improvements are as follows:

Grouping element-wise operations for GPU

As we know, the performance of element-wise operations is memory-bandwidth bound, and that is the reason chaining such operations may reduce overall performance. Apache MXNet 1.6.0 performs element-wise operation fusion, which actually generates just-in-time fused operations as and when possible. Such element-wise operation fusion also reduces storage needs and improves overall performance.

Simplifying common expressions

MXNet 1.6.0 eliminates redundant expressions and simplifies common expressions. Such enhancement also improves memory usage and total execution time.

Optimizations

MXNet 1.6.0 also provides various optimizations to existing features and operators, which are as follows:

Automatic Mixed Precision
Gluon Fit API
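As an illustration of the NumPy-compatible interface described above, a minimal hedged sketch using the mxnet.np and mxnet.npx modules (the array shapes are arbitrary):

from mxnet import np, npx
npx.set_np()                     # activate NumPy-compatible behaviour

a = np.ones((2, 3))
b = np.arange(6).reshape((2, 3))
print(a + b)                     # familiar NumPy-style syntax, executed by MXNet
print((a + b).sum(axis=0))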