Apache MXNet – Python Packages

In this chapter we will learn about the Python packages available in Apache MXNet.

Important MXNet Python packages

MXNet has the following important Python packages, which we will discuss one by one −

Autograd (Automatic Differentiation)
NDArray
KVStore
Gluon
Visualization

First, let us start with the Autograd Python package for Apache MXNet.

Autograd

Autograd stands for automatic differentiation, which is used to backpropagate the gradients from the loss metric back to each of the parameters. Along with backpropagation, it uses a dynamic programming approach to calculate the gradients efficiently. It is also called reverse-mode automatic differentiation. This technique is very efficient in 'fan-in' situations, where many parameters affect a single loss metric.

What are gradients?

Gradients are fundamental to the process of neural network training. They basically tell us how to change the parameters of the network to improve its performance.

As we know, neural networks (NN) are composed of operators such as sums, products, convolutions, etc. These operators, for their computations, use parameters such as the weights in convolution kernels. We have to find the optimal values for these parameters, and gradients show us the way and lead us to the solution.

We are interested in the effect of changing a parameter on the performance of the network, and gradients tell us how much a given variable increases or decreases when we change a variable it depends on. The performance is usually defined by using a loss metric that we try to minimise. For example, for regression we might try to minimise the L2 loss between our predictions and the true values, whereas for classification we might minimise the cross-entropy loss.

Once we calculate the gradient of each parameter with respect to the loss, we can then use an optimiser, such as stochastic gradient descent, to update the parameters.

How to calculate gradients?

We have the following options to calculate gradients −

Symbolic Differentiation − The very first option is symbolic differentiation, which calculates the formula for each gradient. The drawback of this method is that it quickly leads to incredibly long formulas as the network gets deeper and the operators get more complex.

Finite Differencing − Another option is to use finite differencing, which tries a slight change on each parameter and observes how the loss metric responds. The drawback of this method is that it is computationally expensive and may have poor numerical precision.

Automatic Differentiation − The solution to the drawbacks of the above methods is to use automatic differentiation to backpropagate the gradients from the loss metric back to each of the parameters. Backpropagation uses a dynamic programming approach to calculate the gradients efficiently. This method is also called reverse-mode automatic differentiation.

Automatic Differentiation (autograd)

Here, we will understand in detail the working of autograd. It basically works in the following two stages −

Stage 1 − This stage is called the 'Forward Pass' of training. As the name implies, in this stage autograd creates a record of the operators used by the network to make predictions and calculate the loss metric.

Stage 2 − This stage is called the 'Backward Pass' of training. As the name implies, in this stage autograd works backwards through this record. Going backwards, it evaluates the partial derivatives of each operator, all the way back to the network parameters.
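To make the two stages concrete, here is a minimal sketch (an illustrative example, assuming the NDArray API introduced later in this tutorial) that computes the gradient of f(x) = x² at x = 3 with autograd and then approximates the same gradient with finite differencing −

import mxnet as mx
from mxnet import nd, autograd

x = nd.array([3.0])
x.attach_grad()                      # ask autograd to allocate gradient storage for x

with autograd.record():              # forward pass: record the computation
    y = x * x
y.backward()                         # backward pass: backpropagate from y to x
print(x.grad)                        # autograd result: [6.]

eps = 1e-4                           # finite differencing: perturb x and watch the output
approx = ((x + eps) ** 2 - (x - eps) ** 2) / (2 * eps)
print(approx)                        # roughly [6.], at the cost of extra forward evaluations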
Advantages of autograd

Following are the advantages of using automatic differentiation (autograd) −

Flexible − The flexibility it gives us when defining our network is one of the huge benefits of using autograd. We can change the operations on every iteration. These are called dynamic graphs, which are much more complex to implement in frameworks requiring a static graph. Autograd, even in such cases, will still be able to backpropagate the gradients correctly.

Automatic − Autograd is automatic, i.e. the complexities of the backpropagation procedure are taken care of for you. We just need to specify which gradients we are interested in calculating.

Efficient − Autograd calculates the gradients very efficiently.

Can use native Python control flow operators − We can use native Python control flow operators such as if conditions and while loops. Autograd will still be able to backpropagate the gradients efficiently and correctly.

Using autograd in MXNet Gluon

Here, with the help of an example, we will see how we can use autograd in MXNet Gluon.

Implementation Example

In the following example, we will implement a regression model having two layers. After implementing it, we will use autograd to automatically calculate the gradient of the loss with respect to each of the weight parameters −

First import autograd and the other required packages as follows −

from mxnet import autograd
import mxnet as mx
from mxnet.gluon.nn import HybridSequential, Dense
from mxnet.gluon.loss import L2Loss

Now, we need to define the network as follows −

N_net = HybridSequential()
N_net.add(Dense(units=3))
N_net.add(Dense(units=1))
N_net.initialize()

Now we need to define the loss as follows −

loss_function = L2Loss()

Next, we need to create the dummy data as follows −

x = mx.nd.array([[0.5, 0.9]])
y = mx.nd.array([[1.5]])

Now, we are ready for our first forward pass through the network. We want autograd to record the computational graph so that we can calculate the gradients. For this, we need to run the network code in the scope of the autograd.record context as follows −

with autograd.record():
    y_hat = N_net(x)
    loss = loss_function(y_hat, y)

Now, we are ready for the backward pass, which we start by calling the backward method on the quantity of interest. The quantity of interest in our example is loss, because we are trying to calculate the gradient of the loss with respect to the parameters −

loss.backward()

Now, we have gradients for each parameter of the network, which will be used by the optimiser to update the parameter values for improved performance. Let's check out the gradients of the 1st layer as follows −

N_net[0].weight.grad()

Output

The output is as follows −

[[-0.00470527 -0.00846948]
 [-0.03640365 -0.06552657]
 [ 0.00800354 0.01440637]]
<NDArray 3x2 @cpu(0)>

Complete implementation example

Given below is the complete implementation −
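The script below simply gathers the snippets shown above into one runnable program; nothing new is introduced −

from mxnet import autograd
import mxnet as mx
from mxnet.gluon.nn import HybridSequential, Dense
from mxnet.gluon.loss import L2Loss

# Two-layer regression network
N_net = HybridSequential()
N_net.add(Dense(units=3))
N_net.add(Dense(units=1))
N_net.initialize()

# Loss function and dummy data
loss_function = L2Loss()
x = mx.nd.array([[0.5, 0.9]])
y = mx.nd.array([[1.5]])

# Forward pass, recorded so that autograd can build the computational graph
with autograd.record():
    y_hat = N_net(x)
    loss = loss_function(y_hat, y)

# Backward pass: backpropagate from the loss to every parameter
loss.backward()

# Gradients of the first layer's weights
print(N_net[0].weight.grad())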
Apache MXNet – Introduction

This chapter highlights the features of Apache MXNet and talks about the latest version of this deep learning software framework.

What is MXNet?

Apache MXNet is a powerful open-source deep learning software framework that helps developers build, train, and deploy deep learning models. Over the past few years, from healthcare to transportation to manufacturing and, in fact, in every aspect of our daily life, the impact of deep learning has been widespread. Nowadays, deep learning is sought by companies to solve hard problems like face recognition, object detection, Optical Character Recognition (OCR), speech recognition, and machine translation.

That's the reason Apache MXNet is supported by:

Some big companies like Intel, Baidu, Microsoft, Wolfram Research, etc.

Public cloud providers including Amazon Web Services (AWS) and Microsoft Azure.

Some big research institutes like Carnegie Mellon, MIT, the University of Washington, and the Hong Kong University of Science & Technology.

Why Apache MXNet?

With various deep learning platforms like Torch7, Caffe, Theano, TensorFlow, Keras, Microsoft Cognitive Toolkit, etc. already in existence, you might wonder why Apache MXNet. Let's check out some of the reasons behind it:

Apache MXNet solves one of the biggest issues of existing deep learning platforms: the issue that, in order to use deep learning platforms, one needs to learn another system with a different programming flavour.

With the help of Apache MXNet, developers can exploit the full capabilities of GPUs as well as cloud computing.

Apache MXNet can accelerate any numerical computation and places a special emphasis on speeding up the development and deployment of large-scale deep neural networks (DNN).

It provides its users the capabilities of both imperative and symbolic programming.

Various Features

If you are looking for a flexible deep learning library to quickly develop cutting-edge deep learning research, or a robust platform to push production workloads, your search ends at Apache MXNet. That is because of the following features:

Distributed Training

Whether it is multi-GPU or multi-host training with near-linear scaling efficiency, Apache MXNet allows developers to make the most out of their hardware. MXNet also supports integration with Horovod, which is an open-source distributed deep learning framework created at Uber. For this integration, following are some of the common distributed APIs defined in Horovod:

horovod.broadcast()
horovod.allgather()
horovod.allreduce()

In this regard, MXNet offers us the following capabilities:

Device Placement − With the help of MXNet, we can easily specify the device on which each data structure (DS) should live.

Automatic Differentiation − Apache MXNet automates the differentiation, i.e. the derivative calculations.

Multi-GPU training − MXNet allows us to achieve scaling efficiency with the number of available GPUs.

Optimized Predefined Layers − We can code our own layers in MXNet, and it also ships with predefined layers that are optimized for speed.

Hybridization

Apache MXNet provides its users a hybrid front-end. With the help of the Gluon Python API, it can bridge the gap between its imperative and symbolic capabilities. This is done by calling its hybridization functionality.

Faster Computation

Linear algebra operations, such as the tens or hundreds of matrix multiplications in a model, are the computational bottleneck for deep neural nets.
To solve this bottleneck MXNet provides −

Optimized numerical computation for GPUs

Optimized numerical computation for distributed ecosystems

Automation of common workflows, with the help of which standard neural networks can be expressed briefly.

Language Bindings

MXNet has deep integration with high-level languages like Python and R. It also provides support for other programming languages such as −

Scala
Julia
Clojure
Java
C/C++
Perl

We do not need to learn a new programming language; instead, MXNet, combined with its hybridization feature, allows an exceptionally smooth transition from Python to deployment in the programming language of our choice.

Latest version MXNet 1.6.0

The Apache Software Foundation (ASF) released the stable version 1.6.0 of Apache MXNet on 21st February 2020 under Apache License 2.0. This is the last MXNet release to support Python 2, as the MXNet community voted to no longer support Python 2 in further releases. Let us check out some of the new features this release brings for its users.

NumPy-Compatible interface

Due to its flexibility and generality, NumPy has been widely used by Machine Learning practitioners, scientists, and students. But as hardware accelerators like Graphics Processing Units (GPUs) become increasingly assimilated into various Machine Learning (ML) toolkits, NumPy users need to switch to new frameworks, with different syntax, to take advantage of the speed of GPUs.

With MXNet 1.6.0, Apache MXNet is moving toward a NumPy-compatible programming experience. The new interface provides equivalent usability as well as expressiveness to practitioners familiar with NumPy syntax. Along with that, MXNet 1.6.0 also enables existing NumPy-style code to utilize hardware accelerators like GPUs to speed up large-scale computations.

Integration with Apache TVM

Apache TVM, an open-source end-to-end deep learning compiler stack for hardware backends such as CPUs, GPUs, and specialized accelerators, aims to fill the gap between productivity-focused deep learning frameworks and performance-oriented hardware backends. With the latest release, MXNet 1.6.0, users can leverage Apache (incubating) TVM to implement high-performance operator kernels in the Python programming language. The two main advantages of this new feature are as follows −

Simplifies the former C++-based development process.

Enables sharing the same implementation across multiple hardware backends such as CPUs, GPUs, etc.

Improvements on existing features

Apart from the above listed features of MXNet 1.6.0, it also provides some improvements over the existing features. The improvements are as follows −

Grouping element-wise operations for GPU

As we know, the performance of element-wise operations is memory-bandwidth bound, and that is the reason chaining such operations may reduce overall performance. Apache MXNet 1.6.0 performs element-wise operation fusion, which generates just-in-time fused operations whenever possible. Such element-wise operation fusion also reduces storage needs and improves overall performance.

Simplifying common expressions

MXNet 1.6.0 eliminates redundant expressions and simplifies common expressions. Such an enhancement also improves memory usage and total execution time.

Optimizations

MXNet 1.6.0 also provides various optimizations to existing features and operators, which are as follows:

Automatic Mixed Precision

Gluon Fit API
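Returning to the NumPy-compatible interface described earlier in this chapter, the following is a minimal sketch (assuming MXNet 1.6.0 or later) of how the mxnet.np module exposes familiar NumPy-style syntax −

from mxnet import np, npx
npx.set_np()                       # switch MXNet to NumPy-compatible behaviour

a = np.ones((2, 3))
b = np.arange(6).reshape(2, 3)

print(a + b)                       # element-wise addition, NumPy-style
print((a + b).sum(axis=1))         # reductions use the familiar axis argument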
Apache MXNet Tutorial

Apache MXNet is a powerful open-source deep learning software framework that helps developers build, train, and deploy deep learning models. Over the past few years, from healthcare to transportation to manufacturing and, in fact, in every aspect of our daily life, the impact of deep learning has been widespread. Nowadays, deep learning is sought by companies to solve hard problems like face recognition, object detection, Optical Character Recognition (OCR), speech recognition, and machine translation.

Audience

This tutorial will be useful for graduates, post-graduates, and research students who either have an interest in the field of AI, Machine Learning, and Deep Learning or have it as a part of their curriculum. The reader can be a beginner or an advanced learner.

Prerequisites

The reader must have basic knowledge of Artificial Intelligence. He/she should also be aware of the Python language and its functions. If you are new to any of these concepts, we recommend you take up tutorials concerning these topics before you dig further into this tutorial.
Apache MXNet – Unified Operator API

This chapter provides information about the unified operator application programming interface (API) in Apache MXNet.

SimpleOp

SimpleOp is a new unified operator API which unifies different invoking processes and goes back to the fundamental elements of operators. The unified operator is specially designed for unary as well as binary operations, because most mathematical operators attend to one or two operands, and more operands make dependency-related optimization useful.

We will understand how the SimpleOp unified operator works with the help of an example. In this example, we will be creating an operator functioning as a smooth L1 loss, which is a mixture of L1 and L2 loss. We can define and write the loss as given below −

loss = outside_weight .* f(inside_weight .* (data - label))
grad = outside_weight .* inside_weight .* f'(inside_weight .* (data - label))

Here, in the above example,

.* stands for element-wise multiplication

f, f' is the smooth L1 loss function, which we are assuming is in mshadow.

It looks impossible to implement this particular loss as a unary or binary operator, but MXNet provides its users automatic differentiation in symbolic execution, which simplifies the loss to f and f' directly. That's why we can certainly implement this particular loss as a unary operator.

Defining Shapes

As we know, MXNet's mshadow library requires explicit memory allocation, hence we need to provide all data shapes before any calculation occurs. Before defining the function and gradient, we need to provide input shape consistency and the output shape as follows −

typedef mxnet::TShape (*UnaryShapeFunction)(const mxnet::TShape& src,
                                            const EnvArguments& env);
typedef mxnet::TShape (*BinaryShapeFunction)(const mxnet::TShape& lhs,
                                             const mxnet::TShape& rhs,
                                             const EnvArguments& env);

The mxnet::TShape function is used to check the input data shape and designate the output data shape. If you do not define this function, the default output shape is the same as the input shape. For example, in the case of a binary operator, the shape of lhs and rhs is by default checked to be the same.

Now let's move on to our smooth L1 loss example. For this, we need to define an XPU to cpu or gpu in the header implementation smooth_l1_unary-inl.h. The reason is to reuse the same code in smooth_l1_unary.cc and smooth_l1_unary.cu.

#include <mxnet/operator_util.h>
#if defined(__CUDACC__)
#define XPU gpu
#else
#define XPU cpu
#endif

As in our smooth L1 loss example the output has the same shape as the source, we can use the default behavior. It can be written as follows −

inline mxnet::TShape SmoothL1Shape_(const mxnet::TShape& src, const EnvArguments& env) {
  return mxnet::TShape(src);
}

Defining Functions

We can create a unary or binary function with one input as follows −

typedef void (*UnaryFunction)(const TBlob& src,
                              const EnvArguments& env,
                              TBlob* ret,
                              OpReqType req,
                              RunContext ctx);
typedef void (*BinaryFunction)(const TBlob& lhs,
                               const TBlob& rhs,
                               const EnvArguments& env,
                               TBlob* ret,
                               OpReqType req,
                               RunContext ctx);

Following is the RunContext ctx struct, which contains the information needed at runtime for execution −

struct RunContext {
  void *stream;  // the stream of the device, can be NULL or Stream<gpu>* in GPU mode
  template<typename xpu> inline mshadow::Stream<xpu>* get_stream()  // get mshadow stream from Context
}  // namespace mxnet

Now, let's see how we can write the computation results in ret.
enum OpReqType {
  kNullOp,        // no operation, do not write anything
  kWriteTo,       // write gradient to provided space
  kWriteInplace,  // perform an in-place write
  kAddTo          // add to the provided space
};

Now, let's move on to our smooth L1 loss example. For this, we will use UnaryFunction to define the function of this operator as follows −

template<typename xpu>
void SmoothL1Forward_(const TBlob& src,
                      const EnvArguments& env,
                      TBlob *ret,
                      OpReqType req,
                      RunContext ctx) {
  using namespace mshadow;
  using namespace mshadow::expr;
  mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
  real_t sigma2 = env.scalar * env.scalar;
  MSHADOW_TYPE_SWITCH(ret->type_flag_, DType, {
    mshadow::Tensor<xpu, 2, DType> out = ret->get<xpu, 2, DType>(s);
    mshadow::Tensor<xpu, 2, DType> in = src.get<xpu, 2, DType>(s);
    ASSIGN_DISPATCH(out, req,
                    F<mshadow_op::smooth_l1_loss>(in, ScalarExp<DType>(sigma2)));
  });
}

Defining Gradients

Gradient functions of binary operators have a similar structure, except that Input, TBlob, and OpReqType are doubled. Let's check out below, where we created a gradient function with various types of input −

// depending only on out_grad
typedef void (*UnaryGradFunctionT0)(const OutputGrad& out_grad,
                                    const EnvArguments& env,
                                    TBlob* in_grad,
                                    OpReqType req,
                                    RunContext ctx);
// depending only on out_value
typedef void (*UnaryGradFunctionT1)(const OutputGrad& out_grad,
                                    const OutputValue& out_value,
                                    const EnvArguments& env,
                                    TBlob* in_grad,
                                    OpReqType req,
                                    RunContext ctx);
// depending only on in_data
typedef void (*UnaryGradFunctionT2)(const OutputGrad& out_grad,
                                    const Input0& in_data0,
                                    const EnvArguments& env,
                                    TBlob* in_grad,
                                    OpReqType req,
                                    RunContext ctx);

As defined above, Input0, Input, OutputValue, and OutputGrad all share the structure of GradientFunctionArgument. It is defined as follows −

struct GradFunctionArgument {
  TBlob data;
}

Now let's move on to our smooth L1 loss example. To enable the chain rule of gradients, we need to multiply out_grad from the top into the result written to in_grad.

template<typename xpu>
void SmoothL1BackwardUseIn_(const OutputGrad& out_grad,
                            const Input0& in_data0,
                            const EnvArguments& env,
                            TBlob *in_grad,
                            OpReqType req,
                            RunContext ctx) {
  using namespace mshadow;
  using namespace mshadow::expr;
  mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
  real_t sigma2 = env.scalar * env.scalar;
  MSHADOW_TYPE_SWITCH(in_grad->type_flag_, DType, {
    mshadow::Tensor<xpu, 2, DType> src = in_data0.data.get<xpu, 2, DType>(s);
    mshadow::Tensor<xpu, 2, DType> ograd = out_grad.data.get<xpu, 2, DType>(s);
    mshadow::Tensor<xpu, 2, DType> igrad = in_grad->get<xpu, 2, DType>(s);
    ASSIGN_DISPATCH(igrad, req,
                    ograd * F<mshadow_op::smooth_l1_gradient>(src, ScalarExp<DType>(sigma2)));
  });
}

Register SimpleOp to MXNet

Once we have created the shape, function, and gradient, we need to register them both as an NDArray operator and as a symbolic operator.
For this, we can use the registration macro as follows −

MXNET_REGISTER_SIMPLE_OP(Name, DEV)
.set_shape_function(Shape)
.set_function(DEV::kDevMask, Function<XPU>, SimpleOpInplaceOption)
.set_gradient(DEV::kDevMask, Gradient<XPU>, SimpleOpInplaceOption)
.describe("description");

The SimpleOpInplaceOption can be defined as follows −

enum SimpleOpInplaceOption {
  kNoInplace,      // do not allow inplace in arguments
  kInplaceInOut,   // allow inplace in with out (unary)
  kInplaceOutIn,   // allow inplace out_grad with in_grad (unary)
  kInplaceLhsOut,  // allow inplace left operand with out (binary)
  kInplaceOutLhs   // allow inplace out_grad with lhs_grad (binary)
};

Now let's move on to our smooth L1 loss example. For this, we have a gradient function that relies on input data so
Apache MXNet – Installing MXNet

To get started with MXNet, the first thing we need to do is to install it on our computer. Apache MXNet works on pretty much all the platforms available, including Windows, Mac, and Linux.

Linux OS

We can install MXNet on Linux OS in the following ways −

Graphical Processing Unit (GPU)

Here, we will use various methods, namely Pip, Docker, and Source, to install MXNet when we are using a GPU for processing −

By using Pip method

You can use the following command to install MXNet on your Linux OS −

pip install mxnet

Apache MXNet also offers MKL pip packages, which are much faster when running on Intel hardware. Here, for example, mxnet-cu101mkl means that −

The package is built with CUDA/cuDNN

The package is MKL-DNN enabled

The CUDA version is 10.1

For other options, you can also refer to the official installation page.

By using Docker

You can find the Docker images with MXNet on DockerHub. Let us check out the steps below to install MXNet by using Docker with GPU −

Step 1 − First, we need to install Docker on our machine by following the Docker installation instructions.

Step 2 − To enable the usage of GPUs from the Docker containers, next we need to install nvidia-docker-plugin. You can follow its installation instructions.

Step 3 − By using the following command, you can pull the MXNet Docker image −

$ sudo docker pull mxnet/python:gpu

Now, in order to see if the mxnet/python Docker image pull was successful, we can list the Docker images as follows −

$ sudo docker images

For the fastest inference speeds with MXNet, it is recommended to use the latest MXNet with Intel MKL-DNN. Check the commands below −

$ sudo docker pull mxnet/python:1.3.0_cpu_mkl
$ sudo docker images

From source

To build the MXNet shared library from source with GPU, first we need to set up the environment for CUDA and cuDNN as follows −

Download and install the CUDA toolkit; here CUDA 9.2 is recommended.

Next, download cuDNN 7.1.4.

Now we need to unzip the file. It is also required to change to the cuDNN root directory. Also move the headers and libraries to the local CUDA Toolkit folder as follows −

tar xvzf cudnn-9.2-linux-x64-v7.1.tgz
sudo cp -P cuda/include/cudnn.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
sudo ldconfig

After setting up the environment for CUDA and cuDNN, follow the steps below to build the MXNet shared library from source −

Step 1 − First, we need to install the prerequisite packages. These dependencies are required on Ubuntu version 16.04 or later.

sudo apt-get update
sudo apt-get install -y build-essential git ninja-build ccache libopenblas-dev libopencv-dev cmake

Step 2 − In this step, we will download the MXNet source and configure it. First, let us clone the repository by using the following commands −

git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet
cd mxnet
cp config/linux_gpu.cmake config.cmake   # for a build with CUDA

Step 3 − By using the following commands, you can build the MXNet core shared library −

rm -rf build
mkdir -p build && cd build
cmake -GNinja ..
cmake --build .

Two important points regarding the above step are as follows −

If you want to build the Debug version, then specify the CMAKE_BUILD_TYPE as follows −

cmake -DCMAKE_BUILD_TYPE=Debug -GNinja ..

In order to set the number of parallel compilation jobs, specify the following −
cmake --build . --parallel N

Once you successfully build the MXNet core shared library, in the build folder in your MXNet project root you will find libmxnet.so, which is required to install the language bindings (optional).

Central Processing Unit (CPU)

Here, we will use various methods, namely Pip, Docker, and Source, to install MXNet when we are using a CPU for processing −

By using Pip method

You can use the following command to install MXNet on your Linux OS −

pip install mxnet

Apache MXNet also offers MKL-DNN enabled pip packages, which are much faster when running on Intel hardware.

pip install mxnet-mkl

By using Docker

You can find the Docker images with MXNet on DockerHub. Let us check out the steps below to install MXNet by using Docker with CPU −

Step 1 − First, we need to install Docker on our machine by following the Docker installation instructions.

Step 2 − By using the following command, you can pull the MXNet Docker image:

$ sudo docker pull mxnet/python

Now, in order to see if the mxnet/python Docker image pull was successful, we can list the Docker images as follows −

$ sudo docker images

For the fastest inference speeds with MXNet, it is recommended to use the latest MXNet with Intel MKL-DNN. Check the commands below −

$ sudo docker pull mxnet/python:1.3.0_cpu_mkl
$ sudo docker images

From source

To build the MXNet shared library from source with CPU, follow the steps below −

Step 1 − First, we need to install the prerequisite packages. These dependencies are required on Ubuntu version 16.04 or later.

sudo apt-get update
sudo apt-get install -y build-essential git ninja-build ccache libopenblas-dev libopencv-dev cmake

Step 2 − In this step, we will download the MXNet source and configure it. First, let us clone the repository by using the following commands:

git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet
cd mxnet
cp config/linux.cmake config.cmake

Step 3 − By using the following commands, you can build the MXNet core shared library:

rm -rf build
mkdir -p build && cd build
cmake -GNinja ..
cmake --build .

Two important points regarding the above step are as follows −

If you want to build the Debug version, then specify the CMAKE_BUILD_TYPE as follows:

cmake -DCMAKE_BUILD_TYPE=Debug -GNinja ..

In order to set the number of parallel compilation jobs, specify the following −

cmake --build . --parallel N

Once you successfully build the MXNet core shared library, in the build folder in your MXNet project root you will find libmxnet.so, which is required to install the language bindings (optional).

MacOS

We can install MXNet on MacOS in the following ways −

Graphical Processing Unit (GPU)

If you plan to build MXNet on MacOS with GPU,
Apache MXNet – Gluon

Another most important MXNet Python package is Gluon. In this chapter, we will be discussing this package. Gluon provides a clear, concise, and simple API for DL projects. It enables Apache MXNet to prototype, build, and train DL models without forfeiting the training speed.

Blocks

Blocks form the basis of more complex network designs. In a neural network, as the complexity of the network increases, we need to move from designing single neurons to entire layers of neurons. For example, a NN design like ResNet-152 has a very fair degree of regularity, consisting of blocks of repeated layers.

Example

In the example given below, we will write code for a simple block, namely a block for a multilayer perceptron.

from mxnet import nd
from mxnet.gluon import nn

x = nd.random.uniform(shape=(2, 20))
N_net = nn.Sequential()
N_net.add(nn.Dense(256, activation="relu"))
N_net.add(nn.Dense(10))
N_net.initialize()
N_net(x)

Output

This produces the following output −

[[ 0.09543004 0.04614332 -0.00286655 -0.07790346 -0.05130241 0.02942038
 0.08696645 -0.0190793 -0.04122177 0.05088576]
 [ 0.0769287 0.03099706 0.00856576 -0.044672 -0.06926838 0.09132431
 0.06786592 -0.06187843 -0.03436674 0.04234696]]
<NDArray 2x10 @cpu(0)>

Steps needed to go from defining layers to defining blocks of one or more layers −

Step 1 − A block takes data as input.

Step 2 − Now, blocks will store the state in the form of parameters. For example, in the above coding example the block contains two Dense layers, and we need a place to store their parameters.

Step 3 − Next, a block will invoke the forward function to perform forward propagation. This is also called forward computation. As a part of the first forward call, blocks initialize the parameters in a lazy fashion.

Step 4 − At last, the blocks will invoke the backward function and calculate the gradient with respect to their input. Typically, this step is performed automatically.

Sequential Block

A sequential block is a special kind of block in which the data flows through a sequence of blocks. In this, each block is applied to the output of the one before it, with the first block being applied on the input data itself. Let us see how the Sequential class works −

from mxnet import nd
from mxnet.gluon import nn

class MySequential(nn.Block):
    def __init__(self, **kwargs):
        super(MySequential, self).__init__(**kwargs)

    def add(self, block):
        self._children[block.name] = block

    def forward(self, x):
        for block in self._children.values():
            x = block(x)
        return x

x = nd.random.uniform(shape=(2, 20))
N_net = MySequential()
N_net.add(nn.Dense(256, activation="relu"))
N_net.add(nn.Dense(10))
N_net.initialize()
N_net(x)

Output

The output is given herewith −

[[ 0.09543004 0.04614332 -0.00286655 -0.07790346 -0.05130241 0.02942038
 0.08696645 -0.0190793 -0.04122177 0.05088576]
 [ 0.0769287 0.03099706 0.00856576 -0.044672 -0.06926838 0.09132431
 0.06786592 -0.06187843 -0.03436674 0.04234696]]
<NDArray 2x10 @cpu(0)>

Custom Block

We can easily go beyond concatenation with the sequential block as defined above. But, if we would like to make customisations, then the Block class also provides us the required functionality. The Block class has a model constructor provided in the nn module. We can inherit that model constructor to define the model we want.

In the following example, the MLP class overrides the __init__ and forward functions of the Block class. Let us see how it works.
class MLP(nn.Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        self.hidden = nn.Dense(256, activation="relu")  # Hidden layer
        self.output = nn.Dense(10)                      # Output layer

    def forward(self, x):
        hidden_out = self.hidden(x)
        return self.output(hidden_out)

x = nd.random.uniform(shape=(2, 20))
N_net = MLP()
N_net.initialize()
N_net(x)

Output

When you run the code, you will see the following output −

[[ 0.07787763 0.00216403 0.01682201 0.03059879 -0.00702019 0.01668715
 0.04822846 0.0039432 -0.09300035 -0.04494302]
 [ 0.08891078 -0.00625484 -0.01619131 0.0380718 -0.01451489 0.02006172
 0.0303478 0.02463485 -0.07605448 -0.04389168]]
<NDArray 2x10 @cpu(0)>

Custom Layers

Apache MXNet's Gluon API comes with a modest number of pre-defined layers. But still, at some point, we may find that a new layer is needed. We can easily add a new layer in the Gluon API. In this section, we will see how we can create a new layer from scratch.

The Simplest Custom Layer

To create a new layer in the Gluon API, we have to create a class that inherits from the Block class, which provides the most basic functionality. We can inherit all the pre-defined layers from it directly or via other subclasses.

For creating the new layer, the only instance method that needs to be implemented is forward(self, x). This method defines what exactly our layer is going to do during forward propagation. As discussed earlier, the back-propagation pass for blocks is done by Apache MXNet itself, automatically.

Example

In the example below, we will be defining a new layer. We will also implement the forward() method to normalise the input data by fitting it into the range of [0, 1].

from __future__ import print_function
import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.gluon.nn import Dense

mx.random.seed(1)

class NormalizationLayer(gluon.Block):
    def __init__(self):
        super(NormalizationLayer, self).__init__()

    def forward(self, x):
        return (x - nd.min(x)) / (nd.max(x) - nd.min(x))

x = nd.random.uniform(shape=(2, 20))
N_net = NormalizationLayer()
N_net.initialize()
N_net(x)

Output

On executing the above program, you will get the following result −

[[0.5216355 0.03835821 0.02284337 0.5945146 0.17334817 0.69329053
 0.7782702 1. 0.5508242 0. 0.07058554 0.3677264 0.4366546 0.44362497
 0.7192635 0.37616986 0.6728799 0.7032008 0.46907538 0.63514024]
 [0.9157533 0.7667402 0.08980197 0.03593295 0.16176797 0.27679572
 0.07331014 0.3905285 0.6513384 0.02713427 0.05523694 0.12147208
 0.45582628 0.8139887 0.91629887 0.36665893 0.07873632 0.78268915
 0.63404864 0.46638715]]
<NDArray 2x20 @cpu(0)>

Hybridisation

Hybridisation may be defined as a process used by Apache MXNet to create a symbolic graph of a forward computation. Hybridisation allows MXNet to boost the computation performance by optimising the computational symbolic graph. Rather than directly inheriting from Block, in fact, we may find that, while implementing existing layers, a block inherits from HybridBlock. Following are the reasons for this −

Allows us to write custom layers − HybridBlock allows us to write custom layers that can further be used in both imperative and symbolic programming.

Increases computation performance − HybridBlock optimises the computational symbolic graph, which allows MXNet to increase computation performance.
Example

In this example, we will be rewriting our example layer, created above, by using HybridBlock:

class NormalizationHybridLayer(gluon.HybridBlock):
    def __init__(self):
        super(NormalizationHybridLayer, self).__init__()

    def hybrid_forward(self, F, x):
        return F.broadcast_div(F.broadcast_sub(x, F.min(x)),
                               (F.broadcast_sub(F.max(x), F.min(x))))

layer_hybd = NormalizationHybridLayer()
layer_hybd(nd.array([1, 2, 3, 4, 5, 6], ctx=mx.cpu()))

Output

The output is as follows −

[0. 0.2 0.4 0.6 0.8 1. ]
<NDArray 6 @cpu(0)>
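Calling hybridize() on such a block asks MXNet to compile its forward computation into a cached symbolic graph; afterwards the block is used exactly as before. The following is a minimal sketch (reusing layer_hybd from above; the small Dense network is an assumed example for illustration) −

import mxnet as mx
from mxnet import nd, gluon

# The custom layer defined above can be compiled in place
layer_hybd.hybridize()
print(layer_hybd(nd.array([1, 2, 3, 4, 5, 6], ctx=mx.cpu())))

# The same call works for whole networks built from HybridBlocks
net = gluon.nn.HybridSequential()
net.add(gluon.nn.Dense(256, activation="relu"))
net.add(gluon.nn.Dense(10))
net.initialize()
net.hybridize()           # subsequent forward passes run on the optimised graph
print(net(nd.random.uniform(shape=(2, 20))))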
Python API Autograd and Initializer

This chapter deals with the autograd and initializer API in MXNet.

mxnet.autograd

This is MXNet's autograd API for NDArray. It has the following class −

Class: Function()

It is used for customised differentiation in autograd. It can be written as mxnet.autograd.Function. If, for any reason, the user does not want to use the gradients that are computed by the default chain rule, then he/she can use the Function class of mxnet.autograd to customise differentiation for a computation. It has two methods, namely Forward() and Backward(). Let us understand the working of this class with the help of the following points −

First, we need to define our computation in the forward method.

Then, we need to provide the customised differentiation in the backward method.

Now, during gradient computation, instead of the default backward function, mxnet.autograd will use the backward function defined by the user. We can also cast to numpy array and back for some operations in forward as well as backward.

Example

Before using the mxnet.autograd.Function class, let's define a stable sigmoid function with backward as well as forward methods as follows −

class sigmoid(mx.autograd.Function):
    def forward(self, x):
        y = 1 / (1 + mx.nd.exp(-x))
        self.save_for_backward(y)
        return y

    def backward(self, dy):
        y, = self.saved_tensors
        return dy * y * (1 - y)

Now, the Function class can be used as follows −

func = sigmoid()
x = mx.nd.random.uniform(shape=(10,))
x.attach_grad()
with mx.autograd.record():
    m = func(x)
m.backward()
dx_grad = x.grad.asnumpy()
dx_grad

Output

When you run the code, you will see the following output −

array([0.21458015, 0.21291625, 0.23330082, 0.2361367 , 0.23086983,
 0.24060014, 0.20326573, 0.21093895, 0.24968489, 0.24301809],
 dtype=float32)

Methods and their parameters

Following are the methods of the mxnet.autograd API and their parameters −

forward(heads[, head_grads, retain_graph, …]) − This method is used for forward computation.

backward(heads[, head_grads, retain_graph, …]) − This method is used for backward computation. It computes the gradients of heads with respect to previously marked variables. This method takes as many inputs as forward's outputs. It also returns as many NDArrays as forward's inputs.

get_symbol(x) − This method is used to retrieve the recorded computation history as a Symbol.

grad(heads, variables[, head_grads, …]) − This method computes the gradients of heads with respect to variables. Once computed, instead of being stored into variable.grad, the gradients will be returned as new NDArrays.

is_recording() − With the help of this method, we can get the status on recording and not recording.

is_training() − With the help of this method, we can get the status on training and predicting.

mark_variables(variables, gradients[, grad_reqs]) − This method will mark NDArrays as variables to compute gradients for autograd. This method is the same as the function .attach_grad() on a variable, but the only difference is that with this call we can set the gradient to any value.

pause([train_mode]) − This method returns a scope context to be used in a 'with' statement for code which does not need gradients to be calculated.

predict_mode() − This method returns a scope context to be used in a 'with' statement in which forward pass behaviour is set to inference mode, without changing the recording states.

record([train_mode]) − It will return an autograd recording scope context to be used in a 'with' statement and captures code which needs gradients to be calculated.
set_recording(is_recording) − Similar to is_recording(), but with the help of this method we can set the status of recording and not recording.

set_training(is_training) − Similar to is_training(), but with the help of this method we can set the status to training or predicting.

train_mode() − This method will return a scope context to be used in a 'with' statement in which forward pass behaviour is set to training mode, without changing the recording states.

Implementation Example

In the below example, we will be using the mxnet.autograd.grad() method to compute the gradient of the head with respect to variables −

x = mx.nd.ones((2,))
x.attach_grad()
with mx.autograd.record():
    z = mx.nd.elemwise_add(mx.nd.exp(x), x)
dx_grad = mx.autograd.grad(z, [x], create_graph=True)
dx_grad

Output

The output is mentioned below −

[
[3.7182817 3.7182817]
<NDArray 2 @cpu(0)>]

We can use the mxnet.autograd.predict_mode() method to return a scope to be used in a 'with' statement −

with mx.autograd.record():
    y = model(x)
    with mx.autograd.predict_mode():
        y = sampling(y)
        backward([y])

mxnet.initializer

This is MXNet's API for weight initializers. It has the following classes −

Classes and their parameters

Following are the classes of mxnet.initializer and their parameters −

Bilinear() − With the help of this class, we can initialize the weights of up-sampling layers.

Constant(value) − This class initializes the weights to a given value. The value can be a scalar as well as an NDArray that matches the shape of the parameter to be set.

FusedRNN(init, num_hidden, num_layers, mode) − As the name implies, this class initializes parameters for fused Recurrent Neural Network (RNN) layers.

InitDesc − It acts as the descriptor for the initialization pattern.

Initializer(**kwargs) − This is the base class of an initializer.

LSTMBias([forget_bias]) − This class initializes all biases of an LSTMCell to 0.0, except for the forget gate, whose bias is set to a custom value.

Load(param[, default_init, verbose]) − This class initializes the variables by loading data from a file or dictionary.

MSRAPrelu([factor_type, slope]) − As the name implies, this class initializes the weights according to an MSRA paper.

Mixed(patterns, initializers) − It initializes the parameters using multiple initializers.

Normal([sigma]) − The Normal() class initializes weights with random values sampled from a normal distribution with a mean of zero and a standard deviation (SD) of sigma.

One() − It initializes the weights of the parameter to one.

Orthogonal([scale, rand_type]) − As the name implies, this class initializes the weight as an orthogonal matrix.

Uniform([scale]) − It initializes weights with random values which are uniformly sampled from a given range.

Xavier([rnd_type, factor_type, magnitude]) − It returns an initializer that performs "Xavier" initialization for weights.

Zero() − It initializes the weights of the parameter to zero.

Implementation Example

In the below example, we will be using the mxnet.init.Normal() class to create an initializer and retrieve its parameters −

init = mx.init.Normal(0.8)
init.dumps()

Output

The output is given below −

'["normal", {"sigma": 0.8}]'

Example

init = mx.init.Xavier(factor_type="in", magnitude=2.45)
init.dumps()

Output

The output is shown below −

'["xavier", {"rnd_type": "uniform", "factor_type": "in", "magnitude": 2.45}]'

In the below example, we will be
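using an initializer to set up the parameters of a Gluon network. The following is a minimal sketch; the small two-layer Dense model is an assumed example for illustration and not prescribed by the API −

import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Dense(4, activation="relu"))
net.add(nn.Dense(2))

# Draw all weights from a normal distribution with sigma = 0.8
net.initialize(mx.init.Normal(sigma=0.8))

x = nd.random.uniform(shape=(3, 5))
net(x)                        # the first forward pass triggers the lazy initialization
print(net[0].weight.data())   # inspect the initialized weights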
Apache MXNet – NDArray

In this chapter, we will be discussing MXNet's multi-dimensional array format, called ndarray.

Handling data with NDArray

First, we are going to see how we can handle data with NDArray. Following are the prerequisites for the same −

Prerequisites

To understand how we can handle data with this multi-dimensional array format, we need to fulfil the following prerequisites:

MXNet installed in a Python environment

Python 2.7.x or Python 3.x

Implementation Example

Let us understand the basic functionality with the help of the examples given below −

First, we need to import MXNet and ndarray from MXNet as follows −

import mxnet as mx
from mxnet import nd

Once we import the necessary libraries, we will go through the following basic functionalities:

A simple 1-D array with a Python list

Example

x = nd.array([1,2,3,4,5,6,7,8,9,10])
print(x)

Output

The output is as mentioned below −

[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
<NDArray 10 @cpu(0)>

A 2-D array with a Python list

Example

y = nd.array([[1,2,3,4,5,6,7,8,9,10], [1,2,3,4,5,6,7,8,9,10], [1,2,3,4,5,6,7,8,9,10]])
print(y)

Output

The output is as stated below −

[[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
 [ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
 [ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]]
<NDArray 3x10 @cpu(0)>

Creating an NDArray without any initialisation

Here, we will create a matrix with 3 rows and 4 columns by using the .empty function. We will also use the .full function, which takes an additional argument for the value you want to fill in the array.

Example

x = nd.empty((3, 4))
print(x)
x = nd.full((3,4), 8)
print(x)

Output

The output is given below −

[[0.000e+00 0.000e+00 0.000e+00 0.000e+00]
 [0.000e+00 0.000e+00 2.887e-42 0.000e+00]
 [0.000e+00 0.000e+00 0.000e+00 0.000e+00]]
<NDArray 3x4 @cpu(0)>

[[8. 8. 8. 8.]
 [8. 8. 8. 8.]
 [8. 8. 8. 8.]]
<NDArray 3x4 @cpu(0)>

Matrix of all zeros with the .zeros function

Example

x = nd.zeros((3, 8))
print(x)

Output

The output is as follows −

[[0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]]
<NDArray 3x8 @cpu(0)>

Matrix of all ones with the .ones function

Example

x = nd.ones((3, 8))
print(x)

Output

The output is mentioned below −

[[1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]]
<NDArray 3x8 @cpu(0)>

Creating an array whose values are sampled randomly

Example

y = nd.random_normal(0, 1, shape=(3, 4))
print(y)

Output

The output is given below −

[[ 1.2673576 -2.0345826 -0.32537818 -1.4583491 ]
 [-0.11176403 1.3606371 -0.7889914 -0.17639421]
 [-0.2532185 -0.42614475 -0.12548696 1.4022992 ]]
<NDArray 3x4 @cpu(0)>

Finding the dimensions of each NDArray

Example

y.shape

Output

The output is as follows −

(3, 4)

Finding the size of each NDArray

Example

y.size

Output

12

Finding the datatype of each NDArray

Example

y.dtype

Output

numpy.float32

NDArray Operations

In this section, we will introduce you to MXNet's array operations. NDArray supports a large number of standard mathematical as well as in-place operations.

Standard Mathematical Operations

Following are the standard mathematical operations supported by NDArray −

Element-wise addition

First, we need to import MXNet and ndarray from MXNet as follows:

import mxnet as mx
from mxnet import nd

x = nd.ones((3, 5))
y = nd.random_normal(0, 1, shape=(3, 5))
print("x=", x)
print("y=", y)
x = x + y
print("x = x + y, x=", x)

Output

The output is given herewith −

x=
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
<NDArray 3x5 @cpu(0)>
y=
[[-1.0554522 -1.3118273 -0.14674698 0.641493 -0.73820823]
 [ 2.031364 0.5932667 0.10228804 1.179526 -0.5444829 ]
 [-0.34249446 1.1086396 1.2756858 -1.8332436 -0.5289873 ]]
<NDArray 3x5 @cpu(0)>

x = x + y, x=
[[-0.05545223 -0.3118273 0.853253 1.6414931 0.26179177]
 [ 3.031364 1.5932667 1.102288 2.1795259 0.4555171 ]
 [ 0.6575055 2.1086397 2.2756858 -0.8332436 0.4710127 ]]
<NDArray 3x5 @cpu(0)>

Element-wise multiplication

Example

x = nd.array([1, 2, 3, 4])
y = nd.array([2, 2, 2, 1])
x * y

Output

You will see the following output −

[2. 4. 6. 4.]
<NDArray 4 @cpu(0)>

Exponentiation

Example

nd.exp(x)

Output

When you run the code, you will see the following output:

[ 2.7182817 7.389056 20.085537 54.59815 ]
<NDArray 4 @cpu(0)>

Matrix transpose to compute matrix-matrix product

Example

nd.dot(x, y.T)

Output

Given below is the output of the code −

[16.]
<NDArray 1 @cpu(0)>

In-place Operations

Every time we ran an operation in the above examples, we allocated new memory to host its result. For example, if we write A = A+B, we dereference the matrix that A used to point to and instead point it at the newly allocated memory. Let us understand this with the example given below, using Python's id() function −

print("y=", y)
print("id(y):", id(y))
y = y + x
print("after y=y+x, y=", y)
print("id(y):", id(y))

Output

Upon execution, you will receive the following output −

y=
[2. 2. 2. 1.]
<NDArray 4 @cpu(0)>
id(y): 2438905634376
after y=y+x, y=
[3. 4. 5. 5.]
<NDArray 4 @cpu(0)>
id(y): 2438905685664

In fact, we can also assign the result to a previously allocated array as follows −

print("x=", x)
z = nd.zeros_like(x)
print("z is zeros_like x, z=", z)
print("id(z):", id(z))
print("y=", y)
z[:] = x + y
print("z[:] = x + y, z=", z)
print("id(z) is the same as before:", id(z))

Output

The output is shown below −

x=
[1. 2. 3. 4.]
<NDArray 4 @cpu(0)>
z is zeros_like x, z=
[0. 0. 0. 0.]
<NDArray 4 @cpu(0)>
id(z): 2438905790760
y=
[3. 4. 5. 5.]
<NDArray 4 @cpu(0)>
z[:] = x + y, z=
[4. 6. 8. 9.]
<NDArray 4 @cpu(0)>
id(z) is the same as before: 2438905790760

From the above output, we can see that x+y will still allocate a temporary buffer to store the result before copying it to z. So now, we can perform operations fully in place to make better use of memory and to avoid the temporary buffer. To do this, we specify the out keyword argument that every operator supports, as follows −

print("x=", x,
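For instance, a minimal sketch of using the out argument with nd.elemwise_add (re-declaring x and y here so the snippet is self-contained) −

import mxnet as mx
from mxnet import nd

x = nd.array([1, 2, 3, 4])
y = nd.array([3, 4, 5, 5])
z = nd.zeros_like(x)

print("id(z) before:", id(z))
nd.elemwise_add(x, y, out=z)    # result is written straight into z's buffer
print("z =", z)
print("id(z) after:", id(z))    # unchanged, so no temporary copy of the result was made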
Apache MXNet – System Components

Here, the system components in Apache MXNet are explained in detail. First, we will study the execution engine in MXNet.

Execution Engine

Apache MXNet's execution engine is very versatile. We can use it for deep learning as well as for any domain-specific problem: execute a bunch of functions following their dependencies. It is designed in such a way that the functions with dependencies are serialized, whereas the functions with no dependencies can be executed in parallel.

Core Interface

The API given below is the core interface for Apache MXNet's execution engine −

virtual void PushSync(Fn exec_fun, Context exec_ctx,
                      std::vector<VarHandle> const& const_vars,
                      std::vector<VarHandle> const& mutate_vars) = 0;

The above API has the following −

exec_fun − The core interface API of MXNet allows us to push the function named exec_fun, along with its context information and dependencies, to the execution engine.

exec_ctx − The context information in which the above-mentioned function exec_fun should be executed.

const_vars − These are the variables that the function reads from.

mutate_vars − These are the variables that are to be modified.

The execution engine provides its user the guarantee that the execution of any two functions that modify a common variable is serialized in their push order.

Function

Following is the function type of the execution engine of Apache MXNet −

using Fn = std::function<void(RunContext)>;

In the above function, RunContext contains the runtime information. The runtime information should be determined by the execution engine. The syntax of RunContext is as follows −

struct RunContext {
  // stream pointer which could be safely cast to
  // cudaStream_t* type
  void *stream;
};

Below are some important points about the execution engine's functions −

All the functions are executed by the execution engine's internal threads.

It is not good to push a blocking function to the execution engine, because with that the function will occupy the execution thread and will also reduce the total throughput. For this, MXNet provides another, asynchronous, function type as follows −

using Callback = std::function<void()>;
using AsyncFn = std::function<void(RunContext, Callback)>;

In this AsyncFn function we can pass the heavy part of our work, but the execution engine does not consider the function finished until we call the callback function.

Context

In Context, we can specify the context in which the function is to be executed. This usually includes the following −

Whether the function should be run on a CPU or a GPU.

If we specify GPU in the Context, then which GPU to use.

There is a huge difference between Context and RunContext. Context has the device type and device id, whereas RunContext has the information that can be decided only during runtime.

VarHandle

VarHandle, used to specify the dependencies of functions, is like a token (specially provided by the execution engine) that we can use to represent the external resources the function can modify or use.

But the question arises: why do we need to use VarHandle? It is because the Apache MXNet engine is designed to be decoupled from other MXNet modules.

Following are some important points about VarHandle −

It is lightweight, so creating, deleting, or copying a variable incurs little operating cost.

We need to specify the immutable variables, i.e. the variables that are only read, in the const_vars.

We need to specify the mutable variables, i.e. the variables that will be modified, in the mutate_vars.
The rule used by the execution engine to resolve the dependencies among functions is that the execution of any two functions, when one of them modifies at least one common variable, is serialized in their push order.

For creating a new variable, we can use the NewVar() API.

For deleting a variable, we can use the PushDelete API.

Let us understand its working with a simple example −

Suppose we have two functions, namely F1 and F2, and they both mutate the variable V2. In that case, F2 is guaranteed to be executed after F1 if F2 is pushed after F1. On the other hand, if F1 and F2 both use V2, then their actual execution order could be random.

Push and Wait

Push and Wait are two more useful APIs of the execution engine.

Following are two important features of the Push API:

All the Push APIs are asynchronous, which means that the API call immediately returns regardless of whether the pushed function is finished or not.

The Push API is not thread safe, which means that only one thread should make engine API calls at a time.

Now, if we talk about the Wait API, the following points describe it −

If a user wants to wait for a specific function to be finished, he/she should include a callback function in the closure and call it at the end of the pushed function.

On the other hand, if a user wants to wait for all functions that involve a certain variable to finish, he/she should use the WaitForVar(var) API.

If someone wants to wait for all the pushed functions to finish, then use the WaitForAll() API.

Operators

An operator in Apache MXNet is a class that contains the actual computation logic as well as auxiliary information, and aids the system in performing optimisation.

Operator Interface

Forward is the core operator interface, whose syntax is as follows:

virtual void Forward(const OpContext &ctx,
                     const std::vector<TBlob> &in_data,
                     const std::vector<OpReqType> &req,
                     const std::vector<TBlob> &out_data,
                     const std::vector<TBlob> &aux_states) = 0;

The structure of OpContext, used in Forward(), is as follows:

struct OpContext {
  int is_train;
  RunContext run_ctx;
  std::vector<Resource> requested;
}

The OpContext describes the state of the operator (whether it is in the train or test phase), which device the operator should be run on, and also the requested resources.

From the above Forward core interface, we can understand the remaining arguments as follows −

in_data and out_data represent the input and output tensors.

req denotes how the results of the computation are written into the out_data. The OpReqType can be defined as −

enum OpReqType {
  kNullOp,
  kWriteTo,
  kWriteInplace,
  kAddTo
};