Big Data & Analytics Archives - Page 6 of 75 - Donotsad where can learn any thing work project and make money

Aug 10

Tableau – Waterfall Charts

Tableau – Waterfall Charts ”; Previous Next Waterfall charts effectively display the cumulative effect of sequential positive and negative values. It shows where a value starts, ends and how it gets there incrementally. So, we are able to see both the size of changes and difference in values between consecutive data points. Tableau needs one Dimension and one Measure to create a Waterfall chart. Creating a Waterfall Chart Using the Sample-superstore, plan to find the variation of Sales for each Sub-Category of Products. To achieve this objective, following are the steps. Step 1 − Drag the Dimension Sub-Category to the Columns shelf and the Measure Sales to the Rows shelf. Sort the data in an ascending order of sales value. For this, use the sort option appearing in the middle of the vertical axis when you hover the mouse over it. The following chart appears on completing this step. Step 2 − Next, right-click on the SUM (Sales) value and select the running total from the table calculation option. Change the chart type to Gantt Bar. The following chart appears. Step 3 − Create a calculated field named -sales and mention the following formula for its value. Step 4 − Drag the newly created calculated field (-sales) to the size shelf under Marks Card. The chart above now changes to produce the following chart which is a Waterfall chart. Waterfall Chart with Color Next, give different color shades to the bars in the chart by dragging the Sales measure to the Color shelf under the Marks Card. You get the following waterfall chart with color. Print Page Previous Next Advertisements ”;

Aug 10

Zookeeper – Fundamentals

Zookeeper – Fundamentals ”; Previous Next Before going deep into the working of ZooKeeper, let us take a look at the fundamental concepts of ZooKeeper. We will discuss the following topics in this chapter − Architecture Hierarchical namespace Session Watches Architecture of ZooKeeper Take a look at the following diagram. It depicts the “Client-Server Architecture” of ZooKeeper. Each one of the components that is a part of the ZooKeeper architecture has been explained in the following table. Part Description Client Clients, one of the nodes in our distributed application cluster, access information from the server. For a particular time interval, every client sends a message to the server to let the sever know that the client is alive. Similarly, the server sends an acknowledgement when a client connects. If there is no response from the connected server, the client automatically redirects the message to another server. Server Server, one of the nodes in our ZooKeeper ensemble, provides all the services to clients. Gives acknowledgement to client to inform that the server is alive. Ensemble Group of ZooKeeper servers. The minimum number of nodes that is required to form an ensemble is 3. Leader Server node which performs automatic recovery if any of the connected node failed. Leaders are elected on service startup. Follower Server node which follows leader instruction. Hierarchical Namespace The following diagram depicts the tree structure of ZooKeeper file system used for memory representation. ZooKeeper node is referred as znode. Every znode is identified by a name and separated by a sequence of path (/). In the diagram, first you have a root znode separated by “/”. Under root, you have two logical namespaces config and workers. The config namespace is used for centralized configuration management and the workers namespace is used for naming. Under config namespace, each znode can store upto 1MB of data. This is similar to UNIX file system except that the parent znode can store data as well. The main purpose of this structure is to store synchronized data and describe the metadata of the znode. This structure is called as ZooKeeper Data Model. Every znode in the ZooKeeper data model maintains a stat structure. A stat simply provides the metadata of a znode. It consists of Version number, Action control list (ACL), Timestamp, and Data length. Version number − Every znode has a version number, which means every time the data associated with the znode changes, its corresponding version number would also increased. The use of version number is important when multiple zookeeper clients are trying to perform operations over the same znode. Action Control List (ACL) − ACL is basically an authentication mechanism for accessing the znode. It governs all the znode read and write operations. Timestamp − Timestamp represents time elapsed from znode creation and modification. It is usually represented in milliseconds. ZooKeeper identifies every change to the znodes from “Transaction ID” (zxid). Zxid is unique and maintains time for each transaction so that you can easily identify the time elapsed from one request to another request. Data length − Total amount of the data stored in a znode is the data length. You can store a maximum of 1MB of data. Types of Znodes Znodes are categorized as persistence, sequential, and ephemeral. Persistence znode − Persistence znode is alive even after the client, which created that particular znode, is disconnected. By default, all znodes are persistent unless otherwise specified. Ephemeral znode − Ephemeral znodes are active until the client is alive. When a client gets disconnected from the ZooKeeper ensemble, then the ephemeral znodes get deleted automatically. For this reason, only ephemeral znodes are not allowed to have a children further. If an ephemeral znode is deleted, then the next suitable node will fill its position. Ephemeral znodes play an important role in Leader election. Sequential znode − Sequential znodes can be either persistent or ephemeral. When a new znode is created as a sequential znode, then ZooKeeper sets the path of the znode by attaching a 10 digit sequence number to the original name. For example, if a znode with path /myapp is created as a sequential znode, ZooKeeper will change the path to /myapp0000000001 and set the next sequence number as 0000000002. If two sequential znodes are created concurrently, then ZooKeeper never uses the same number for each znode. Sequential znodes play an important role in Locking and Synchronization. Sessions Sessions are very important for the operation of ZooKeeper. Requests in a session are executed in FIFO order. Once a client connects to a server, the session will be established and a session id is assigned to the client. The client sends heartbeats at a particular time interval to keep the session valid. If the ZooKeeper ensemble does not receive heartbeats from a client for more than the period (session timeout) specified at the starting of the service, it decides that the client died. Session timeouts are usually represented in milliseconds. When a session ends for any reason, the ephemeral znodes created during that session also get deleted. Watches Watches are a simple mechanism for the client to get notifications about the changes in the ZooKeeper ensemble. Clients can set watches while reading a particular znode. Watches send a notification to the registered client for any of the znode (on which client registers) changes. Znode changes are modification of data associated with the znode or changes in the znode’s children. Watches are triggered only once. If a client wants a notification again, it must be done through another read operation. When a connection session is expired, the client will be disconnected from the server and the associated watches are also removed. Print Page Previous Next Advertisements ”;

Aug 10

Zookeeper – Leader Election

Zookeeper – Leader Election ”; Previous Next Let us analyze how a leader node can be elected in a ZooKeeper ensemble. Consider there are N number of nodes in a cluster. The process of leader election is as follows − All the nodes create a sequential, ephemeral znode with the same path, /app/leader_election/guid_. ZooKeeper ensemble will append the 10-digit sequence number to the path and the znode created will be /app/leader_election/guid_0000000001, /app/leader_election/guid_0000000002, etc. For a given instance, the node which creates the smallest number in the znode becomes the leader and all the other nodes are followers. Each follower node watches the znode having the next smallest number. For example, the node which creates znode /app/leader_election/guid_0000000008 will watch the znode /app/leader_election/guid_0000000007 and the node which creates the znode /app/leader_election/guid_0000000007 will watch the znode /app/leader_election/guid_0000000006. If the leader goes down, then its corresponding znode /app/leader_electionN gets deleted. The next in line follower node will get the notification through watcher about the leader removal. The next in line follower node will check if there are other znodes with the smallest number. If none, then it will assume the role of the leader. Otherwise, it finds the node which created the znode with the smallest number as leader. Similarly, all other follower nodes elect the node which created the znode with the smallest number as leader. Leader election is a complex process when it is done from scratch. But ZooKeeper service makes it very simple. Let us move on to the installation of ZooKeeper for development purpose in the next chapter. Print Page Previous Next Advertisements ”;

Aug 10

Statistics Notation

Statistics – Notations ”; Previous Next Following table shows the usage of various symbols used in Statistics Capitalization Generally lower case letters represent the sample attributes and capital case letters are used to represent population attributes. $ P $ – population proportion. $ p $ – sample proportion. $ X $ – set of population elements. $ x $ – set of sample elements. $ N $ – set of population size. $ N $ – set of sample size. Greek Vs Roman letters Roman letters represent the sample attributs and greek letters are used to represent Population attributes. $ mu $ – population mean. $ bar x $ – sample mean. $ delta $ – standard deviation of a population. $ s $ – standard deviation of a sample. Population specific Parameters Following symbols represent population specific attributes. $ mu $ – population mean. $ delta $ – standard deviation of a population. $ {mu}^2 $ – variance of a population. $ P $ – proportion of population elements having a particular attribute. $ Q $ – proportion of population elements having no particular attribute. $ rho $ – population correlation coefficient based on all of the elements from a population. $ N $ – number of elements in a population. Sample specific Parameters Following symbols represent population specific attributes. $ bar x $ – sample mean. $ s $ – standard deviation of a sample. $ {s}^2 $ – variance of a sample. $ p $ – proportion of sample elements having a particular attribute. $ q $ – proportion of sample elements having no particular attribute. $ r $ – population correlation coefficient based on all of the elements from a sample. $ n $ – number of elements in a sample. Linear Regression $ B_0 $ – intercept constant in a population regression line. $ B_1 $ – regression coefficient in a population regression line. $ {R}^2 $ – coefficient of determination. $ b_0 $ – intercept constant in a sample regression line. $ b_1 $ – regression coefficient in a sample regression line. $ ^{s}b_1 $ – standard error of the slope of a regression line. Probability $ P(A) $ – probability that event A will occur. $ P(A|B) $ – conditional probability that event A occurs, given that event B has occurred. $ P(A”) $ – probability of the complement of event A. $ P(A cap B) $ – probability of the intersection of events A and B. $ P(A cup B) $ – probability of the union of events A and B. $ E(X) $ – expected value of random variable X. $ b(x; n, P) $ – binomial probability. $ b*(x; n, P) $ – negative binomial probability. $ g(x; P) $ – geometric probability. $ h(x; N, n, k) $ – hypergeometric probability. Permutation/Combination $ n! $ – factorial value of n. $ ^{n}P_r $ – number of permutations of n things taken r at a time. $ ^{n}C_r $ – number of combinations of n things taken r at a time. Set $ A Cap B $ – intersection of set A and B. $ A Cup B $ – union of set A and B. $ { A, B, C } $ – set of elements consisting of A, B, and C. $ emptyset $ – null or empty set. Hypothesis Testing $ H_0 $ – null hypothesis. $ H_1 $ – alternative hypothesis. $ alpha $ – significance level. $ beta $ – probability of committing a Type II error. Random Variables $ Z $ or $ z $ – standardized score, also known as a z score. $ z_{alpha} $ – standardized score that has a cumulative probability equal to $ 1 – alpha $. $ t_{alpha} $ – t statistic that has a cumulative probability equal to $ 1 – alpha $. $ f_{alpha} $ – f statistic that has a cumulative probability equal to $ 1 – alpha $. $ f_{alpha}(v_1, v_2) $ – f statistic that has a cumulative probability equal to $ 1 – alpha $ and $ v_1 $ and $ v_2 $ degrees of freedom. $ X^2 $ – chi-square statistic. Summation Symbols $ sum $ – summation symbol, used to compute sums over a range of values. $ sum x $ or $ sum x_i $ – sum of a set of n observations. Thus, $ sum x = x_1 + x_2 + … + x_n $. Print Page Previous Next Advertisements ”;

Aug 10

Sqoop – Introduction

Sqoop – Introduction ”; Previous Next The traditional application management system, that is, the interaction of applications with relational database using RDBMS, is one of the sources that generate Big Data. Such Big Data, generated by RDBMS, is stored in Relational Database Servers in the relational database structure. When Big Data storages and analyzers such as MapReduce, Hive, HBase, Cassandra, Pig, etc. of the Hadoop ecosystem came into picture, they required a tool to interact with the relational database servers for importing and exporting the Big Data residing in them. Here, Sqoop occupies a place in the Hadoop ecosystem to provide feasible interaction between relational database server and Hadoop’s HDFS. Sqoop − “SQL to Hadoop and Hadoop to SQL” Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL, Oracle to Hadoop HDFS, and export from Hadoop file system to relational databases. It is provided by the Apache Software Foundation. How Sqoop Works? The following image describes the workflow of Sqoop. Sqoop Import The import tool imports individual tables from RDBMS to HDFS. Each row in a table is treated as a record in HDFS. All records are stored as text data in text files or as binary data in Avro and Sequence files. Sqoop Export The export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, which are called as rows in table. Those are read and parsed into a set of records and delimited with user-specified delimiter. Print Page Previous Next Advertisements ”;

Aug 10

Tableau – Table Calculations

Tableau – Table Calculations ”; Previous Next These are the calculations which are applied to the values in the entire table. For example, for calculating a running total or running average, we need to apply a single method of calculation to an entire column. Such calculations cannot be performed on some selected rows. Table has a feature called Quick Table Calculation, which is used to create such calculations. The steps to be applied in Quick Table calculation are as follows − Step 1 − Select the measure on which the table calculation has to be applied and drag it to column shelf. Step 2 − Right-click the measure and choose the option Quick Table Calculation. Step 3 − Choose one of the following options to be applied on the measure. Running Total Difference Percent Difference Percent of Total Rank Percentile Moving Average Year to Date (YTD) Total Compound Growth Rate Year over Year Growth Year to Date (YTD) Growth Example Let’s calculate the running total of the profits earned for the data source following the above steps. Use the data source named sample – superstore.xls. Print Page Previous Next Advertisements ”;

Aug 10

Sqoop – Sqoop Job

Sqoop – Job ”; Previous Next This chapter describes how to create and maintain the Sqoop jobs. Sqoop job creates and saves the import and export commands. It specifies parameters to identify and recall the saved job. This re-calling or re-executing is used in the incremental import, which can import the updated rows from RDBMS table to HDFS. Syntax The following is the syntax for creating a Sqoop job. $ sqoop job (generic-args) (job-args) [– [subtool-name] (subtool-args)] $ sqoop-job (generic-args) (job-args) [– [subtool-name] (subtool-args)] Create Job (–create) Here we are creating a job with the name myjob, which can import the table data from RDBMS table to HDFS. The following command is used to create a job that is importing data from the employee table in the db database to the HDFS file. $ sqoop job –create myjob — import –connect jdbc:mysql://localhost/db –username root –table employee –m 1 Verify Job (–list) ‘–list’ argument is used to verify the saved jobs. The following command is used to verify the list of saved Sqoop jobs. $ sqoop job –list It shows the list of saved jobs. Available jobs: myjob Inspect Job (–show) ‘–show’ argument is used to inspect or verify particular jobs and their details. The following command and sample output is used to verify a job called myjob. $ sqoop job –show myjob It shows the tools and their options, which are used in myjob. Job: myjob Tool: import Options: —————————- direct.import = true codegen.input.delimiters.record = 0 hdfs.append.dir = false db.table = employee … incremental.last.value = 1206 … Execute Job (–exec) ‘–exec’ option is used to execute a saved job. The following command is used to execute a saved job called myjob. $ sqoop job –exec myjob It shows you the following output. 10/08/19 13:08:45 INFO tool.CodeGenTool: Beginning code generation … Print Page Previous Next Advertisements ”;

Aug 10

Tableau – Useful Resources

Tableau – Useful Resources ”; Previous Next The following resources contain additional information on Tableau. Please use them to get more in-depth knowledge on this topic. Useful Video Courses Tableau Online Training Course Most Popular 89 Lectures 10 hours Tutorialspoint More Detail Tableau Hands-On Course: Data Visualization With Tableau Most Popular 46 Lectures 5.5 hours TELCOMA Global More Detail Tableau A-Z: Hands-On Tableau Training Featured 32 Lectures 6.5 hours Satyajit Pattnaik More Detail Tableau Data Visualization: Step by Step Guide 64 Lectures 7 hours Techquest Hub More Detail Tableau Basics for Beginners 39 Lectures 5 hours Rushabh Jain More Detail Tableau 101 17 Lectures 47 mins Esha Prakash More Detail Print Page Previous Next Advertisements ”;

Aug 10

Tableau – Bullet Graph

Tableau – Bullet Graph ”; Previous Next A bullet chart is a variation of Bar chart. In this chart, we compare the value of one measure with another measure in the context of finding the variation in the first measure within a range of variations in the second measure. It is like two bars drawn upon one another to indicate their individual values at the same position in the graph. It can be thought of as combining two graphs as one to view a comparative result easily. Creating Bullet Graph Using the Sample-superstore, plan to find the size of profits for the respective sales figures in each Sub-Category. To achieve this objective, following are the steps. Step 1 − Drag and drop the dimension Sub-Category from the data pane into the column shelf. Step 2 − Drag and drop the measures Profit and Sales to the Rows shelf. The following chart appears which shows the two measures as two separate categories of bar charts, each representing the values for sub-categories. Step 3 − Drag the sales measure to the Marks card. Using Show Me, choose the bullet graph option. The following chart shows the bullet graph. Print Page Previous Next Advertisements ”;

Aug 10

Tableau – Gantt Chart

Tableau – Gantt Chart ”; Previous Next A Gantt chart shows the progress of the value of a task or resource over a period of time. It is extensively used in project management and other types of variation study over a period of time. Thus, in Gantt chart, time dimension is an essential field. The Gantt chart takes at least a dimension and a measure in addition to the time dimension. Creating a Gantt Chart Using the Sample-superstore, plan to find the variation of quantities of different SubCategory of products according to their ship mode over a range of time. To achieve this objective, following are the steps. Step 1 − Drag the dimension order date to the Columns shelf and Sub-Category to the Rows shelf. Next, add the order date to the Filters shelf. Right-click on order date to convert it to the exact date values as shown in the following screenshot. Step 2 − Edit the filter condition to select a range of dates. It is because you want individual date values and there is a very large number of dates in the data. The range is created as shown in the following screenshot. Step 3 − Drag the dimension ship mode to the Color shelf and the measure quantity to the Size shelf under the Marks card. This produces the Gantt chart as shown in the following screenshot. Print Page Previous Next Advertisements ”;