Tableau – Scatter Plot

As the name suggests, a scatter plot shows many points scattered in the Cartesian plane. It is created by plotting the values of numerical variables as X and Y coordinates. Tableau needs at least one measure on the Rows shelf and one measure on the Columns shelf to create a scatter plot. You can also add dimension fields, which assign different colors to the points already present in the plot.

Simple Scatter Plot

Using the Sample-Superstore data source, let's find how sales and profit, plotted as the two axes of the Cartesian plane, are distributed across Sub-Category. The steps are as follows.

Step 1 − Drag and drop the measure Sales to the Columns shelf.
Step 2 − Drag and drop the measure Profit to the Rows shelf.
Step 3 − Drag the dimension Sub-Category to the Label shelf under Marks.

The resulting chart shows how profit and sales are distributed across the Sub-Category of products.

Scatter Plot – Color Encoded

You can color encode the values by dragging the dimension Sub-Category to the Color shelf under the Marks card. The chart then shows the scatter points with a different color for each sub-category.

Drill-Down Scatter Plot

The same scatter plot can show different values when you choose a dimension with a hierarchy. For example, expanding the Sub-Category field shows the scatter plot values for the Manufacturers.
Zookeeper – CLI
The ZooKeeper Command Line Interface (CLI) is used to interact with the ZooKeeper ensemble during development. It is useful for debugging and for experimenting with different options.

To perform ZooKeeper CLI operations, first start your ZooKeeper server ("bin/zkServer.sh start") and then the ZooKeeper client ("bin/zkCli.sh"). Once the client starts, you can perform the following operations −

Create znodes
Get data
Watch a znode for changes
Set data
Create children of a znode
List children of a znode
Check status
Remove / delete a znode

Let us now look at each command with an example.

Create Znodes

Creates a znode with the given path. The flag argument specifies whether the created znode is ephemeral, persistent, or sequential. If no flag is specified, the znode is persistent.

Ephemeral znodes (flag: -e) are deleted automatically when the session expires or the client disconnects.

Sequential znodes (flag: -s) guarantee that the znode path is unique. The ZooKeeper ensemble appends a sequence number, padded to 10 digits, to the znode path. For example, the znode path /myapp becomes /myapp0000000001, and the next one becomes /myapp0000000002.

Syntax

    create /path /data

Sample

    create /FirstZnode "Myfirstzookeeper-app"

Output

    [zk: localhost:2181(CONNECTED) 0] create /FirstZnode "Myfirstzookeeper-app"
    Created /FirstZnode

To create a sequential znode, add the -s flag as shown below.

Syntax

    create -s /path /data

Sample

    create -s /FirstZnode second-data

Output

    [zk: localhost:2181(CONNECTED) 2] create -s /FirstZnode "second-data"
    Created /FirstZnode0000000023

To create an ephemeral znode, add the -e flag as shown below.

Syntax

    create -e /path /data

Sample

    create -e /SecondZnode "Ephemeral-data"

Output

    [zk: localhost:2181(CONNECTED) 2] create -e /SecondZnode "Ephemeral-data"
    Created /SecondZnode

Remember that an ephemeral znode is deleted when the client connection is lost. You can try this by quitting the ZooKeeper CLI and then re-opening it.

Get Data

Returns the data associated with the specified znode together with its metadata. The metadata includes when the data was last modified, the transaction in which it was modified, and information about the data itself. This command can also set a watch that notifies you when the data changes.

Syntax

    get /path

Sample

    get /FirstZnode

Output

    [zk: localhost:2181(CONNECTED) 1] get /FirstZnode
    "Myfirstzookeeper-app"
    cZxid = 0x7f
    ctime = Tue Sep 29 16:15:47 IST 2015
    mZxid = 0x7f
    mtime = Tue Sep 29 16:15:47 IST 2015
    pZxid = 0x7f
    cversion = 0
    dataVersion = 0
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 22
    numChildren = 0

To access a sequential znode, you must enter its full path, including the sequence number.

Sample

    get /FirstZnode0000000023

Output

    [zk: localhost:2181(CONNECTED) 1] get /FirstZnode0000000023
    "second-data"
    cZxid = 0x80
    ctime = Tue Sep 29 16:25:47 IST 2015
    mZxid = 0x80
    mtime = Tue Sep 29 16:25:47 IST 2015
    pZxid = 0x80
    cversion = 0
    dataVersion = 0
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 13
    numChildren = 0

Watch

A watch shows a notification when the data of the specified znode, or of its children, changes. The example below sets a watch through the get command.

Syntax

    get /path [watch] 1

Sample

    get /FirstZnode 1

Output

    [zk: localhost:2181(CONNECTED) 1] get /FirstZnode 1
    "Myfirstzookeeper-app"
    cZxid = 0x7f
    ctime = Tue Sep 29 16:15:47 IST 2015
    mZxid = 0x7f
    mtime = Tue Sep 29 16:15:47 IST 2015
    pZxid = 0x7f
    cversion = 0
    dataVersion = 0
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 22
    numChildren = 0

The output is the same as for a normal get command, but the client now waits in the background for changes to the znode.

Set Data

Sets the data of the specified znode. Once the set operation finishes, you can check the data using the get command.

Syntax

    set /path /data

Sample

    set /SecondZnode Data-updated

Output

    [zk: localhost:2181(CONNECTED) 1] get /SecondZnode
    "Data-updated"
    cZxid = 0x82
    ctime = Tue Sep 29 16:29:50 IST 2015
    mZxid = 0x83
    mtime = Tue Sep 29 16:29:50 IST 2015
    pZxid = 0x82
    cversion = 0
    dataVersion = 1
    aclVersion = 0
    ephemeralOwner = 0x15018b47db00000
    dataLength = 14
    numChildren = 0

If you set a watch in the get command (as in the previous section), the output looks similar to the following −

Output

    [zk: localhost:2181(CONNECTED) 1] get /FirstZnode
    "Mysecondzookeeper-app"
    WATCHER::
    WatchedEvent state:SyncConnected type:NodeDataChanged path:/FirstZnode
    cZxid = 0x7f
    ctime = Tue Sep 29 16:15:47 IST 2015
    mZxid = 0x84
    mtime = Tue Sep 29 17:14:47 IST 2015
    pZxid = 0x7f
    cversion = 0
    dataVersion = 1
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 23
    numChildren = 0

Create Children / Sub-znode

Creating children is similar to creating new znodes. The only difference is that the path of the child znode includes the parent path.

Syntax

    create /parent/path/subnode/path /data

Sample

    create /FirstZnode/Child1 firstchildren

Output

    [zk: localhost:2181(CONNECTED) 16] create /FirstZnode/Child1 "firstchildren"
    Created /FirstZnode/Child1
    [zk: localhost:2181(CONNECTED) 17] create /FirstZnode/Child2 "secondchildren"
    Created /FirstZnode/Child2

List Children

This command lists the children of a znode.

Syntax

    ls /path

Sample

    ls /MyFirstZnode

Output

    [zk: localhost:2181(CONNECTED) 2] ls /MyFirstZnode
    [mysecondsubnode, myfirstsubnode]

Check Status

Status describes the metadata of a specified znode. It contains details such as timestamps, version numbers, ACL version, data length, and the number of children.

Syntax

    stat /path

Sample

    stat /FirstZnode

Output

    [zk: localhost:2181(CONNECTED) 1] stat /FirstZnode
    cZxid = 0x7f
    ctime = Tue Sep 29 16:15:47 IST 2015
    mZxid = 0x7f
    mtime = Tue Sep 29 17:14:24 IST 2015
    pZxid = 0x7f
    cversion = 0
    dataVersion = 1
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 23
    numChildren = 0

Remove a Znode

Removes the specified znode and, recursively, all its children. The command succeeds only if the znode exists.

Syntax

    rmr /path

Sample

    rmr /FirstZnode

Output

    [zk: localhost:2181(CONNECTED) 10] rmr /FirstZnode
    [zk: localhost:2181(CONNECTED) 11] get /FirstZnode
    Node does not exist: /FirstZnode

The delete command (delete /path) is similar to the remove command, except that it works only on znodes with no children.
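The naming rule for sequential znodes described in this chapter can be sketched in Python. This only illustrates the 10-digit zero padding and is not ZooKeeper code: the function name sequential_path is hypothetical, and the counter values are passed in by hand here, whereas the real counter is maintained by the ZooKeeper ensemble.

```python
def sequential_path(path: str, counter: int) -> str:
    """Append a 10-digit, zero-padded sequence number to a znode path."""
    if counter < 0:
        raise ValueError("counter must be non-negative")
    return f"{path}{counter:010d}"


# Counter values are chosen by hand to mirror the examples in the text.
print(sequential_path("/myapp", 1))
print(sequential_path("/FirstZnode", 23))
```

Because the suffix has a fixed width, sorting the resulting paths as strings also sorts them by sequence number.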
Tableau – Quick Filters
Many filter types in Tableau are quickly available through the right-click option on a dimension or measure. These filters, known as Quick Filters, have enough functionality to cover most common filtering needs. The following screenshot shows how the quick filters are accessed.

Following is a list of the various quick filters and their use.

Filter name − Purpose
Single Value (List) − Select one value at a time in a list.
Single Value (Dropdown) − Select a single value in a drop-down list.
Multiple Values (List) − Select one or more values in a list.
Multiple Values (Dropdown) − Select one or more values in a drop-down list.
Multiple Values (Custom List) − Search and select one or more values.
Single Value (Slider) − Drag a horizontal slider to select a single value.
Wildcard Match − Select values containing the specified characters.

Example

Consider the Sample-Superstore data source to apply some quick filters. In the following example, choose Sub-Category as the row and Sales as the column, which by default produces a horizontal bar chart. Next, drag the Sub-Category field to the Filters pane. All the sub-categories appear next to the chart. Apply wildcard filtering using the expression a*, which selects all sub-category names starting with "a". The screen below shows the result of applying this filter, where only the sub-categories starting with "A" are displayed.

Clearing the Filter

Once the analysis with the filter is complete, remove the filter by using the Clear Filter option. To do this, go to the Filters pane, right-click the field name, and choose Clear Filter as shown in the following screenshot.
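The effect of the a* wildcard in the example can be sketched in Python. This is a rough illustration of the behaviour the text describes (names starting with "a", matched regardless of case), not Tableau's implementation; the function name starts_with_filter and the list of sub-category names are assumptions made for the sketch.

```python
def starts_with_filter(values, prefix):
    """Keep the values whose name starts with the prefix, ignoring case."""
    prefix = prefix.lower()
    return [v for v in values if v.lower().startswith(prefix)]


# Illustrative sub-category names (an assumption for this sketch).
sub_categories = ["Accessories", "Appliances", "Art", "Binders", "Chairs", "Tables"]
print(starts_with_filter(sub_categories, "a"))
```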
Zookeeper – Applications
ZooKeeper provides a flexible coordination infrastructure for distributed environments, and many of today's best-known industrial applications rely on the ZooKeeper framework. This chapter discusses some of the most notable applications of ZooKeeper.

Yahoo!

The ZooKeeper framework was originally built at Yahoo!. A well-designed distributed application needs to meet requirements such as data transparency, good performance, robustness, centralized configuration, and coordination, so Yahoo! designed the ZooKeeper framework to meet these requirements.

Apache Hadoop

Apache Hadoop is the driving force behind the growth of the Big Data industry. Hadoop relies on ZooKeeper for configuration management and coordination. Let us take a scenario to understand the role of ZooKeeper in Hadoop.

Assume that a Hadoop cluster spans 100 or more commodity servers. Such a cluster needs coordination and naming services. Because the computation involves a large number of nodes, each node needs to synchronize with the others, know where to access services, and know how it should be configured. In other words, Hadoop clusters require cross-node services. ZooKeeper provides the facilities for cross-node synchronization and ensures that tasks across Hadoop projects are serialized and synchronized.

Multiple ZooKeeper servers support large Hadoop clusters. Each client machine communicates with one of the ZooKeeper servers to retrieve and update its synchronization information. Some real-world examples are −

Human Genome Project − The Human Genome Project contains terabytes of data. The Hadoop MapReduce framework can be used to analyze the dataset and find interesting facts for human development.

Healthcare − Hospitals can store, retrieve, and analyze huge sets of patient medical records, which normally run to terabytes.
Apache HBase

Apache HBase is an open source, distributed, NoSQL database used for real-time read/write access to large datasets. It runs on top of HDFS. HBase follows a master-slave architecture in which the HBase Master governs all the slaves, which are referred to as Region servers.

A distributed HBase installation depends on a running ZooKeeper cluster. Apache HBase uses ZooKeeper to track the status of distributed data across the master and region servers, with the help of centralized configuration management and distributed mutex mechanisms. Here are some of the use cases of HBase −

Telecom − The telecom industry stores billions of mobile call records (around 30 TB per month), and accessing these call records in real time becomes a huge task. HBase can be used to process all the records in real time, easily and efficiently.

Social network − Similar to the telecom industry, sites like Twitter, LinkedIn, and Facebook receive huge volumes of data through the posts created by users. HBase can be used to find recent trends and other interesting facts.

Apache Solr

Apache Solr is a fast, open source search platform written in Java. It is a fault-tolerant, distributed search engine. Built on top of Lucene, it is a high-performance, full-featured text search engine. Solr makes extensive use of ZooKeeper features such as configuration management, leader election, node management, and locking and synchronization of data.

Solr has two distinct parts, indexing and searching. Indexing is the process of storing the data in a suitable format so that it can be searched later. Solr uses ZooKeeper both for indexing the data across multiple nodes and for searching from multiple nodes.
ZooKeeper contributes the following features −

Adding and removing nodes as and when needed
Replication of data between nodes, which minimizes data loss
Sharing of data between multiple nodes, and searching from multiple nodes for faster search results

Some of the use cases of Apache Solr include e-commerce, job search, etc.
Tableau – Bubble Chart
Bubble charts display data as a cluster of circles. Each value in the dimension field is represented by a circle, and the values of the measure determine the size of those circles. Because the values are not presented in any row or column, you drag the required fields to the different shelves under the Marks card.

Simple Bubble Chart

Using the Sample-Superstore data source, let's find the size of profits for the different ship modes. The steps are as follows.

Step 1 − Drag and drop the measure Profit to the Size shelf under the Marks card.
Step 2 − Drag and drop the dimension Ship Mode to the Label shelf under the Marks card.
Step 3 − Drag the dimension Ship Mode to the Color shelf under the Marks card.

The following chart appears.

Bubble Chart with Measure Values

You can also show the values of the measure field that decides the size of the circles. To do this, drag the Sales measure to the Label shelf. The following chart appears.

Bubble Chart with Measure Colors

Instead of giving each circle a different color, you can use a single color in different shades. For this, drag the measure Sales to the Color shelf. Higher values are shown in darker shades and smaller values in lighter shades.
Tableau – Crosstab
A crosstab chart in Tableau is also called a text table; it shows the data in textual form. The chart is made up of one or more dimensions and one or more measures. It can also show various calculations on the values of the measure field, such as running total, percentage of total, etc.

Simple Crosstab

Using the Sample-Superstore data source, let's get the amount of sales for each segment in each region, displayed for each year using the available order dates. The steps are as follows.

Step 1 − Drag and drop the dimension Order Date to the Columns shelf.
Step 2 − Drag and drop the dimensions Region and Segment to the Rows shelf.
Step 3 − Drag the measure Sales to the Label shelf under Marks.

The following chart appears, which shows the crosstab.

Crosstab – Color Encoded

You can color encode the values in the crosstab chart by dropping the measure field onto the Color shelf, as shown in the following screenshot. The strength of the color depends on the value of the measure: larger values have a darker shade than smaller values.

Crosstab with Row Percentage

In addition to color encoding, you can apply calculations to the values of the measure. In the following example, we show each value as a percentage of the total sales in its row instead of the plain sales figures. For this, right-click SUM(Sales) in the Marks card and choose the option Add Table Calculation. Then choose Percent of Total and summarize it as Table (Across).

On clicking OK in the screen above, you will find the crosstab chart created with percentage values, as shown in the following screenshot.
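The row-percentage calculation can be sketched outside Tableau as follows. This is a minimal illustration of "percent of total, computed across each row", not Tableau's own implementation; the function name percent_of_row_total and all sales figures are hypothetical.

```python
def percent_of_row_total(table):
    """Convert each value to its percentage of the row's total."""
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells.values())
        result[row_label] = {
            # Guard against an all-zero row to avoid dividing by zero.
            col: (100.0 * value / row_total if row_total else 0.0)
            for col, value in cells.items()
        }
    return result


# Hypothetical sales figures, keyed by (region, segment) and then by year.
sales = {
    ("Central", "Consumer"): {2014: 100.0, 2015: 300.0},
    ("East", "Corporate"): {2014: 50.0, 2015: 150.0},
}
for row, cells in percent_of_row_total(sales).items():
    print(row, cells)
```

Each row of the result sums to 100, which is what distinguishes a row percentage from a percentage of the grand total.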
Sqoop – List Databases
This chapter describes how to list the databases using Sqoop. The Sqoop list-databases tool parses and executes the 'SHOW DATABASES' query against the database server. It then lists the databases present on the server.

Syntax

The following syntax is used for the Sqoop list-databases command.

    $ sqoop list-databases (generic-args) (list-databases-args)
    $ sqoop-list-databases (generic-args) (list-databases-args)

Sample Query

The following command is used to list all the databases in the MySQL database server.

    $ sqoop list-databases --connect jdbc:mysql://localhost/ --username root

If the command executes successfully, it displays the list of databases in your MySQL database server as follows.

    …
    13/05/31 16:45:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    mysql
    test
    userdb
    db
Root Mean Square
Root Mean Square (RMS) is defined as the square root of the mean square, where the mean square is the arithmetic mean of the squares of the numbers. RMS is also called the quadratic mean.

Formula

${ x_{rms} = \sqrt{ \frac{1}{n} \left( {x_1}^2 + {x_2}^2 + \dots + {x_n}^2 \right) } }$

Where −

${x_i}$ = items under observation.
${n}$ = total number of items.

Example

Problem Statement: Compute the RMS of the following data.

5 6 7 8 9

Solution:

Step 1: Compute the squares of each number and add them.

${ {x_1}^2 + {x_2}^2 + \dots + {x_n}^2 \\[7pt] = 5^2 + 6^2 + 7^2 + 8^2 + 9^2 \\[7pt] = 25 + 36 + 49 + 64 + 81 \\[7pt] = 255 }$

Step 2: Compute the mean of the squares.

${ \frac{1}{n} \left( {x_1}^2 + {x_2}^2 + \dots + {x_n}^2 \right) \\[7pt] = \frac{1}{5} (255) \\[7pt] = \frac{255}{5} \\[7pt] = 51 }$

Step 3: Compute the RMS by taking the square root of the mean of squares.

${ x_{rms} = \sqrt{ \frac{1}{n} \left( {x_1}^2 + {x_2}^2 + \dots + {x_n}^2 \right) } \\[7pt] = \sqrt{51} \\[7pt] \approx 7.14 }$

As a result, the RMS is approximately ${7.14}$.
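The definition can also be evaluated directly in code, which is a convenient way to check a hand calculation. The following is a minimal Python sketch of the formula; the function name root_mean_square is chosen for this example only.

```python
import math


def root_mean_square(values):
    """Return the square root of the arithmetic mean of the squared values."""
    if not values:
        raise ValueError("values must not be empty")
    mean_square = sum(x * x for x in values) / len(values)
    return math.sqrt(mean_square)


data = [5, 6, 7, 8, 9]
print(round(root_mean_square(data), 2))
```

Note that n is the number of items in the data set, so all five values contribute both to the sum of squares and to the divisor.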
Tableau – Dashboard
A dashboard is a consolidated display of many worksheets and related information in a single place. It is used to compare and monitor a variety of data simultaneously, with the different data views displayed all at once. Dashboards are shown as tabs at the bottom of the workbook, and they usually get updated with the most recent data from the data source. While creating a dashboard, you can add views from any worksheet in the workbook along with supporting objects such as text areas, web pages, and images.

Each view you add to the dashboard is connected to its corresponding worksheet. So when you modify the worksheet, the dashboard is updated, and when you modify the view in the dashboard, the worksheet is updated.

Creating a Dashboard

Using the Sample-Superstore data source, let's create a dashboard showing the sales and profits for different segments and sub-categories of products across all the states. The steps are as follows.

Step 1 − Create a blank worksheet by using the add worksheet icon located at the bottom of the workbook. Drag the dimension Segment to the Columns shelf and the dimension Sub-Category to the Rows shelf. Drag and drop the measure Sales to the Color shelf and the measure Profit to the Size shelf. This worksheet is referred to as the Master worksheet. Right-click and rename this worksheet as Sales_Profits. The following chart appears.

Step 2 − Create another sheet to hold the details of the Sales across the States. For this, drag the dimension State to the Rows shelf and the measure Sales to the Columns shelf as shown in the following screenshot. Next, sort the State field so that the Sales appear in descending order. Right-click and rename this worksheet as Sales_state.

Step 3 − Next, create a blank dashboard by clicking the Create New Dashboard link at the bottom of the workbook. Right-click and rename the dashboard as Profit_Dashboard.
Step 4 − Drag the two worksheets to the dashboard. Near the top border of the Sales_Profits worksheet, you can see three small icons. Click the middle one, which shows the prompt Use as Filter when you hover the mouse over it.

Step 5 − Now, in the dashboard, click the box representing the Sub-Category named Machines and the Segment named Consumer.

You will notice that the right pane, named Sales_state, now shows only the states in which sales occurred for this selection. This illustrates how the sheets are linked in a dashboard.
Tableau – Question Answers
Dear readers, these Tableau Interview Questions have been designed specially to get you acquainted with the nature of questions you may encounter during your interview on the subject of Tableau. In my experience, good interviewers rarely plan to ask any particular question during an interview. Questions normally start with some basic concept of the subject and then continue based on the discussion and on what you answer −

What is Tableau?
Tableau is business intelligence software that allows anyone to connect to their data, and then visualize it and create interactive, sharable dashboards.

What is a Data Source page?
A page where you can set up your data source. The Data Source page generally consists of four main areas: left pane, join area, preview area, and metadata area.

What is an extract in Tableau?
A saved subset of a data source that you can use to improve performance and analyze offline.

What is a format pane in Tableau?
A pane that contains formatting settings that control the entire worksheet, as well as individual fields in the view.

What is an LOD expression in Tableau?
A syntax that supports aggregation at dimensionalities other than the view level. With level of detail expressions, you can attach one or more dimensions to any aggregate expression.

What is the difference between a Quick Filter and a Normal Filter?
A Normal Filter restricts the data from the database based on a selected dimension or measure. Quick Filters let the user change the selected data members dynamically at run time.

What is Tableau Reader?
Tableau Reader is a free viewing application that lets anyone read and interact with packaged workbooks created by Tableau Desktop.

Can we have multiple value selection in a parameter?
No.

Which join is used in data blending?
There are no joins as such; we just give the column references, like a primary and foreign key relation.
What are the possible reasons for slow performance in Tableau?
More extracts and filters; it also depends on the data sources.

What are the criteria to blend the data from multiple data sources?
There should be a common dimension to blend the data sources into a single worksheet.

What is a Dimension?
Tableau treats any field containing qualitative, categorical information as a dimension. This includes any field with text or date values.

What is a Measure?
A measure is a field that is dependent on the value of one or more dimensions. Tableau treats any field containing numeric (quantitative) information as a measure.

What does the extension .twbx represent in Tableau?
It is a file that represents a Tableau Packaged Workbook, in which the .twb file is grouped together with the data sources.

What are the types of filters in Tableau?
Custom Filters, Context Filters, Normal Filters.

What is the Marks card in Tableau?
A card to the left of the view where you can drag fields to control mark properties such as type, color, size, shape, label, tooltip, and detail.

What are shelves in Tableau?
They are named areas to the left and top of the view. You build views by placing fields onto the shelves. Some shelves are available only when you select certain mark types.

What is a Tableau workbook?
It is a file with a .twb extension that contains one or more worksheets (and possibly also dashboards and stories).

In Tableau, what is a worksheet?
A sheet where you build views of your data by dragging fields onto shelves.

What is an alias in Tableau?
An alternative name that you can assign to a field or to a dimension member.

What is a context filter?
In a context filter, the filter condition is applied first to the data source, and the other filters are then applied only to the resulting records.

What is Dual Axis?
You can compare multiple measures using dual axes, which are two independent axes that are layered on top of each other.

What is a page shelf in Tableau?
The Pages shelf is used to control the display of output by choosing the sequence of display.

What is a table calculation in Tableau?
These are built-in calculations in Tableau, normally used to calculate values such as percentage changes.

What is data blending?
Data blending is used to blend data from multiple data sources on a single worksheet. The data is joined on common dimensions.

What is Connect live?
It creates a direct connection to the data source and speeds up access.

What is the Import all data feature in Tableau?
It imports the entire data source into Tableau's fast data engine as an extract and saves it in the workbook.

What are parameters and when do you use them?
Parameters are dynamic values that can replace constant values in calculations.

What is a TDE file in Tableau?
It refers to the file that contains data extracted from external sources like MS Excel, MS Access, or a CSV file.

What is a story in Tableau?
A story is a sheet that contains a sequence of worksheets or dashboards that work together to convey information.

What is a Published data source?
It contains connection information that is independent of any workbook and can be used by multiple workbooks.

What is an Embedded data source?
It contains connection information and is associated with a workbook.

When to use Joins versus Blending in Tableau?
If the data resides in a single source, use joins; when your data is not in one place, use blending.

How to automate reports using Tableau software?
You need to publish the report to Tableau Server. While publishing, you will find an option to schedule reports; you just need to select the time when you want the data to be refreshed.

What is Tableau Show Me?
Show Me is used to apply a required view to the existing data in