Zookeeper – Home

ZooKeeper is a distributed coordination service for managing large sets of hosts. Coordinating and managing a service in a distributed environment is a complicated process; ZooKeeper solves this problem with its simple architecture and API, allowing developers to focus on core application logic without worrying about the distributed nature of the application.

The ZooKeeper framework was originally built at Yahoo! for accessing their applications in an easy and robust manner. Later, Apache ZooKeeper became a standard for coordination services and is used by Hadoop, HBase, and other distributed frameworks. For example, Apache HBase uses ZooKeeper to track the status of distributed data.

This tutorial explains the basics of ZooKeeper, shows how to install and deploy a ZooKeeper cluster in a distributed environment, and concludes with a few examples using Java programming and sample applications.

Audience

This tutorial has been prepared for professionals aspiring to make a career in Big Data Analytics using the ZooKeeper framework. It will give you enough understanding of how to use ZooKeeper to create distributed clusters.

Prerequisites

Before proceeding with this tutorial, you should have a good understanding of Java (the ZooKeeper server runs on the JVM), distributed processes, and the Linux environment.

Sqoop – Export

This chapter describes how to export data back from HDFS to an RDBMS database. The target table must already exist in the target database. The files given as input to Sqoop contain records, which are called rows in the table. These are read and parsed into a set of records, delimited with a user-specified delimiter.

The default operation is to insert all the records from the input files into the database table using the INSERT statement. In update mode, Sqoop generates UPDATE statements that replace existing records in the database.

Syntax

The following is the syntax for the export command.

$ sqoop export (generic-args) (export-args)
$ sqoop-export (generic-args) (export-args)

Example

Let us take an example of employee data in a file in HDFS. The employee data is available in the emp_data file in the '/emp/' directory in HDFS. The emp_data file is as follows.

1201, gopal,    manager, 50000, TP
1202, manisha,  preader, 50000, TP
1203, kalil,    php dev, 30000, AC
1204, prasanth, php dev, 30000, AC
1205, kranthi,  admin,   20000, TP
1206, satish p, grp des, 20000, GR

It is mandatory that the table to be exported is created manually and is present in the database to which the data has to be exported. The following queries create the table 'employee' on the mysql command line.

$ mysql
mysql> USE db;
mysql> CREATE TABLE employee (
   id INT NOT NULL PRIMARY KEY,
   name VARCHAR(20),
   deg VARCHAR(20),
   salary INT,
   dept VARCHAR(10));

The following command exports the table data (which is in the emp_data file on HDFS) to the employee table in the db database of the MySQL server.

$ sqoop export \
   --connect jdbc:mysql://localhost/db \
   --username root \
   --table employee \
   --export-dir /emp/emp_data

The following command verifies the table on the mysql command line.

mysql> SELECT * FROM employee;

If the given data is stored successfully, you can find the following table of employee data.
+------+----------+-------------+--------+------+
| Id   | Name     | Designation | Salary | Dept |
+------+----------+-------------+--------+------+
| 1201 | gopal    | manager     | 50000  | TP   |
| 1202 | manisha  | preader     | 50000  | TP   |
| 1203 | kalil    | php dev     | 30000  | AC   |
| 1204 | prasanth | php dev     | 30000  | AC   |
| 1205 | kranthi  | admin       | 20000  | TP   |
| 1206 | satish p | grp des     | 20000  | GR   |
+------+----------+-------------+--------+------+
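The default export flow described above — parse delimited records from the input file, then emit one INSERT per record — can be sketched in Python. This is an illustration of the idea only, not Sqoop's actual implementation; the generate_insert helper is hypothetical.

```python
# Sketch of Sqoop's default export mode: read delimited records and
# turn each one into an INSERT statement (illustrative only).

def parse_record(line, delimiter=","):
    """Split one line of the export file into trimmed field values."""
    return [field.strip() for field in line.split(delimiter)]

def generate_insert(table, fields):
    """Build an INSERT statement for one parsed record (hypothetical helper)."""
    values = ", ".join(
        f if f.isdigit() else "'{}'".format(f) for f in fields
    )
    return "INSERT INTO {} VALUES ({});".format(table, values)

lines = [
    "1201, gopal, manager, 50000, TP",
    "1202, manisha, preader, 50000, TP",
]
statements = [generate_insert("employee", parse_record(l)) for l in lines]
print(statements[0])
# INSERT INTO employee VALUES (1201, 'gopal', 'manager', 50000, 'TP');
```

In update mode, the same parsed fields would instead be rendered as UPDATE statements keyed on the primary key column.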

Sqoop – Eval

This chapter describes how to use the Sqoop 'eval' tool. It allows users to execute user-defined queries against the respective database server and preview the result on the console, so the user can know in advance what table data to expect from an import. Using eval, we can evaluate any type of SQL query, whether a DDL or a DML statement.

Syntax

The following syntax is used for the Sqoop eval command.

$ sqoop eval (generic-args) (eval-args)
$ sqoop-eval (generic-args) (eval-args)

Select Query Evaluation

Using the eval tool, we can evaluate any type of SQL query. Let us take the example of selecting a limited number of rows from the employee table of the db database. The following command evaluates the given example using an SQL query.

$ sqoop eval \
   --connect jdbc:mysql://localhost/db \
   --username root \
   --query "SELECT * FROM employee LIMIT 3"

If the command executes successfully, it produces the following output on the terminal.

+------+----------+-------------+--------+------+
| Id   | Name     | Designation | Salary | Dept |
+------+----------+-------------+--------+------+
| 1201 | gopal    | manager     | 50000  | TP   |
| 1202 | manisha  | preader     | 50000  | TP   |
| 1203 | khalil   | php dev     | 30000  | AC   |
+------+----------+-------------+--------+------+

Insert Query Evaluation

The Sqoop eval tool is applicable for both modeling and defining SQL statements; that is, we can use eval for insert statements too. The following command inserts a new row into the employee table of the db database.

$ sqoop eval \
   --connect jdbc:mysql://localhost/db \
   --username root \
   -e "INSERT INTO employee VALUES(1207, 'Raju', 'UI dev', 15000, 'TP')"

If the command executes successfully, it displays the status of the updated rows on the console. Otherwise, you can verify the employee table on the MySQL console. The following commands verify the rows of the employee table of the db database using a SELECT query.
mysql> USE db;
mysql> SELECT * FROM employee;
+------+----------+-------------+--------+------+
| Id   | Name     | Designation | Salary | Dept |
+------+----------+-------------+--------+------+
| 1201 | gopal    | manager     | 50000  | TP   |
| 1202 | manisha  | preader     | 50000  | TP   |
| 1203 | khalil   | php dev     | 30000  | AC   |
| 1204 | prasanth | php dev     | 30000  | AC   |
| 1205 | kranthi  | admin       | 20000  | TP   |
| 1206 | satish p | grp des     | 20000  | GR   |
| 1207 | Raju     | UI dev      | 15000  | TP   |
+------+----------+-------------+--------+------+
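Conceptually, eval just runs an arbitrary SQL statement against the database and previews the result. The behavior can be emulated in Python with an in-memory SQLite database; this is a sketch of the concept only — Sqoop itself connects to the real server over JDBC, and the eval_query helper is hypothetical.

```python
# Emulation of what 'sqoop eval' does conceptually: execute a SQL
# statement (DDL or DML) against a database and preview the rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INT, name TEXT, deg TEXT, salary INT, dept TEXT)")
conn.execute("INSERT INTO employee VALUES (1207, 'Raju', 'UI dev', 15000, 'TP')")

def eval_query(conn, sql):
    """Run a query and return the fetched rows, like eval's console preview."""
    cur = conn.execute(sql)
    return cur.fetchall()

rows = eval_query(conn, "SELECT * FROM employee LIMIT 3")
print(rows)  # [(1207, 'Raju', 'UI dev', 15000, 'TP')]
```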

Sqoop – Codegen

This chapter describes the importance of the 'codegen' tool. From the viewpoint of an object-oriented application, every database table has one DAO class that contains 'getter' and 'setter' methods to initialize objects. The codegen tool generates this DAO class automatically, in Java, based on the table schema structure. The Java definition is instantiated as part of the import process. The main use of this tool is to recover lost Java code: if the generated Java source is lost, codegen creates a new version of the class with the default delimiters between fields.

Syntax

The following is the syntax for the Sqoop codegen command.

$ sqoop codegen (generic-args) (codegen-args)
$ sqoop-codegen (generic-args) (codegen-args)

Example

Let us take an example that generates Java code for the emp table in the userdb database. The following command executes the given example.

$ sqoop codegen \
   --connect jdbc:mysql://localhost/userdb \
   --username root \
   --table emp

If the command executes successfully, it produces the following output on the terminal.

14/12/23 02:34:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/12/23 02:34:41 INFO tool.CodeGenTool: Beginning code generation
……………….
14/12/23 02:34:42 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/12/23 02:34:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/emp.jar

Verification

Let us take a look at the output. The jar-file path in the output is the location where the Java code for the emp table is generated and stored. Let us verify the files in that location using the following commands.
$ cd /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/
$ ls
emp.class  emp.jar  emp.java

If you want to verify in depth, compare the emp table in the userdb database with emp.java in the directory /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/.
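The idea behind codegen — walk a table schema and emit a class with one getter and setter per column — can be illustrated with a toy generator. This is a sketch only; Sqoop's real codegen produces a full Java ORM class with parsing and serialization logic, and the generate_dao helper below is hypothetical.

```python
# Toy sketch of schema-driven code generation: given a table name and
# (column, Java type) pairs, emit Java-like DAO source as a string.

def generate_dao(table, columns):
    """Return Java-like source for a DAO class (illustrative, not Sqoop output)."""
    lines = ["public class {} {{".format(table)]
    for name, jtype in columns:                      # one field per column
        lines.append("    private {} {};".format(jtype, name))
    for name, jtype in columns:                      # getter/setter per column
        title = name.capitalize()
        lines.append("    public {} get{}() {{ return {}; }}".format(jtype, title, name))
        lines.append("    public void set{}({} v) {{ this.{} = v; }}".format(title, jtype, name))
    lines.append("}")
    return "\n".join(lines)

source = generate_dao("emp", [("id", "int"), ("name", "String")])
print(source)
```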

Tableau – Data Joining

Data joining is a very common requirement in any data analysis. You may need to join data from multiple sources, or join data from different tables in a single source. Tableau provides the feature to join tables by using the data pane available under Edit Data Source in the Data menu.

Creating a Join

Consider the data source 'Sample Superstore' to create a join between the Orders and Returns tables. For this, go to the Data menu and choose the option Edit Data Source. Then drag the two tables, Orders and Returns, to the data pane. Depending on the field names and datatypes, Tableau automatically creates a join, which can be changed later. The following screenshot shows the creation of an inner join between Orders and Returns using the field Order ID.

Editing a Join Type

The join type that Tableau creates automatically can be changed manually. For this, click the middle of the two circles showing the join. A popup window appears which shows the four types of joins available. Tableau also automatically greys out the join types it finds irrelevant on the basis of the data present in the data source. In the following screenshot, you can see inner and left outer joins as the available joins.

Editing Join Fields

You can also change the fields forming the join condition by clicking the Data Source option available in the join popup window. While selecting the field, you can search for the field you are looking for using the search text box.
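The join semantics Tableau offers can be made concrete with a small sketch: an inner join keeps only rows whose key matches in both tables, while a left join keeps every row from the left table. The field names and data below are illustrative, modeled loosely on the Orders/Returns example above.

```python
# Pure-Python illustration of inner vs. left join over two row lists.
orders  = [{"Order ID": 1, "Sales": 100}, {"Order ID": 2, "Sales": 250}]
returns = [{"Order ID": 2, "Returned": "Yes"}]

def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; `how` is 'inner' or 'left'."""
    lookup = {row[key]: row for row in right}
    out = []
    for row in left:
        match = lookup.get(row[key])
        if match is not None:
            out.append({**row, **match})
        elif how == "left":
            # a left join keeps unmatched left rows, with a null right side
            out.append({**row, "Returned": None})
    return out

print(join(orders, returns, "Order ID"))               # inner: only Order ID 2
print(len(join(orders, returns, "Order ID", "left")))  # left: 2 rows
```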

Tableau – Data Sources

Tableau can connect to all the popular data sources which are widely used. Tableau's native connectors can connect to the following types of data sources.

File systems such as CSV, Excel, etc.
Relational systems such as Oracle, SQL Server, DB2, etc.
Cloud systems such as Windows Azure, Google BigQuery, etc.
Other sources using ODBC

The following picture shows most of the data sources available through Tableau's native data connectors.

Connect Live

The Connect Live feature is used for real-time data analysis. In this case, Tableau connects to the real-time data source and keeps reading the data. Thus, the result of the analysis is up to the second, and the latest changes are reflected in the result. On the downside, however, this burdens the source system, as it has to keep sending data to Tableau.

In-Memory

Tableau can also process data in-memory by caching it in memory and no longer staying connected to the source while analyzing the data. Of course, there is a limit to the amount of data that can be cached, depending on the available memory.

Combine Data Sources

Tableau can connect to different data sources at the same time. For example, in a single workbook you can connect to a flat file and a relational source by defining multiple connections. This is used in data blending, which is a unique feature of Tableau.

Variance

Variance is defined as the average of the squared differences from the mean value. It is given by the following formula:

Formula

${ \delta = \frac{\sum (M - n_i)^2}{n} }$

Where −

${M}$ = mean of the items.
${n}$ = the number of items considered.
${n_i}$ = the items.

Example

Problem Statement:

Find the variance of the following data: {600, 470, 170, 430, 300}

Solution:

Step 1: Determine the mean of the given items.

${ M = \frac{600 + 470 + 170 + 430 + 300}{5} \\[7pt]
= \frac{1970}{5} \\[7pt]
= 394 }$

Step 2: Determine the variance.

${ \delta = \frac{\sum (M - n_i)^2}{n} \\[7pt]
= \frac{(600 - 394)^2 + (470 - 394)^2 + (170 - 394)^2 + (430 - 394)^2 + (300 - 394)^2}{5} \\[7pt]
= \frac{(206)^2 + (76)^2 + (-224)^2 + (36)^2 + (-94)^2}{5} \\[7pt]
= \frac{42436 + 5776 + 50176 + 1296 + 8836}{5} \\[7pt]
= \frac{108520}{5} \\[7pt]
= 21704 }$

As a result, the variance is ${21704}$.
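The calculation above can be checked in a few lines of Python, using the same population-variance formula from the chapter (divide by n, not n − 1):

```python
# Variance of the worked example's data, using the chapter's formula:
# the average of squared differences from the mean.

def variance(items):
    n = len(items)
    mean = sum(items) / n
    return sum((mean - x) ** 2 for x in items) / n

data = [600, 470, 170, 430, 300]
print(variance(data))  # 21704.0
```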

Tableau – Save & Delete Worksheet

An existing worksheet can be both saved and deleted. This helps in organizing the contents of the Tableau desktop environment. While you can save a worksheet by clicking the save button in the main menu, you can delete a worksheet using the following steps.

Deleting the Worksheet

To delete a worksheet, right-click on the name of the worksheet and choose the option 'Delete Sheet'. The following screenshot shows that the worksheet has been deleted.

Transformations

Data transformation refers to the application of a function to each item in a data set. Here $ x_i $ is replaced by its transformed value $ y_i $, where $ y_i = f(x_i) $. Data transformations are generally carried out to make the appearance of graphs more interpretable. There are four major functions used for transformations.

$ \log x $ − logarithm transformation. For example, sound levels are measured in decibels, which are generally represented using a log transformation.
$ \frac{1}{x} $ − reciprocal transformation. For example, the time to complete a race/task can be represented using speed; the greater the speed, the lesser the time taken.
$ \sqrt{x} $ − square root transformation. For example, the areas of circular grounds can be compared using their radii.
$ {x^2} $ − power transformation. For example, to compare negative numbers.

Logarithm and square root transformations are used in the case of positive numbers, whereas reciprocal and power transformations can be used for both negative and positive numbers.

The following diagrams illustrate the use of the logarithm transformation to compare population graphically.

Before transformation

After transformation
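The four transformation functions listed above can be applied elementwise to a data set, as in this short sketch (the sample values are illustrative; note that log and square root require positive inputs, while reciprocal and power transforms also accept negative values):

```python
# Applying the chapter's four transformations to each item of a data set.
import math

data = [1.0, 10.0, 100.0]

log_t        = [math.log10(x) for x in data]  # logarithm transformation
reciprocal_t = [1 / x for x in data]          # reciprocal transformation
sqrt_t       = [math.sqrt(x) for x in data]   # square root transformation
power_t      = [x ** 2 for x in data]         # power transformation

print(log_t)    # [0.0, 1.0, 2.0]
print(power_t)  # [1.0, 100.0, 10000.0]
```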

Tableau – Show Me

As an advanced data visualization tool, Tableau makes data analysis very easy by providing many analysis techniques without requiring any custom code. One such feature is Show Me. It can be used to apply a required view to the existing data in the worksheet. The views can be a pie chart, a scatter plot, a line chart, etc.

Whenever a worksheet with data is created, Show Me is available in the top right corner, as shown in the following figure. Some of the view options will be greyed out depending on the nature of the selection in the data pane.

Show Me with Two Fields

The relation between two fields can be analyzed visually using the various graphs and charts available in Show Me. In this case, we choose two fields and apply a line chart. Following are the steps −

Step 1 − Select the two fields (order date and profit) to be analyzed by holding down the Control key.
Step 2 − Click the Show Me bar and choose line chart.
Step 3 − Click the Mark Label button on the scrollbar.

The following diagram shows the line chart created using the above steps.

Show Me with Multiple Fields

We can apply a similar technique to analyze more than two fields. The only difference in this case is the availability of fewer views in active form. Tableau automatically greys out the views that are not appropriate for the analysis of the chosen fields.

In this case, choose the fields product name, customer name, sales, and profit by holding down the Control key. As you can observe, most of the views in Show Me are greyed out. From the active views, choose Scatter View. The following diagram shows the Scatter View chart created.