Hive – Introduction

The term 'Big Data' is used for collections of large datasets characterized by huge volume, high velocity, and a wide variety of data that grows day by day. It is difficult to process Big Data using traditional data management systems. Therefore, the Apache Software Foundation introduced a framework called Hadoop to solve Big Data management and processing challenges.

Hadoop

Hadoop is an open-source framework for storing and processing Big Data in a distributed environment. It contains two modules: MapReduce and the Hadoop Distributed File System (HDFS).

MapReduce: A parallel programming model for processing large amounts of structured, semi-structured, and unstructured data on large clusters of commodity hardware.

HDFS: The Hadoop Distributed File System is the part of the Hadoop framework used to store and process the datasets. It provides a fault-tolerant file system that runs on commodity hardware.

The Hadoop ecosystem contains different sub-projects (tools) such as Sqoop, Pig, and Hive that support the Hadoop modules:

Sqoop: Used to import and export data between HDFS and an RDBMS.

Pig: A procedural language platform used to develop scripts for MapReduce operations.

Hive: A platform used to develop SQL-type scripts to perform MapReduce operations.

Note: There are various ways to execute MapReduce operations:

The traditional approach, using a Java MapReduce program for structured, semi-structured, and unstructured data.
The scripting approach, using Pig to process structured and semi-structured data.
The Hive Query Language (HiveQL or HQL), using Hive to process structured data.

What is Hive

Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and it makes querying and analyzing easy.

Hive was initially developed by Facebook; later, the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive. It is used by many companies. For example, Amazon uses it in Amazon Elastic MapReduce.

Hive is not:

A relational database
A design for OnLine Transaction Processing (OLTP)
A language for real-time queries and row-level updates

Features of Hive

It stores schema in a database and processed data in HDFS.
It is designed for OLAP.
It provides an SQL-type language for querying, called HiveQL or HQL.
It is familiar, fast, scalable, and extensible.

Architecture of Hive

The following component diagram depicts the architecture of Hive. The diagram contains different units, described below.

User Interface: Hive is data warehouse infrastructure software that enables interaction between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HDInsight (on Windows Server).

Meta Store: Hive chooses respective database servers to store the schema or metadata of tables, databases, columns in a table, their data types, and the HDFS mapping.

HiveQL Process Engine: HiveQL is similar to SQL for querying the schema information in the Metastore. It is one of the replacements for the traditional MapReduce approach: instead of writing a MapReduce program in Java, we can write a query for the MapReduce job and have Hive process it.

Execution Engine: The conjunction of the HiveQL process engine and MapReduce is the Hive execution engine. The execution engine processes the query and generates the same results as MapReduce.
It uses the flavor of MapReduce.

HDFS or HBASE: The Hadoop Distributed File System or HBase is the data storage technique used to store the data in the file system.

Working of Hive

The following diagram depicts the workflow between Hive and Hadoop. The steps below define how Hive interacts with the Hadoop framework:

Step 1 (Execute Query): The Hive interface, such as the command line or the Web UI, sends the query to the driver (any database driver such as JDBC, ODBC, etc.) for execution.

Step 2 (Get Plan): The driver takes the help of the query compiler, which parses the query to check the syntax and build the query plan, i.e. the requirement of the query.

Step 3 (Get Metadata): The compiler sends a metadata request to the Metastore (any database).

Step 4 (Send Metadata): The Metastore sends the metadata as a response to the compiler.

Step 5 (Send Plan): The compiler checks the requirement and resends the plan to the driver. Up to here, the parsing and compiling of the query is complete.

Step 6 (Execute Plan): The driver sends the execute plan to the execution engine.

Step 7 (Execute Job): Internally, the execution process is a MapReduce job. The execution engine sends the job to the JobTracker, which resides in the Name node, and the JobTracker assigns the job to the TaskTracker, which resides in the Data node. Here, the query executes the MapReduce job.

Step 7.1 (Metadata Ops): Meanwhile, during execution, the execution engine can perform metadata operations with the Metastore.

Step 8 (Fetch Result): The execution engine receives the results from the Data nodes.

Step 9 (Send Results): The execution engine sends those resultant values to the driver.

Step 10 (Send Results): The driver sends the results to the Hive interfaces.
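Because HiveQL statements compile down to MapReduce jobs, even a one-line query exercises the whole workflow described above. As a minimal sketch, assuming the employee table that the later chapters create, a grouped count such as the following is parsed by the compiler, planned through the driver, and executed as a MapReduce job whose map phase scans the table and whose reduce phase aggregates the counts:

hive> SELECT Dept, count(*) FROM employee GROUP BY Dept;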

HiveQL – Select Where

The Hive Query Language (HiveQL) is a query language for Hive to process and analyze structured data in a Metastore. This chapter explains how to use the SELECT statement with a WHERE clause.

The SELECT statement is used to retrieve data from a table. The WHERE clause works like a condition: it filters the data using the condition and gives you a finite result. The built-in operators and functions generate an expression that fulfils the condition.

Syntax

Given below is the syntax of the SELECT query:

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]]
[LIMIT number];

Example

Let us take an example for the SELECT...WHERE clause. Assume we have the employee table as given below, with fields named Id, Name, Salary, Designation, and Dept. Generate a query to retrieve the details of the employees who earn a salary of more than Rs 30000.

+------+-------------+--------+-------------------+-------+
|  ID  | Name        | Salary | Designation       | Dept  |
+------+-------------+--------+-------------------+-------+
| 1201 | Gopal       | 45000  | Technical manager | TP    |
| 1202 | Manisha     | 45000  | Proofreader       | PR    |
| 1203 | Masthanvali | 40000  | Technical writer  | TP    |
| 1204 | Kiran       | 40000  | Hr Admin          | HR    |
| 1205 | Kranthi     | 30000  | Op Admin          | Admin |
+------+-------------+--------+-------------------+-------+

The following query retrieves the employee details using the above scenario:

hive> SELECT * FROM employee WHERE salary>30000;

On successful execution of the query, you get to see the following response:

+------+-------------+--------+-------------------+------+
|  ID  | Name        | Salary | Designation       | Dept |
+------+-------------+--------+-------------------+------+
| 1201 | Gopal       | 45000  | Technical manager | TP   |
| 1202 | Manisha     | 45000  | Proofreader       | PR   |
| 1203 | Masthanvali | 40000  | Technical writer  | TP   |
| 1204 | Kiran       | 40000  | Hr Admin          | HR   |
+------+-------------+--------+-------------------+------+

JDBC Program

The JDBC program to apply the WHERE clause for the given example is as follows.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveQLWhere {
   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException, ClassNotFoundException {
      // Register driver and create driver instance
      Class.forName(driverName);

      // get connection
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");

      // create statement
      Statement stmt = con.createStatement();

      // execute statement
      ResultSet res = stmt.executeQuery("SELECT * FROM employee WHERE salary>30000");
      System.out.println("Result:");
      System.out.println(" ID \t Name \t Salary \t Designation \t Dept ");

      while (res.next()) {
         System.out.println(res.getInt(1) + " " + res.getString(2) + " " + res.getDouble(3) + " " +
            res.getString(4) + " " + res.getString(5));
      }
      con.close();
   }
}

Save the program in a file named HiveQLWhere.java. Use the following commands to compile and execute this program.

$ javac HiveQLWhere.java
$ java HiveQLWhere

Output:

ID    Name          Salary   Designation         Dept
1201  Gopal         45000    Technical manager   TP
1202  Manisha       45000    Proofreader         PR
1203  Masthanvali   40000    Technical writer    TP
1204  Kiran         40000    Hr Admin            HR
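A WHERE condition is not limited to a single comparison; it can combine several predicates with logical operators and be paired with the other optional clauses listed in the syntax above. The following is an illustrative sketch against the same employee table (the AND predicate and the LIMIT value are arbitrary choices, not part of the original example):

hive> SELECT Id, Name, Salary FROM employee
      WHERE Salary > 30000 AND Dept = 'TP'
      LIMIT 10;

For the sample data shown above, this returns the two TP employees, Gopal and Masthanvali.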

HiveQL – Select Joins

JOIN is a clause used to combine specific fields from two tables by using values common to each. It is used to combine records from two or more tables in the database.

Syntax

join_table:
   table_reference JOIN table_factor [join_condition]
   | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference join_condition
   | table_reference LEFT SEMI JOIN table_reference join_condition
   | table_reference CROSS JOIN table_reference [join_condition]

Example

We will use the following two tables in this chapter. Consider the following table named CUSTOMERS.

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
| 1  | Ramesh   | 32  | Ahmedabad | 2000.00  |
| 2  | Khilan   | 25  | Delhi     | 1500.00  |
| 3  | kaushik  | 23  | Kota      | 2000.00  |
| 4  | Chaitali | 25  | Mumbai    | 6500.00  |
| 5  | Hardik   | 27  | Bhopal    | 8500.00  |
| 6  | Komal    | 22  | MP        | 4500.00  |
| 7  | Muffy    | 24  | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Consider another table ORDERS as follows:

+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3           | 3000   |
| 100 | 2009-10-08 00:00:00 | 3           | 1500   |
| 101 | 2009-11-20 00:00:00 | 2           | 1560   |
| 103 | 2008-05-20 00:00:00 | 4           | 2060   |
+-----+---------------------+-------------+--------+

There are different types of joins given as follows:

JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN

JOIN

The JOIN clause is used to combine and retrieve the records from multiple tables. JOIN is the same as INNER JOIN in SQL. A JOIN condition is usually raised using the primary keys and foreign keys of the tables.

The following query executes JOIN on the CUSTOMERS and ORDERS tables, and retrieves the records:

hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
      FROM CUSTOMERS c JOIN ORDERS o
      ON (c.ID = o.CUSTOMER_ID);

On successful execution of the query, you get to see the following response:

+----+----------+-----+--------+
| ID | NAME     | AGE | AMOUNT |
+----+----------+-----+--------+
| 3  | kaushik  | 23  | 3000   |
| 3  | kaushik  | 23  | 1500   |
| 2  | Khilan   | 25  | 1560   |
| 4  | Chaitali | 25  | 2060   |
+----+----------+-----+--------+

LEFT OUTER JOIN

The HiveQL LEFT OUTER JOIN returns all the rows from the left table, even if there are no matches in the right table. This means that if the ON clause matches 0 (zero) records in the right table, the JOIN still returns a row in the result, but with NULL in each column from the right table. A LEFT JOIN returns all the values from the left table, plus the matched values from the right table, or NULL in case of no matching JOIN predicate.

The following query demonstrates LEFT OUTER JOIN between the CUSTOMERS and ORDERS tables:

hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
      FROM CUSTOMERS c
      LEFT OUTER JOIN ORDERS o
      ON (c.ID = o.CUSTOMER_ID);

On successful execution of the query, you get to see the following response:

+----+----------+--------+---------------------+
| ID | NAME     | AMOUNT | DATE                |
+----+----------+--------+---------------------+
| 1  | Ramesh   | NULL   | NULL                |
| 2  | Khilan   | 1560   | 2009-11-20 00:00:00 |
| 3  | kaushik  | 3000   | 2009-10-08 00:00:00 |
| 3  | kaushik  | 1500   | 2009-10-08 00:00:00 |
| 4  | Chaitali | 2060   | 2008-05-20 00:00:00 |
| 5  | Hardik   | NULL   | NULL                |
| 6  | Komal    | NULL   | NULL                |
| 7  | Muffy    | NULL   | NULL                |
+----+----------+--------+---------------------+

RIGHT OUTER JOIN

The HiveQL RIGHT OUTER JOIN returns all the rows from the right table, even if there are no matches in the left table. If the ON clause matches 0 (zero) records in the left table, the JOIN still returns a row in the result, but with NULL in each column from the left table.
A RIGHT JOIN returns all the values from the right table, plus the matched values from the left table, or NULL in case of no matching join predicate.

The following query demonstrates RIGHT OUTER JOIN between the CUSTOMERS and ORDERS tables:

hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
      FROM CUSTOMERS c
      RIGHT OUTER JOIN ORDERS o
      ON (c.ID = o.CUSTOMER_ID);

On successful execution of the query, you get to see the following response:

+------+----------+--------+---------------------+
| ID   | NAME     | AMOUNT | DATE                |
+------+----------+--------+---------------------+
| 3    | kaushik  | 3000   | 2009-10-08 00:00:00 |
| 3    | kaushik  | 1500   | 2009-10-08 00:00:00 |
| 2    | Khilan   | 1560   | 2009-11-20 00:00:00 |
| 4    | Chaitali | 2060   | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+

FULL OUTER JOIN

The HiveQL FULL OUTER JOIN combines the records of both the left and the right outer tables that fulfil the JOIN condition. The joined table contains either all the records from both tables, or fills in NULL values for missing matches on either side.

The following query demonstrates FULL OUTER JOIN between the CUSTOMERS and ORDERS tables:

hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
      FROM CUSTOMERS c
      FULL OUTER JOIN ORDERS o
      ON (c.ID = o.CUSTOMER_ID);

On successful execution of the query, you get to see the following response:

+------+----------+--------+---------------------+
| ID   | NAME     | AMOUNT | DATE                |
+------+----------+--------+---------------------+
| 1    | Ramesh   | NULL   | NULL                |
| 2    | Khilan   | 1560   | 2009-11-20 00:00:00 |
| 3    | kaushik  | 3000   | 2009-10-08 00:00:00 |
| 3    | kaushik  | 1500   | 2009-10-08 00:00:00 |
| 4    | Chaitali | 2060   | 2008-05-20 00:00:00 |
| 5    | Hardik   | NULL   | NULL                |
| 6    | Komal    | NULL   | NULL                |
| 7    | Muffy    | NULL   | NULL                |
| 3    | kaushik  | 3000   | 2009-10-08 00:00:00 |
| 3    | kaushik  | 1500   | 2009-10-08 00:00:00 |
| 2    | Khilan   | 1560   | 2009-11-20 00:00:00 |
| 4    | Chaitali | 2060   | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+
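The join syntax above also permits LEFT SEMI JOIN, which this chapter does not demonstrate. As a hedged sketch against the same two tables: a LEFT SEMI JOIN returns each left-table row at most once when a match exists in the right table, and only columns from the left table may appear in the SELECT list:

hive> SELECT c.ID, c.NAME
      FROM CUSTOMERS c
      LEFT SEMI JOIN ORDERS o
      ON (c.ID = o.CUSTOMER_ID);

For the sample data, this would list Khilan, kaushik, and Chaitali once each, since they are the only customers with orders.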

Hive – Create Table

This chapter explains how to create a table and how to insert data into it. The conventions for creating a table in Hive are quite similar to creating a table using SQL.

Create Table Statement

CREATE TABLE is the statement used to create a table in Hive. The syntax and example are as follows:

Syntax

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]

Example

Let us assume you need to create a table named employee using the CREATE TABLE statement. The following table lists the fields and their data types in the employee table:

Sr.No   Field Name    Data Type
1       Eid           int
2       Name          String
3       Salary        Float
4       Designation   String

The following data specifies the comment and the row-format fields, such as the field terminator, line terminator, and stored file type:

COMMENT 'Employee details'
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE

The following query creates a table named employee using the above data:

hive> CREATE TABLE IF NOT EXISTS employee (eid int, name String,
      salary Float, designation String)
      COMMENT 'Employee details'
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\t'
      LINES TERMINATED BY '\n'
      STORED AS TEXTFILE;

If you add the option IF NOT EXISTS, Hive ignores the statement in case the table already exists. On successful creation of the table, you get to see the following response:

OK
Time taken: 5.905 seconds
hive>

JDBC Program

The JDBC program to create a table for the given example is as follows.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveCreateTable {
   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException, ClassNotFoundException {
      // Register driver and create driver instance
      Class.forName(driverName);

      // get connection
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");

      // create statement
      Statement stmt = con.createStatement();

      // execute statement (note the doubled backslashes so Hive receives '\t' and '\n')
      stmt.execute("CREATE TABLE IF NOT EXISTS "
         + " employee (eid int, name String, "
         + " salary Float, designation String)"
         + " COMMENT 'Employee details'"
         + " ROW FORMAT DELIMITED"
         + " FIELDS TERMINATED BY '\\t'"
         + " LINES TERMINATED BY '\\n'"
         + " STORED AS TEXTFILE");
      System.out.println("Table employee created.");
      con.close();
   }
}

Save the program in a file named HiveCreateTable.java. The following commands are used to compile and execute this program.

$ javac HiveCreateTable.java
$ java HiveCreateTable

Output

Table employee created.

Load Data Statement

Generally, after creating a table in SQL, we insert data using the INSERT statement. In Hive, however, we can insert data using the LOAD DATA statement. When inserting data into Hive, it is better to use LOAD DATA to store bulk records. There are two ways to load data: one is from the local file system, and the second is from the Hadoop file system.

Syntax

The syntax for LOAD DATA is as follows:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
[PARTITION (partcol1=val1, partcol2=val2 ...)]

LOCAL is an identifier to specify the local path. It is optional.
OVERWRITE is optional; it overwrites the existing data in the table.
PARTITION is optional.

Example

We will insert the following data into the table. It is a text file named sample.txt in the /home/user directory.
1201  Gopal        45000  Technical manager
1202  Manisha      45000  Proofreader
1203  Masthanvali  40000  Technical writer
1204  Kiran        40000  Hr Admin
1205  Kranthi      30000  Op Admin

The following query loads the given text file into the table:

hive> LOAD DATA LOCAL INPATH '/home/user/sample.txt'
      OVERWRITE INTO TABLE employee;

On successful load, you get to see the following response:

OK
Time taken: 15.905 seconds
hive>

JDBC Program

Given below is the JDBC program to load the given data into the table.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveLoadData {
   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException, ClassNotFoundException {
      // Register driver and create driver instance
      Class.forName(driverName);

      // get connection
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");

      // create statement
      Statement stmt = con.createStatement();

      // execute statement
      stmt.execute("LOAD DATA LOCAL INPATH '/home/user/sample.txt' "
         + "OVERWRITE INTO TABLE employee");
      System.out.println("Load Data into employee successful");
      con.close();
   }
}

Save the program in a file named HiveLoadData.java. Use the following commands to compile and execute this program.

$ javac HiveLoadData.java
$ java HiveLoadData

Output:

Load Data into employee successful
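The CREATE TABLE syntax shown earlier also accepts the EXTERNAL keyword, which this chapter does not exercise. The following is a minimal sketch of an external table; the table name employee_ext and the HDFS path are assumptions for illustration. An external table points at data that already resides at the given location, and dropping the table removes only the metadata, not the underlying files:

hive> CREATE EXTERNAL TABLE IF NOT EXISTS employee_ext (
         eid int, name String, salary Float, designation String)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\t'
      LINES TERMINATED BY '\n'
      STORED AS TEXTFILE
      LOCATION '/user/data/employee_ext';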

HiveQL – Select Group By

This chapter explains the details of the GROUP BY clause in a SELECT statement. The GROUP BY clause is used to group all the records in a result set by a particular collection column. It is used to query a group of records.

Syntax

The syntax of the GROUP BY clause is as follows:

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[LIMIT number];

Example

Let us take an example of the SELECT...GROUP BY clause. Assume the employee table as given below, with Id, Name, Salary, Designation, and Dept fields. Generate a query to retrieve the number of employees in each department.

+------+-------------+--------+-------------------+-------+
|  ID  | Name        | Salary | Designation       | Dept  |
+------+-------------+--------+-------------------+-------+
| 1201 | Gopal       | 45000  | Technical manager | TP    |
| 1202 | Manisha     | 45000  | Proofreader       | PR    |
| 1203 | Masthanvali | 40000  | Technical writer  | TP    |
| 1204 | Kiran       | 45000  | Proofreader       | PR    |
| 1205 | Kranthi     | 30000  | Op Admin          | Admin |
+------+-------------+--------+-------------------+-------+

The following query retrieves the employee counts using the above scenario:

hive> SELECT Dept, count(*) FROM employee GROUP BY Dept;

On successful execution of the query, you get to see the following response:

+-------+----------+
| Dept  | Count(*) |
+-------+----------+
| Admin | 1        |
| PR    | 2        |
| TP    | 2        |
+-------+----------+

JDBC Program

Given below is the JDBC program to apply the GROUP BY clause for the given example.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveQLGroupBy {
   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException, ClassNotFoundException {
      // Register driver and create driver instance
      Class.forName(driverName);

      // get connection
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");

      // create statement
      Statement stmt = con.createStatement();

      // execute statement
      ResultSet res = stmt.executeQuery("SELECT Dept, count(*) FROM employee GROUP BY Dept");
      System.out.println(" Dept \t count(*)");

      while (res.next()) {
         System.out.println(res.getString(1) + " " + res.getInt(2));
      }
      con.close();
   }
}

Save the program in a file named HiveQLGroupBy.java. Use the following commands to compile and execute this program.

$ javac HiveQLGroupBy.java
$ java HiveQLGroupBy

Output:

Dept    Count(*)
Admin   1
PR      2
TP      2
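The syntax above also lists a HAVING clause, which filters groups after aggregation rather than rows before it. A small illustrative sketch against the same employee table (the threshold of 1 is an arbitrary value chosen for the example):

hive> SELECT Dept, count(*)
      FROM employee
      GROUP BY Dept
      HAVING count(*) > 1;

For the sample data, this keeps only the PR and TP groups and drops Admin, whose count is 1.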

Hive – Quick Guide

Hive – Introduction

The term 'Big Data' is used for collections of large datasets characterized by huge volume, high velocity, and a wide variety of data that grows day by day. It is difficult to process Big Data using traditional data management systems. Therefore, the Apache Software Foundation introduced a framework called Hadoop to solve Big Data management and processing challenges.

Hadoop

Hadoop is an open-source framework for storing and processing Big Data in a distributed environment. It contains two modules: MapReduce and the Hadoop Distributed File System (HDFS).

MapReduce: A parallel programming model for processing large amounts of structured, semi-structured, and unstructured data on large clusters of commodity hardware.

HDFS: The Hadoop Distributed File System is the part of the Hadoop framework used to store and process the datasets. It provides a fault-tolerant file system that runs on commodity hardware.

The Hadoop ecosystem contains different sub-projects (tools) such as Sqoop, Pig, and Hive that support the Hadoop modules:

Sqoop: Used to import and export data between HDFS and an RDBMS.

Pig: A procedural language platform used to develop scripts for MapReduce operations.

Hive: A platform used to develop SQL-type scripts to perform MapReduce operations.

Note: There are various ways to execute MapReduce operations:

The traditional approach, using a Java MapReduce program for structured, semi-structured, and unstructured data.
The scripting approach, using Pig to process structured and semi-structured data.
The Hive Query Language (HiveQL or HQL), using Hive to process structured data.

What is Hive

Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and it makes querying and analyzing easy.

Hive was initially developed by Facebook; later, the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive. It is used by many companies. For example, Amazon uses it in Amazon Elastic MapReduce.

Hive is not:

A relational database
A design for OnLine Transaction Processing (OLTP)
A language for real-time queries and row-level updates

Features of Hive

It stores schema in a database and processed data in HDFS.
It is designed for OLAP.
It provides an SQL-type language for querying, called HiveQL or HQL.
It is familiar, fast, scalable, and extensible.

Architecture of Hive

The following component diagram depicts the architecture of Hive. The diagram contains different units, described below.

User Interface: Hive is data warehouse infrastructure software that enables interaction between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HDInsight (on Windows Server).

Meta Store: Hive chooses respective database servers to store the schema or metadata of tables, databases, columns in a table, their data types, and the HDFS mapping.

HiveQL Process Engine: HiveQL is similar to SQL for querying the schema information in the Metastore. It is one of the replacements for the traditional MapReduce approach: instead of writing a MapReduce program in Java, we can write a query for the MapReduce job and have Hive process it.

Execution Engine: The conjunction of the HiveQL process engine and MapReduce is the Hive execution engine.
The execution engine processes the query and generates the same results as MapReduce. It uses the flavor of MapReduce.

HDFS or HBASE: The Hadoop Distributed File System or HBase is the data storage technique used to store the data in the file system.

Working of Hive

The following diagram depicts the workflow between Hive and Hadoop. The steps below define how Hive interacts with the Hadoop framework:

Step 1 (Execute Query): The Hive interface, such as the command line or the Web UI, sends the query to the driver (any database driver such as JDBC, ODBC, etc.) for execution.

Step 2 (Get Plan): The driver takes the help of the query compiler, which parses the query to check the syntax and build the query plan, i.e. the requirement of the query.

Step 3 (Get Metadata): The compiler sends a metadata request to the Metastore (any database).

Step 4 (Send Metadata): The Metastore sends the metadata as a response to the compiler.

Step 5 (Send Plan): The compiler checks the requirement and resends the plan to the driver. Up to here, the parsing and compiling of the query is complete.

Step 6 (Execute Plan): The driver sends the execute plan to the execution engine.

Step 7 (Execute Job): Internally, the execution process is a MapReduce job. The execution engine sends the job to the JobTracker, which resides in the Name node, and the JobTracker assigns the job to the TaskTracker, which resides in the Data node. Here, the query executes the MapReduce job.

Step 7.1 (Metadata Ops): Meanwhile, during execution, the execution engine can perform metadata operations with the Metastore.

Step 8 (Fetch Result): The execution engine receives the results from the Data nodes.

Step 9 (Send Results): The execution engine sends those resultant values to the driver.

Step 10 (Send Results): The driver sends the results to the Hive interfaces.

Hive – Installation

All Hadoop sub-projects such as Hive, Pig, and HBase support the Linux operating system. Therefore, you need to install a Linux-flavored OS. The following simple steps are executed for Hive installation:

Step 1: Verifying JAVA Installation

Java must be installed on your system before installing Hive. Let us verify the Java installation using the following command:

$ java -version

If Java is already installed on your system, you get to see the following response:

java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)

If Java is not installed on your system, then follow the steps given below for installing Java.

Installing Java

Step I:

Download Java (JDK <latest version> - X64.tar.gz) by visiting the following link: http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html. Then jdk-7u71-linux-x64.tar.gz will be downloaded onto your system.

Step II:

Generally you will find the downloaded Java file in the Downloads folder. Verify it and extract the jdk-7u71-linux-x64.gz file using the following commands:

$ cd Downloads/
$ ls
jdk-7u71-linux-x64.gz

$ tar zxf jdk-7u71-linux-x64.gz

Hive – Questions and Answers

Hive Questions and Answers has been designed to help students and professionals prepare for various certification exams and job interviews. This section provides a useful collection of sample interview questions and Multiple Choice Questions (MCQs) with their answers and appropriate explanations.

1. Hive Interview Questions: This section provides a huge collection of Hive interview questions with their answers hidden in a box, challenging you to attempt them before revealing the correct answer.

2. Hive Online Quiz: This section provides a great collection of Hive Multiple Choice Questions (MCQs) on a single page, along with their correct answers and explanations. If you select the right option, it turns green; otherwise red.

3. Hive Online Test: If you are preparing for a Java and Hive framework related certification exam, then this section is a must for you. It simulates a real online test with a timer, challenging you to complete the test within the given time frame. Finally, you can check your overall test score and see how you fared among the other candidates who attempted this online test.

4. Hive Mock Test: This section provides various mock tests that you can download to your local machine and solve offline. Every mock test comes with a mock test key, so you can verify your final score and grade yourself.

Hive – Useful Resources

The following resources contain additional information on Hive. Please use them to get more in-depth knowledge on this topic.

Useful Video Courses

Big Data Analytics Using Hive In Hadoop (21 lectures, 2 hours) by Mukund Kumar Mishra
Advance Big Data Analytics using Hive & Sqoop (51 lectures, 4 hours) by Navdeep Kaur
Apache Hive for Data Engineers (Hands On) (92 lectures, 6 hours) by Bigdata Engineer
Apache Hive Interview Question and Answer (100+ FAQ) (109 lectures, 2 hours) by Bigdata Engineer
Learn Hive – Course for Beginners (22 lectures, 2.5 hours) by Corporate Bridge Consultancy Private Limited

Hive – Views And Indexes

This chapter describes how to create and manage views. Views are generated based on user requirements. You can save any result-set data as a view. The usage of a view in Hive is the same as that of a view in SQL. It is a standard RDBMS concept. We can execute all DML operations on a view.

Creating a View

You can create a view at the time of executing a SELECT statement. The syntax is as follows:

CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...)]
[COMMENT table_comment]
AS SELECT ...

Example

Let us take an example for a view. Assume the employee table as given below, with the fields Id, Name, Salary, Designation, and Dept. Generate a query to retrieve the details of the employees who earn a salary of more than Rs 30000. We store the result in a view named emp_30000.

+------+-------------+--------+-------------------+-------+
|  ID  | Name        | Salary | Designation       | Dept  |
+------+-------------+--------+-------------------+-------+
| 1201 | Gopal       | 45000  | Technical manager | TP    |
| 1202 | Manisha     | 45000  | Proofreader       | PR    |
| 1203 | Masthanvali | 40000  | Technical writer  | TP    |
| 1204 | Kiran       | 40000  | Hr Admin          | HR    |
| 1205 | Kranthi     | 30000  | Op Admin          | Admin |
+------+-------------+--------+-------------------+-------+

The following query creates the view using the above scenario:

hive> CREATE VIEW emp_30000 AS
      SELECT * FROM employee
      WHERE salary>30000;

Dropping a View

Use the following syntax to drop a view:

DROP VIEW view_name

The following query drops the view named emp_30000:

hive> DROP VIEW emp_30000;

Creating an Index

An index is nothing but a pointer to a particular column of a table. Creating an index means creating a pointer to a particular column of a table. Its syntax is as follows:

CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[ [ ROW FORMAT ...] STORED AS ... | STORED BY ... ]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]

Example

Let us take an example for an index. Use the same employee table that we used earlier, with the fields Id, Name, Salary, Designation, and Dept. Create an index named index_salary on the salary column of the employee table.

The following query creates the index:

hive> CREATE INDEX index_salary ON TABLE employee(salary)
      AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';

It is a pointer to the salary column. If the column is modified, the changes are stored using the index value.

Dropping an Index

The following syntax is used to drop an index:

DROP INDEX <index_name> ON <table_name>

The following query drops the index named index_salary:

hive> DROP INDEX index_salary ON employee;
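Two follow-ups that the chapter implies but does not show. First, a view is queried exactly like a table, so the emp_30000 view can itself take a WHERE clause (the Dept filter here is an arbitrary illustrative predicate):

hive> SELECT * FROM emp_30000 WHERE Dept='TP';

Second, in the Hive versions that supported indexes, an index created WITH DEFERRED REBUILD had to be built explicitly before use, with ALTER INDEX ... REBUILD:

hive> ALTER INDEX index_salary ON employee REBUILD;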

Hive – Built-In Operators

This chapter explains the built-in operators of Hive. There are four types of operators in Hive:

Relational Operators
Arithmetic Operators
Logical Operators
Complex Operators

Relational Operators

These operators are used to compare two operands. The following table describes the relational operators available in Hive:

Operator        Operand               Description
A = B           all primitive types   TRUE if expression A is equivalent to expression B, otherwise FALSE.
A != B          all primitive types   TRUE if expression A is not equivalent to expression B, otherwise FALSE.
A < B           all primitive types   TRUE if expression A is less than expression B, otherwise FALSE.
A <= B          all primitive types   TRUE if expression A is less than or equal to expression B, otherwise FALSE.
A > B           all primitive types   TRUE if expression A is greater than expression B, otherwise FALSE.
A >= B          all primitive types   TRUE if expression A is greater than or equal to expression B, otherwise FALSE.
A IS NULL       all types             TRUE if expression A evaluates to NULL, otherwise FALSE.
A IS NOT NULL   all types             FALSE if expression A evaluates to NULL, otherwise TRUE.
A LIKE B        Strings               TRUE if string pattern A matches B, otherwise FALSE.
A RLIKE B       Strings               NULL if A or B is NULL; TRUE if any substring of A matches the Java regular expression B; otherwise FALSE.
A REGEXP B      Strings               Same as RLIKE.

Example

Let us assume the employee table is composed of fields named Id, Name, Salary, Designation, and Dept as shown below. Generate a query to retrieve the details of the employee whose Id is 1205.

+------+-------------+--------+-------------------+-------+
|  Id  | Name        | Salary | Designation       | Dept  |
+------+-------------+--------+-------------------+-------+
| 1201 | Gopal       | 45000  | Technical manager | TP    |
| 1202 | Manisha     | 45000  | Proofreader       | PR    |
| 1203 | Masthanvali | 40000  | Technical writer  | TP    |
| 1204 | Kiran       | 40000  | Hr Admin          | HR    |
| 1205 | Kranthi     | 30000  | Op Admin          | Admin |
+------+-------------+--------+-------------------+-------+

The following query is executed to retrieve the employee details using the above table:

hive> SELECT * FROM employee WHERE Id=1205;

On successful execution of the query, you get to see the following response:

+------+---------+--------+-------------+-------+
|  ID  | Name    | Salary | Designation | Dept  |
+------+---------+--------+-------------+-------+
| 1205 | Kranthi | 30000  | Op Admin    | Admin |
+------+---------+--------+-------------+-------+

The following query is executed to retrieve the details of the employees whose salary is more than or equal to Rs 40000:

hive> SELECT * FROM employee WHERE Salary>=40000;

On successful execution of the query, you get to see the following response:

+------+-------------+--------+-------------------+------+
|  ID  | Name        | Salary | Designation       | Dept |
+------+-------------+--------+-------------------+------+
| 1201 | Gopal       | 45000  | Technical manager | TP   |
| 1202 | Manisha     | 45000  | Proofreader       | PR   |
| 1203 | Masthanvali | 40000  | Technical writer  | TP   |
| 1204 | Kiran       | 40000  | Hr Admin          | HR   |
+------+-------------+--------+-------------------+------+

Arithmetic Operators

These operators support various common arithmetic operations on the operands. All of them return number types. The following table describes the arithmetic operators available in Hive:

Operator   Operand            Description
A + B      all number types   Gives the result of adding A and B.
A - B      all number types   Gives the result of subtracting B from A.
A * B      all number types   Gives the result of multiplying A and B.
A / B      all number types   Gives the result of dividing A by B.
A % B      all number types   Gives the remainder of dividing A by B.
A & B      all number types   Gives the result of bitwise AND of A and B.
A | B      all number types   Gives the result of bitwise OR of A and B.
A ^ B      all number types   Gives the result of bitwise XOR of A and B.
~A         all number types   Gives the result of bitwise NOT of A.

Example

The following query adds the two numbers 20 and 30 (temp can be any existing table):

hive> SELECT 20+30 ADD FROM temp;

On successful execution of the query, you get to see the following response:

+------+
| ADD  |
+------+
| 50   |
+------+

Logical Operators

The operators are logical expressions. All of them return either TRUE or FALSE.

Operator   Operands   Description
A AND B    boolean    TRUE if both A and B are TRUE, otherwise FALSE.
A && B     boolean    Same as A AND B.
A OR B     boolean    TRUE if either A or B or both are TRUE, otherwise FALSE.
A || B     boolean    Same as A OR B.
NOT A      boolean    TRUE if A is FALSE, otherwise FALSE.
!A         boolean    Same as NOT A.

Example

The following query is used to retrieve the details of the employees whose department is TP and whose salary is more than Rs 40000:

hive> SELECT * FROM employee WHERE Salary>40000 && Dept='TP';

On successful execution of the query, you get to see the following response:

+------+-------+--------+-------------------+------+
|  ID  | Name  | Salary | Designation       | Dept |
+------+-------+--------+-------------------+------+
| 1201 | Gopal | 45000  | Technical manager | TP   |
+------+-------+--------+-------------------+------+

Complex Operators

These operators provide an expression to access the elements of complex types.

Operator   Operand                               Description
A[n]       A is an Array and n is an int         Returns the nth element of array A. The first element has index 0.
M[key]     M is a Map<K, V> and key has type K   Returns the value corresponding to the key in the map.
S.x        S is a struct                         Returns the x field of S.
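The complex operators are easiest to see against a table that actually has complex-typed columns. The following sketch is purely illustrative; the emp_complex table and its columns are hypothetical and not part of the tutorial's employee examples:

hive> CREATE TABLE emp_complex (
         name String,
         skills ARRAY<String>,
         scores MAP<String, int>,
         addr STRUCT<city:String, pin:int>);

hive> SELECT name,
             skills[0],       -- first element of the skills array (index 0)
             scores['hive'],  -- map value stored under the key 'hive'
             addr.city        -- the city field of the addr struct
      FROM emp_complex;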