Hive – Drop Table

This chapter describes how to drop a table in Hive. When you drop a managed (internal) table from the Hive Metastore, Hive removes both the table data and its metadata. When you drop an external table, Hive removes only the metadata; the underlying data files are left untouched in their storage location.

Drop Table Statement

The syntax is as follows:

DROP TABLE [IF EXISTS] table_name;

The following query drops a table named employee:

hive> DROP TABLE IF EXISTS employee;

On successful execution of the query, you get to see the following response:

OK
Time taken: 5.3 seconds
hive>

JDBC Program

The following JDBC program drops the employee table.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveDropTable {
   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException, ClassNotFoundException {
      // Register the Hive JDBC driver
      Class.forName(driverName);

      // Get a connection to the userdb database
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");

      // Create a statement and execute the DROP TABLE query
      Statement stmt = con.createStatement();
      stmt.execute("DROP TABLE IF EXISTS employee");

      System.out.println("Drop table successful.");
      con.close();
   }
}

Save the program in a file named HiveDropTable.java. Use the following commands to compile and execute this program.

$ javac HiveDropTable.java
$ java HiveDropTable

Output:

Drop table successful.

The following query is used to verify the list of tables:

hive> SHOW TABLES;
emp
OK
Time taken: 2.1 seconds
hive>
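The managed/external distinction matters in practice. The following is a minimal hedged sketch; the table name ext_employee and the HDFS path are illustrative assumptions, not part of this tutorial's sample data:

hive> CREATE EXTERNAL TABLE ext_employee (eid INT, name STRING)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > LOCATION '/user/hive/external/employee';
hive> DROP TABLE ext_employee;

After the DROP, only the metadata is removed; the files under /user/hive/external/employee remain in HDFS and can be re-attached by creating the external table again.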

Hive – Alter Table

This chapter explains how to alter the attributes of a table, such as changing its table name, changing column names, adding columns, and deleting or replacing columns.

Alter Table Statement

It is used to alter a table in Hive.

Syntax

The statement takes any of the following syntaxes based on what attributes we wish to modify in a table.

ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])

Rename To... Statement

The following query renames the table from employee to emp.

hive> ALTER TABLE employee RENAME TO emp;

JDBC Program

The JDBC program to rename a table is as follows.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveAlterRenameTo {
   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException, ClassNotFoundException {
      // Register the Hive JDBC driver
      Class.forName(driverName);

      // Get a connection to the userdb database
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");

      // Create a statement and execute the RENAME query
      Statement stmt = con.createStatement();
      stmt.execute("ALTER TABLE employee RENAME TO emp");

      System.out.println("Table renamed successfully.");
      con.close();
   }
}

Save the program in a file named HiveAlterRenameTo.java. Use the following commands to compile and execute this program.

$ javac HiveAlterRenameTo.java
$ java HiveAlterRenameTo

Output:

Table renamed successfully.

Change Statement

The following table contains the fields of the employee table and shows the fields to be changed: the column name is renamed to ename, and the data type of salary is converted from Float to Double.

+-------------+------------------------+-------------------+----------------------+
| Field Name  | Convert from Data Type | Change Field Name | Convert to Data Type |
+-------------+------------------------+-------------------+----------------------+
| eid         | int                    | eid               | int                  |
| name        | String                 | ename             | String               |
| salary      | Float                  | salary            | Double               |
| designation | String                 | designation       | String               |
+-------------+------------------------+-------------------+----------------------+

The following queries rename the column name and change the column data type using the above data:

hive> ALTER TABLE employee CHANGE name ename String;
hive> ALTER TABLE employee CHANGE salary salary Double;

JDBC Program

Given below is the JDBC program to change a column.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveAlterChangeColumn {
   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException, ClassNotFoundException {
      // Register the Hive JDBC driver
      Class.forName(driverName);

      // Get a connection to the userdb database
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");

      // Create a statement and execute the CHANGE queries
      Statement stmt = con.createStatement();
      stmt.execute("ALTER TABLE employee CHANGE name ename String");
      stmt.execute("ALTER TABLE employee CHANGE salary salary Double");

      System.out.println("Change column successful.");
      con.close();
   }
}

Save the program in a file named HiveAlterChangeColumn.java. Use the following commands to compile and execute this program.

$ javac HiveAlterChangeColumn.java
$ java HiveAlterChangeColumn

Output:

Change column successful.

Add Columns Statement

The following query adds a column named dept to the employee table.

hive> ALTER TABLE employee ADD COLUMNS (dept STRING COMMENT 'Department name');

JDBC Program

The JDBC program to add a column to a table is given below.
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveAlterAddColumn {
   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException, ClassNotFoundException {
      // Register the Hive JDBC driver
      Class.forName(driverName);

      // Get a connection to the userdb database
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");

      // Create a statement and execute the ADD COLUMNS query
      Statement stmt = con.createStatement();
      stmt.execute("ALTER TABLE employee ADD COLUMNS (dept STRING COMMENT 'Department name')");

      System.out.println("Add column successful.");
      con.close();
   }
}

Save the program in a file named HiveAlterAddColumn.java. Use the following commands to compile and execute this program.

$ javac HiveAlterAddColumn.java
$ java HiveAlterAddColumn

Output:

Add column successful.

Replace Statement

The following query deletes all the existing columns from the employee table and replaces them with the empid and name columns:

hive> ALTER TABLE employee REPLACE COLUMNS (empid INT, name STRING);

JDBC Program

Given below is the JDBC program to replace the eid column with empid and the ename column with name.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveAlterReplaceColumn {
   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException, ClassNotFoundException {
      // Register the Hive JDBC driver
      Class.forName(driverName);

      // Get a connection to the userdb database
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");

      // Create a statement and execute the REPLACE COLUMNS query
      Statement stmt = con.createStatement();
      stmt.execute("ALTER TABLE employee REPLACE COLUMNS (empid INT, name STRING)");

      System.out.println("Replace column successful.");
      con.close();
   }
}

Save the program in a file named HiveAlterReplaceColumn.java. Use the following commands to compile and execute this program.

$ javac HiveAlterReplaceColumn.java
$ java HiveAlterReplaceColumn

Output:

Replace column successful.
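After any of these ALTER TABLE operations, it is useful to confirm the new schema. A minimal hedged check follows; the output shown is illustrative, assuming the REPLACE COLUMNS statement above has just been applied:

hive> DESCRIBE employee;
empid    int
name     string
Time taken: 0.2 seconds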

Hive – Installation

All Hadoop sub-projects such as Hive, Pig, and HBase support the Linux operating system. Therefore, you need to install a Linux-flavored OS. The following simple steps are executed for Hive installation:

Step 1: Verifying JAVA Installation

Java must be installed on your system before installing Hive. Let us verify the Java installation using the following command:

$ java -version

If Java is already installed on your system, you get to see the following response:

java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)

If Java is not installed on your system, then follow the steps given below for installing it.

Installing Java

Step I:

Download Java (JDK <latest version> - X64.tar.gz) by visiting the following link: http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html. Then jdk-7u71-linux-x64.tar.gz will be downloaded onto your system.

Step II:

Generally you will find the downloaded Java file in the Downloads folder. Verify it and extract the jdk-7u71-linux-x64.gz file using the following commands.

$ cd Downloads/
$ ls
jdk-7u71-linux-x64.gz
$ tar zxf jdk-7u71-linux-x64.gz
$ ls
jdk1.7.0_71 jdk-7u71-linux-x64.gz

Step III:

To make Java available to all users, you have to move it to the location "/usr/local/". Open root, and type the following commands.

$ su
password:
# mv jdk1.7.0_71 /usr/local/
# exit

Step IV:

For setting up the PATH and JAVA_HOME variables, add the following commands to the ~/.bashrc file.

export JAVA_HOME=/usr/local/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin

Now apply all the changes to the current running system.

$ source ~/.bashrc

Step V:

Use the following commands to configure the java alternatives:

# alternatives --install /usr/bin/java java /usr/local/java/bin/java 2
# alternatives --install /usr/bin/javac javac /usr/local/java/bin/javac 2
# alternatives --install /usr/bin/jar jar /usr/local/java/bin/jar 2

# alternatives --set java /usr/local/java/bin/java
# alternatives --set javac /usr/local/java/bin/javac
# alternatives --set jar /usr/local/java/bin/jar

Now verify the installation using the command java -version from the terminal as explained above.

Step 2: Verifying Hadoop Installation

Hadoop must be installed on your system before installing Hive. Let us verify the Hadoop installation using the following command:

$ hadoop version

If Hadoop is already installed on your system, then you will get the following response:

Hadoop 2.4.1
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4

If Hadoop is not installed on your system, then proceed with the following steps:

Downloading Hadoop

Download and extract Hadoop 2.4.1 from the Apache Software Foundation using the following commands.

$ su
password:
# cd /usr/local
# wget http://apache.claz.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz
# tar xzf hadoop-2.4.1.tar.gz
# mv hadoop-2.4.1/* hadoop/
# exit

Installing Hadoop in Pseudo Distributed Mode

The following steps are used to install Hadoop 2.4.1 in pseudo distributed mode.

Step I: Setting up Hadoop

You can set Hadoop environment variables by appending the following commands to the ~/.bashrc file.
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Now apply all the changes to the current running system.

$ source ~/.bashrc

Step II: Hadoop Configuration

You can find all the Hadoop configuration files in the location "$HADOOP_HOME/etc/hadoop". You need to make suitable changes in those configuration files according to your Hadoop infrastructure.

$ cd $HADOOP_HOME/etc/hadoop

In order to develop Hadoop programs using Java, you have to reset the Java environment variable in the hadoop-env.sh file by replacing the JAVA_HOME value with the location of Java on your system.

export JAVA_HOME=/usr/local/jdk1.7.0_71

Given below is the list of files that you have to edit to configure Hadoop.

core-site.xml

The core-site.xml file contains information such as the port number used for the Hadoop instance, memory allocated for the file system, memory limit for storing the data, and the size of Read/Write buffers. Open core-site.xml and add the following properties in between the <configuration> and </configuration> tags.

<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>

hdfs-site.xml

The hdfs-site.xml file contains information such as the value of replication data, the namenode path, and the datanode path of your local file systems, that is, the place where you want to store the Hadoop infrastructure. Let us assume the following data.

dfs.replication (data replication value) = 1

(In the following path, /hadoop/ is the user name.
hadoopinfra/hdfs/namenode is the directory created by the hdfs file system.)
namenode path = //home/hadoop/hadoopinfra/hdfs/namenode

(hadoopinfra/hdfs/datanode is the directory created by the hdfs file system.)
datanode path = //home/hadoop/hadoopinfra/hdfs/datanode

Open this file and add the following properties in between the <configuration> and </configuration> tags.

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.name.dir</name>
      <value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.data.dir</name>
      <value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
   </property>
</configuration>

Note: In the above file, all the property values are user-defined and you can make changes according to your Hadoop infrastructure.

yarn-site.xml

This file is used to configure yarn into Hadoop. Open the yarn-site.xml file and add the following properties in between the <configuration> and </configuration> tags.

<configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
</configuration>

mapred-site.xml

This file is used to specify which MapReduce framework we are using. By default, Hadoop contains a template named mapred-site.xml.template. First of all, you need to copy the file mapred-site.xml.template to mapred-site.xml using the following command.

$ cp mapred-site.xml.template mapred-site.xml

Open the mapred-site.xml file and add the following properties in between the <configuration> and </configuration> tags.

<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>

Verifying Hadoop Installation

The following steps are used to verify the Hadoop installation.
Step I: Name Node Setup

Set up the namenode using the command "hdfs namenode -format" as follows.

$ cd ~
$ hdfs namenode -format

The expected result is as follows.

10/24/14 21:30:55 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost/192.168.1.11
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.4.1
...
...
10/24/14 21:30:56 INFO common.Storage: Storage directory /home/hadoop/hadoopinfra/hdfs/namenode has been successfully formatted.
10/24/14 21:30:56 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
10/24/14 21:30:56 INFO util.ExitUtil: Exiting with status 0
10/24/14 21:30:56 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/192.168.1.11
************************************************************/

Step II: Verifying Hadoop dfs

The following command is used to start dfs. Executing this command will start your Hadoop file system.

$ start-dfs.sh

The expected output is as follows:

10/24/14 21:37:56 Starting namenodes on

Hive – Views And Indexes

This chapter describes how to create and manage views. Views are generated based on user requirements. You can save any result-set data as a view. The usage of a view in Hive is the same as that of a view in SQL; it is a standard RDBMS concept. We can execute all DML operations on a view.

Creating a View

You can create a view at the time of executing a SELECT statement. The syntax is as follows:

CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...) ]
[COMMENT table_comment]
AS SELECT ...

Example

Let us take an example for view. Assume the employee table as given below, with the fields Id, Name, Salary, Designation, and Dept. Generate a query to retrieve the details of employees who earn a salary of more than Rs 30000. We store the result in a view named emp_30000.

+------+--------------+--------+-------------------+--------+
| ID   | Name         | Salary | Designation       | Dept   |
+------+--------------+--------+-------------------+--------+
| 1201 | Gopal        | 45000  | Technical manager | TP     |
| 1202 | Manisha      | 45000  | Proofreader       | PR     |
| 1203 | Masthanvali  | 40000  | Technical writer  | TP     |
| 1204 | Krian        | 40000  | Hr Admin          | HR     |
| 1205 | Kranthi      | 30000  | Op Admin          | Admin  |
+------+--------------+--------+-------------------+--------+

The following query retrieves the employee details using the above scenario:

hive> CREATE VIEW emp_30000 AS
    > SELECT * FROM employee
    > WHERE salary>30000;

Dropping a View

Use the following syntax to drop a view:

DROP VIEW view_name

The following query drops the view named emp_30000:

hive> DROP VIEW emp_30000;

Creating an Index

An index is nothing but a pointer on a particular column of a table. Creating an index means creating a pointer on a particular column of a table. Its syntax is as follows:

CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[ [ ROW FORMAT ...] STORED AS ... | STORED BY ... ]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]

Example

Let us take an example for index. Use the same employee table that we have used earlier, with the fields Id, Name, Salary, Designation, and Dept. Create an index named index_salary on the salary column of the employee table. The following query creates the index:

hive> CREATE INDEX index_salary ON TABLE employee(salary)
    > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';

It is a pointer to the salary column. If the column is modified, the changes are stored using an index value.

Dropping an Index

The following syntax is used to drop an index:

DROP INDEX <index_name> ON <table_name>

The following query drops the index named index_salary:

hive> DROP INDEX index_salary ON employee;
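Once created, a view is queried exactly like a table, and an index created WITH DEFERRED REBUILD must be built explicitly before it can be used. A minimal hedged sketch of both follows; the expected result is an assumption based on the sample data above:

hive> SELECT * FROM emp_30000;

Given the sample rows, this should return the four employees whose salary exceeds 30000 (Gopal, Manisha, Masthanvali, and Krian).

hive> ALTER INDEX index_salary ON employee REBUILD;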

Hive – Built-In Operators

This chapter explains the built-in operators of Hive. There are four types of operators in Hive:

Relational Operators
Arithmetic Operators
Logical Operators
Complex Operators

Relational Operators

These operators are used to compare two operands. The following table describes the relational operators available in Hive:

Operator        Operand               Description
A = B           all primitive types   TRUE if expression A is equivalent to expression B, otherwise FALSE.
A != B          all primitive types   TRUE if expression A is not equivalent to expression B, otherwise FALSE.
A < B           all primitive types   TRUE if expression A is less than expression B, otherwise FALSE.
A <= B          all primitive types   TRUE if expression A is less than or equal to expression B, otherwise FALSE.
A > B           all primitive types   TRUE if expression A is greater than expression B, otherwise FALSE.
A >= B          all primitive types   TRUE if expression A is greater than or equal to expression B, otherwise FALSE.
A IS NULL       all types             TRUE if expression A evaluates to NULL, otherwise FALSE.
A IS NOT NULL   all types             FALSE if expression A evaluates to NULL, otherwise TRUE.
A LIKE B        Strings               TRUE if string pattern A matches B, otherwise FALSE.
A RLIKE B       Strings               NULL if A or B is NULL; TRUE if any substring of A matches the Java regular expression B; otherwise FALSE.
A REGEXP B      Strings               Same as RLIKE.

Example

Let us assume the employee table is composed of fields named Id, Name, Salary, Designation, and Dept as shown below. Generate a query to retrieve the employee details whose Id is 1205.

+------+--------------+--------+-------------------+--------+
| Id   | Name         | Salary | Designation       | Dept   |
+------+--------------+--------+-------------------+--------+
| 1201 | Gopal        | 45000  | Technical manager | TP     |
| 1202 | Manisha      | 45000  | Proofreader       | PR     |
| 1203 | Masthanvali  | 40000  | Technical writer  | TP     |
| 1204 | Krian        | 40000  | Hr Admin          | HR     |
| 1205 | Kranthi      | 30000  | Op Admin          | Admin  |
+------+--------------+--------+-------------------+--------+

The following query is executed to retrieve the employee details using the above table:

hive> SELECT * FROM employee WHERE Id=1205;

On successful execution of the query, you get to see the following response:

+------+----------+--------+-------------+--------+
| ID   | Name     | Salary | Designation | Dept   |
+------+----------+--------+-------------+--------+
| 1205 | Kranthi  | 30000  | Op Admin    | Admin  |
+------+----------+--------+-------------+--------+

The following query is executed to retrieve the employee details whose salary is more than or equal to Rs 40000.

hive> SELECT * FROM employee WHERE Salary>=40000;

On successful execution of the query, you get to see the following response:

+------+--------------+--------+-------------------+------+
| ID   | Name         | Salary | Designation       | Dept |
+------+--------------+--------+-------------------+------+
| 1201 | Gopal        | 45000  | Technical manager | TP   |
| 1202 | Manisha      | 45000  | Proofreader       | PR   |
| 1203 | Masthanvali  | 40000  | Technical writer  | TP   |
| 1204 | Krian        | 40000  | Hr Admin          | HR   |
+------+--------------+--------+-------------------+------+

Arithmetic Operators

These operators support various common arithmetic operations on the operands. All of them return number types. The following table describes the arithmetic operators available in Hive:

Operator   Operand            Description
A + B      all number types   Gives the result of adding A and B.
A - B      all number types   Gives the result of subtracting B from A.
A * B      all number types   Gives the result of multiplying A and B.
A / B      all number types   Gives the result of dividing A by B.
A % B      all number types   Gives the remainder resulting from dividing A by B.
A & B      all number types   Gives the result of bitwise AND of A and B.
A | B      all number types   Gives the result of bitwise OR of A and B.
A ^ B      all number types   Gives the result of bitwise XOR of A and B.
~A         all number types   Gives the result of bitwise NOT of A.

Example

The following query adds two numbers, 20 and 30.

hive> SELECT 20+30 ADD FROM temp;

On successful execution of the query, you get to see the following response:

+------+
| ADD  |
+------+
| 50   |
+------+

Logical Operators

These operators are logical expressions. All of them return either TRUE or FALSE.

Operator   Operands   Description
A AND B    boolean    TRUE if both A and B are TRUE, otherwise FALSE.
A && B     boolean    Same as A AND B.
A OR B     boolean    TRUE if either A or B or both are TRUE, otherwise FALSE.
A || B     boolean    Same as A OR B.
NOT A      boolean    TRUE if A is FALSE, otherwise FALSE.
!A         boolean    Same as NOT A.

Example

The following query is used to retrieve the details of employees whose department is TP and whose salary is more than Rs 40000.

hive> SELECT * FROM employee WHERE Salary>40000 && Dept='TP';

On successful execution of the query, you get to see the following response:

+------+--------+--------+-------------------+------+
| ID   | Name   | Salary | Designation       | Dept |
+------+--------+--------+-------------------+------+
| 1201 | Gopal  | 45000  | Technical manager | TP   |
+------+--------+--------+-------------------+------+

Complex Operators

These operators provide an expression to access the elements of complex types.

Operator   Operand                               Description
A[n]       A is an Array and n is an int         Returns the nth element in the array A. The first element has index 0.
M[key]     M is a Map<K, V> and key has type K   Returns the value corresponding to the key in the map.
S.x        S is a struct                         Returns the x field of S.
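To make the complex operators concrete, here is a short hedged sketch; the table name contacts and its columns are hypothetical, not part of this tutorial's sample data:

hive> CREATE TABLE contacts (
    >    phones ARRAY<STRING>,
    >    props  MAP<STRING, STRING>,
    >    addr   STRUCT<city:STRING, zip:STRING>
    > );
hive> SELECT phones[0], props['email'], addr.city FROM contacts;

The first expression returns the first element of the phones array, the second looks up the value stored under the 'email' key, and the third accesses the city field of the addr struct.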

Hive – Partitioning

Hive organizes tables into partitions. Partitioning is a way of dividing a table into related parts based on the values of partition columns such as date, city, and department. Using partitions, it is easy to query a portion of the data.

Tables or partitions are sub-divided into buckets, to provide extra structure to the data that may be used for more efficient querying. Bucketing works based on the value of a hash function applied to some column of the table.

For example, a table named Tab1 contains employee data such as id, name, dept, and yoj (i.e., year of joining). Suppose you need to retrieve the details of all employees who joined in 2012. A query searches the whole table for the required information. However, if you partition the employee data by year and store it in separate files, the query processing time is reduced. The following example shows how to partition a file and its data:

The following file contains the employeedata table.

/tab1/employeedata/file1

id, name, dept, yoj
1, gopal, TP, 2012
2, kiran, HR, 2012
3, kaleel, SC, 2013
4, Prasanth, SC, 2013

The above data is partitioned into two files by year.

/tab1/employeedata/2012/file2

1, gopal, TP, 2012
2, kiran, HR, 2012

/tab1/employeedata/2013/file3

3, kaleel, SC, 2013
4, Prasanth, SC, 2013

Adding a Partition

We can add partitions to a table by altering the table. Let us assume we have a table called employee with fields such as Id, Name, Salary, Designation, Dept, and yoj.

Syntax:

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec
[LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...;

partition_spec:
: (p_column = p_col_value, p_column = p_col_value, ...)

The following query is used to add a partition to the employee table.

hive> ALTER TABLE employee
    > ADD PARTITION (year='2012')
    > LOCATION '/2012/part2012';

Renaming a Partition

The syntax of this command is as follows.

ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;

The following query is used to rename a partition:

hive> ALTER TABLE employee PARTITION (year='1203')
    > RENAME TO PARTITION (yoj='1203');

Dropping a Partition

The following syntax is used to drop a partition:

ALTER TABLE table_name DROP [IF EXISTS]
PARTITION partition_spec, PARTITION partition_spec, ...;

The following query is used to drop a partition:

hive> ALTER TABLE employee DROP IF EXISTS
    > PARTITION (year='1203');
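Before partitions can be added, the table itself must be declared with a partition column. Below is a minimal hedged sketch of how such a table might be created and loaded; the local file path /home/user/file2 is an illustrative assumption:

hive> CREATE TABLE employee (id INT, name STRING, dept STRING)
    > PARTITIONED BY (year STRING)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> LOAD DATA LOCAL INPATH '/home/user/file2'
    > INTO TABLE employee
    > PARTITION (year='2012');

Note that the partition column year does not appear in the regular column list; Hive derives it from the PARTITION clause and stores each partition in its own directory, mirroring the /2012 and /2013 layout shown above.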

Hive – Data Types

This chapter takes you through the different data types in Hive, which are involved in table creation. All the data types in Hive are classified into four categories, given as follows:

Column Types
Literals
Null Values
Complex Types

Column Types

Column types are used as column data types of Hive. They are as follows:

Integral Types

Integer type data can be specified using integral data types, INT. When the data range exceeds the range of INT, you need to use BIGINT, and if the data range is smaller than INT, you use SMALLINT. TINYINT is smaller than SMALLINT. The following table depicts the various INT data types:

Type       Postfix   Example
TINYINT    Y         10Y
SMALLINT   S         10S
INT        -         10
BIGINT     L         10L

String Types

String type data can be specified using single quotes (' ') or double quotes (" "). Hive provides two string data types: VARCHAR and CHAR. Hive follows C-style escape characters. The following table depicts the various CHAR data types:

Data Type   Length
VARCHAR     1 to 65535
CHAR        255

Timestamp

It supports traditional UNIX timestamps with optional nanosecond precision. It supports the java.sql.Timestamp format "YYYY-MM-DD HH:MM:SS.fffffffff" and the format "yyyy-mm-dd hh:mm:ss.ffffffffff".

Dates

DATE values are described in year/month/day format, in the form {{YYYY-MM-DD}}.

Decimals

The DECIMAL type in Hive is the same as the Big Decimal format of Java. It is used for representing immutable arbitrary-precision values. The syntax and example are as follows:

DECIMAL(precision, scale)
decimal(10,0)

Union Types

A union is a collection of heterogeneous data types. You can create an instance using create union. The syntax and example are as follows:

UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>

{0:1}
{1:2.0}
{2:["three","four"]}
{3:{"a":5,"b":"five"}}
{2:["six","seven"]}
{3:{"a":8,"b":"eight"}}
{0:9}
{1:10.0}

Literals

The following literals are used in Hive:

Floating Point Types

Floating point types are nothing but numbers with decimal points. Generally, this type of data is composed of the DOUBLE data type.

Decimal Type

Decimal type data is nothing but a floating point value with a higher range than the DOUBLE data type. The range of the decimal type is approximately -10^308 to 10^308.

Null Value

Missing values are represented by the special value NULL.

Complex Types

The Hive complex data types are as follows:

Arrays

Arrays in Hive are used the same way they are used in Java.

Syntax: ARRAY<data_type>

Maps

Maps in Hive are similar to Java maps.

Syntax: MAP<primitive_type, data_type>

Structs

Structs in Hive are similar to structs in C; they group named fields, each of which can carry a comment.

Syntax: STRUCT<col_name : data_type [COMMENT col_comment], ...>
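A single CREATE TABLE statement can combine several of these types. The following hedged sketch is illustrative only; the table emp_details and its columns are assumptions, not examples from this tutorial:

hive> CREATE TABLE emp_details (
    >    id      INT,
    >    name    VARCHAR(50),
    >    salary  DECIMAL(10,2),
    >    joined  DATE,
    >    skills  ARRAY<STRING>,
    >    phones  MAP<STRING, STRING>,
    >    address STRUCT<street:STRING, city:STRING>
    > );

Here id uses an integral type, salary a fixed-precision decimal, and the last three columns demonstrate the complex types ARRAY, MAP, and STRUCT.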

Hive – Home

Hive is a data warehouse infrastructure tool used to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.

This is a brief tutorial that provides an introduction to using Apache Hive's HiveQL with the Hadoop Distributed File System. This tutorial can be your first step towards becoming a successful Hadoop developer with Hive.

Audience

This tutorial is prepared for professionals aspiring to make a career in Big Data Analytics using the Hadoop framework. ETL developers and professionals who are into analytics in general may as well use this tutorial to good effect.

Prerequisites

Before proceeding with this tutorial, you need a basic knowledge of Core Java, database concepts of SQL, the Hadoop file system, and any Linux operating system flavor.