HBase – Create Data

Inserting Data Using the HBase Shell

This chapter demonstrates how to create data in an HBase table. To create data in an HBase table, the following commands and methods are used: the put command, the add() method of the Put class, and the put() method of the HTable class.

As an example, we are going to populate the emp table, which has the column families personal data and professional data. Using the put command, you can insert rows into a table. Its syntax is as follows:

put '<table name>','<row key>','<column family:column name>','<value>'

Inserting the First Row

Let us insert the first row values into the emp table as shown below.

hbase(main):005:0> put 'emp','1','personal data:name','raju'
0 row(s) in 0.6600 seconds

hbase(main):006:0> put 'emp','1','personal data:city','hyderabad'
0 row(s) in 0.0410 seconds

hbase(main):007:0> put 'emp','1','professional data:designation','manager'
0 row(s) in 0.0240 seconds

hbase(main):008:0> put 'emp','1','professional data:salary','50000'
0 row(s) in 0.0240 seconds

Insert the remaining rows using the put command in the same way; an example is given below.
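For instance, the second and third rows could be inserted with puts like the following. The shell prompt counters are only illustrative; the values match the scan output shown next.

hbase(main):009:0> put 'emp','2','personal data:name','ravi'
hbase(main):010:0> put 'emp','2','personal data:city','chennai'
hbase(main):011:0> put 'emp','2','professional data:designation','sr:engg'
hbase(main):012:0> put 'emp','2','professional data:salary','30000'

hbase(main):013:0> put 'emp','3','personal data:name','rajesh'
hbase(main):014:0> put 'emp','3','personal data:city','delhi'
hbase(main):015:0> put 'emp','3','professional data:designation','jr:engg'
hbase(main):016:0> put 'emp','3','professional data:salary','25000'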
After you have inserted the whole table, scanning it gives the following output.

hbase(main):022:0> scan 'emp'

ROW   COLUMN+CELL
1     column=personal data:city, timestamp=1417524216501, value=hyderabad
1     column=personal data:name, timestamp=1417524185058, value=raju
1     column=professional data:designation, timestamp=1417524232601, value=manager
1     column=professional data:salary, timestamp=1417524244109, value=50000
2     column=personal data:city, timestamp=1417524574905, value=chennai
2     column=personal data:name, timestamp=1417524556125, value=ravi
2     column=professional data:designation, timestamp=1417524592204, value=sr:engg
2     column=professional data:salary, timestamp=1417524604221, value=30000
3     column=personal data:city, timestamp=1417524681780, value=delhi
3     column=personal data:name, timestamp=1417524672067, value=rajesh
3     column=professional data:designation, timestamp=1417524693187, value=jr:engg
3     column=professional data:salary, timestamp=1417524702514, value=25000

Inserting Data Using the Java API

You can insert data into HBase using the add() method of the Put class and save it using the put() method of the HTable class. These classes belong to the org.apache.hadoop.hbase.client package. Given below are the steps to create data in an HBase table.

Step 1: Instantiate the Configuration Class

The Configuration class holds the HBase configuration files. You can create a configuration object using the create() method of the HBaseConfiguration class as shown below.

Configuration conf = HBaseConfiguration.create();

Step 2: Instantiate the HTable Class

HBase provides a class called HTable, an implementation of Table. This class is used to communicate with a single HBase table. While instantiating this class, it accepts the configuration object and the table name as parameters. You can instantiate the HTable class as shown below.

HTable hTable = new HTable(conf, tableName);

Step 3: Instantiate the Put Class

To insert data into an HBase table, the add() method and its variants are used. This method belongs to the Put class, therefore instantiate the Put class. This class requires the row key you want to insert the data into, as a byte array (here created from a string using Bytes.toBytes()). You can instantiate the Put class as shown below.

Put p = new Put(Bytes.toBytes("row1"));

Step 4: Insert Data

The add() method of the Put class is used to insert data. It requires three byte arrays representing the column family, the column qualifier (column name), and the value to be inserted, respectively. Insert data into the HBase table using the add() method as shown below.

p.add(Bytes.toBytes("column family"), Bytes.toBytes("column name"), Bytes.toBytes("value"));

Step 5: Save the Data in the Table

After inserting the required cells, save the changes by passing the Put instance to the put() method of the HTable class as shown below.

hTable.put(p);

Step 6: Close the HTable Instance

After creating data in the HBase table, close the HTable instance using the close() method as shown below.

hTable.close();

Given below is the complete program to create data in an HBase table. Note that the column family names must match those defined on the table; here they are the families of the emp table created earlier.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class InsertData {

   public static void main(String[] args) throws IOException {

      // Instantiating the Configuration class
      Configuration config = HBaseConfiguration.create();

      // Instantiating the HTable class for the emp table
      HTable hTable = new HTable(config, "emp");

      // Instantiating the Put class; accepts a row key
      Put p = new Put(Bytes.toBytes("row1"));

      // Adding values using the add() method;
      // accepts column family, column qualifier, and value
      p.add(Bytes.toBytes("personal data"), Bytes.toBytes("name"), Bytes.toBytes("raju"));
      p.add(Bytes.toBytes("personal data"), Bytes.toBytes("city"), Bytes.toBytes("hyderabad"));
      p.add(Bytes.toBytes("professional data"), Bytes.toBytes("designation"), Bytes.toBytes("manager"));
      p.add(Bytes.toBytes("professional data"), Bytes.toBytes("salary"), Bytes.toBytes("50000"));

      // Saving the Put instance to the HTable
      hTable.put(p);
      System.out.println("data inserted");

      // Closing the HTable instance
      hTable.close();
   }
}

Compile and execute the above program as shown below.

$ javac InsertData.java
$ java InsertData

The following should be the output:

data inserted
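To confirm that the program actually wrote the cells, you can read the row back from the HBase shell with the get command. The session below is only an illustration; the timestamps and timings will differ on your system.

hbase(main):023:0> get 'emp', 'row1'

COLUMN                          CELL
 personal data:city             timestamp=1417525000123, value=hyderabad
 personal data:name             timestamp=1417525000123, value=raju
 professional data:designation  timestamp=1417525000123, value=manager
 professional data:salary       timestamp=1417525000123, value=50000
4 row(s) in 0.0260 seconds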

HBase – Overview

Since 1970, the RDBMS has been the standard solution for data storage and maintenance problems. After the advent of big data, companies realized the benefits of processing big data and started opting for solutions like Hadoop. Hadoop uses a distributed file system to store big data and MapReduce to process it. Hadoop excels at storing and processing huge volumes of data in varied formats: arbitrary, semi-structured, or even unstructured.

Limitations of Hadoop

Hadoop can perform only batch processing, and data is accessed only in a sequential manner. That means one has to scan the entire dataset even for the simplest of jobs. A huge dataset, when processed, results in another huge dataset, which must also be processed sequentially. At this point, a new solution is needed to access any point of data in a single unit of time (random access).

Hadoop Random Access Databases

Applications such as HBase, Cassandra, CouchDB, Dynamo, and MongoDB are some of the databases that store huge amounts of data and access the data in a random manner.

What is HBase?

HBase is a distributed, column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable. Its data model is similar to Google's Bigtable and is designed to provide quick random access to huge amounts of structured data. It leverages the fault tolerance provided by the Hadoop Distributed File System (HDFS). As part of the Hadoop ecosystem, it provides random, real-time read/write access to data stored in HDFS.

One can store data in HDFS either directly or through HBase. Data consumers read and access the data in HDFS randomly using HBase. HBase sits on top of the Hadoop file system and provides read and write access.

HBase and HDFS

HDFS is a distributed file system suitable for storing large files, whereas HBase is a database built on top of HDFS.
HDFS does not support fast individual record lookups, whereas HBase provides fast lookups for large tables.
HDFS provides high-latency batch processing, whereas HBase has no concept of batch processing and provides low-latency access to single rows from billions of records (random access).
HDFS provides only sequential access to data, whereas HBase internally uses hash tables to provide random access and stores its data in indexed HDFS files for faster lookups.

Storage Mechanism in HBase

HBase is a column-oriented database, and the tables in it are sorted by row key. The table schema defines only column families, which are the key-value pairs. A table can have multiple column families, and each column family can have any number of columns. Subsequent column values are stored contiguously on disk. Each cell value in the table has a timestamp. In short, in HBase:

A table is a collection of rows.
A row is a collection of column families.
A column family is a collection of columns.
A column is a collection of key-value pairs.

Given below is an example schema of a table in HBase.

Rowid | Column Family     | Column Family     | Column Family     | Column Family
      | col1  col2  col3  | col1  col2  col3  | col1  col2  col3  | col1  col2  col3
1     |                   |                   |                   |
2     |                   |                   |                   |
3     |                   |                   |                   |
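To make this model concrete, the following shell snippet (a minimal sketch based on the emp table used in later chapters; timings are illustrative) shows that only column families are declared when a table is created, while individual columns are introduced on the fly when cells are written:

hbase(main):001:0> create 'emp', 'personal data', 'professional data'
0 row(s) in 1.2200 seconds

hbase(main):002:0> put 'emp', '1', 'personal data:name', 'raju'
0 row(s) in 0.0500 seconds

hbase(main):003:0> put 'emp', '1', 'professional data:salary', '50000'
0 row(s) in 0.0300 seconds

The columns name and salary were never declared beforehand; they come into existence on the first put, within their respective column families.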
Column-Oriented and Row-Oriented

Column-oriented databases store data tables as sections of columns of data rather than as rows of data; in short, they have column families.

A row-oriented database is suitable for Online Transaction Processing (OLTP), whereas a column-oriented database is suitable for Online Analytical Processing (OLAP).
Row-oriented databases are designed for a small number of rows and columns, whereas column-oriented databases are designed for huge tables.

HBase and RDBMS

HBase is schema-less; it has no concept of a fixed-column schema and defines only column families. An RDBMS is governed by its schema, which describes the whole structure of its tables.
HBase is built for wide tables and is horizontally scalable. An RDBMS is thin, built for small tables, and hard to scale.
HBase has no transactions. An RDBMS is transactional.
HBase holds de-normalized data. An RDBMS holds normalized data.
HBase is good for semi-structured as well as structured data. An RDBMS is good for structured data.

Features of HBase

HBase is linearly scalable.
It has automatic failure support.
It provides consistent reads and writes.
It integrates with Hadoop, both as a source and a destination.
It has an easy Java API for clients.
It provides data replication across clusters.

Where to Use HBase

Apache HBase is used to get random, real-time read/write access to big data. It hosts very large tables on top of clusters of commodity hardware. Apache HBase is a non-relational database modeled after Google's Bigtable. Just as Bigtable works on top of the Google File System, Apache HBase works on top of Hadoop and HDFS.

Applications of HBase

It is used whenever there is a need for write-heavy applications. HBase is used whenever we need to provide fast random access to available data. Companies such as Facebook, Twitter, Yahoo, and Adobe use HBase internally.

HBase History

Nov 2006 - Google released the paper on Bigtable.
Feb 2007 - The initial HBase prototype was created as a Hadoop contribution.
Oct 2007 - The first usable HBase, along with Hadoop 0.15.0, was released.
Jan 2008 - HBase became a subproject of Hadoop.
Oct 2008 - HBase 0.18.1 was released.
Jan 2009 - HBase 0.19.0 was released.
Sep 2009 - HBase 0.20.0 was released.
May 2010 - HBase became an Apache top-level project.

HBase – Shell

This chapter explains how to start the HBase interactive shell that comes along with HBase.

HBase Shell

HBase contains a shell with which you can communicate with HBase. HBase uses the Hadoop File System to store its data. It has a master server and region servers. The data is stored in the form of regions (portions of tables). These regions are split up and stored in region servers. The master server manages the region servers, and all these tasks take place on HDFS. Given below are some of the commands supported by the HBase shell.

General Commands

status - Provides the status of HBase, for example, the number of servers.
version - Provides the version of HBase being used.
table_help - Provides help for table-reference commands.
whoami - Provides information about the current user.

Data Definition Language

These are the commands that operate on the tables in HBase.

create - Creates a table.
list - Lists all the tables in HBase.
disable - Disables a table.
is_disabled - Verifies whether a table is disabled.
enable - Enables a table.
is_enabled - Verifies whether a table is enabled.
describe - Provides the description of a table.
alter - Alters a table.
exists - Verifies whether a table exists.
drop - Drops a table from HBase.
drop_all - Drops the tables matching the regex given in the command.

Java Admin API - In addition to the shell commands above, Java provides an Admin API to achieve DDL functionality programmatically. HBaseAdmin and HTableDescriptor are the two important classes in the org.apache.hadoop.hbase.client package that provide DDL functionality.

Data Manipulation Language

put - Puts a cell value at a specified column in a specified row in a particular table.
get - Fetches the contents of a row or a cell.
delete - Deletes a cell value in a table.
deleteall - Deletes all the cells in a given row.
scan - Scans and returns the table data.
count - Counts and returns the number of rows in a table.
truncate - Disables, drops, and recreates a specified table.

Java client API - In addition to the shell commands above, Java provides a client API to achieve DML functionality, CRUD (Create, Retrieve, Update, Delete) operations, and more programmatically, under the org.apache.hadoop.hbase.client package. HTable, Put, and Get are the important classes in this package. A short shell session combining a few of these commands is sketched after this chapter's list output below.

Starting HBase Shell

To access the HBase shell, you have to navigate to the HBase home folder.

cd /usr/localhost/
cd Hbase

You can start the HBase interactive shell using the "hbase shell" command as shown below.

./bin/hbase shell

If you have successfully installed HBase on your system, it gives you the HBase shell prompt as shown below.

HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.23, rf42302b28aceaab773b15f234aa8718fff7eea3c, Wed Aug 27 00:54:09 UTC 2014

hbase(main):001:0>

To exit the interactive shell at any moment, type exit or use <Ctrl+C>. Check that the shell is working before proceeding further; use the list command for this purpose. list returns all the tables in HBase. First of all, verify the installation and configuration of HBase on your system using this command as shown below.

hbase(main):001:0> list

When you type this command, it gives you the following output.

hbase(main):001:0> list
TABLE
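As a quick illustration of how the DDL and DML commands listed above fit together, a minimal shell session might look like the following. The table name t1, the column family cf, the timestamps, and the timings are only illustrative.

hbase(main):002:0> create 't1', 'cf'
0 row(s) in 1.2200 seconds

hbase(main):003:0> put 't1', 'r1', 'cf:a', 'value1'
0 row(s) in 0.0500 seconds

hbase(main):004:0> get 't1', 'r1'
COLUMN        CELL
 cf:a         timestamp=1417524185058, value=value1
1 row(s) in 0.0300 seconds

hbase(main):005:0> disable 't1'
0 row(s) in 1.1800 seconds

hbase(main):006:0> drop 't1'
0 row(s) in 0.3000 seconds

Note that a table must be disabled before it can be dropped, which is why disable precedes drop in this session.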