Sqoop – Questions and Answers

Sqoop Questions and Answers has been designed to help students and professionals prepare for various certification exams and job interviews. This section provides a useful collection of sample interview questions and multiple choice questions (MCQs) with their answers and appropriate explanations.

1. Sqoop Interview Questions - A large collection of Sqoop interview questions with their answers hidden in a box, challenging you to attempt them before revealing the correct answer.

2. Sqoop Online Quiz - A collection of Sqoop multiple choice questions (MCQs) on a single page, along with the correct answers and explanations. If you select the right option, it turns green; otherwise red.

3. Sqoop Online Test - If you are preparing for a Sqoop-related certification exam, this section is a must. It simulates a real online test with a timer, challenging you to complete the test within the given time frame. At the end, you can check your overall score and see how you fared against the other candidates who took the test.

4. Sqoop Mock Test - Various mock tests that you can download to your local machine and solve offline. Every mock test comes with an answer key so you can verify your final score and grade yourself.

Sqoop – Useful Resources

The following resources contain additional information on Sqoop. Please use them to gain more in-depth knowledge of this topic.

Useful Video Courses

Big Data Analytics Using Hive In Hadoop - Mukund Kumar Mishra - 21 lectures, 2 hours
Advance Big Data Analytics using Hive & Sqoop - Navdeep Kaur - 51 lectures, 4 hours
Big Data Hadoop Course - TELCOMA Global - 90 lectures, 11.5 hours
Learn Big Data Hadoop: Hands-On for Beginner - Bigdata Engineer - 256 lectures, 13.5 hours
Big Data Crash Course - DataCouch - 68 lectures, 9 hours
Big Data For Architects - DataCouch - 61 lectures, 7.5 hours

Sqoop – Home

Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export data from the Hadoop file system to relational databases. This is a brief tutorial that explains how to make use of Sqoop in the Hadoop ecosystem.

Audience

This tutorial is prepared for professionals aspiring to make a career in Big Data Analytics using the Hadoop framework with Sqoop. ETL developers and professionals who work in analytics in general may also use this tutorial to good effect.

Prerequisites

Before proceeding with this tutorial, you need a basic knowledge of Core Java, database concepts of SQL, the Hadoop file system, and any flavor of the Linux operating system.

Sqoop – List Databases

This chapter describes how to list out the databases using Sqoop. The Sqoop list-databases tool parses and executes the 'SHOW DATABASES' query against the database server and then lists the databases present on the server.

Syntax

The following syntax is used for the Sqoop list-databases command.

$ sqoop list-databases (generic-args) (list-databases-args)
$ sqoop-list-databases (generic-args) (list-databases-args)

Sample Query

The following command is used to list all the databases in the MySQL database server.

$ sqoop list-databases --connect jdbc:mysql://localhost/ --username root

If the command executes successfully, it will display the list of databases in your MySQL database server as follows.

...
13/05/31 16:45:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
mysql
test
userdb
db
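The sample query above omits the password entirely, which only works if the MySQL root account has none. A minimal variant, assuming the same local server, uses the standard -P flag so that Sqoop prompts for the password on the console rather than taking it from the command line, where it would be visible in the shell history.

$ sqoop list-databases --connect jdbc:mysql://localhost/ --username root -P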

Sqoop – Import-All-Tables

This chapter describes how to import all the tables from the RDBMS database server to HDFS. Each table's data is stored in a separate directory, and the directory name is the same as the table name.

Syntax

The following syntax is used to import all tables.

$ sqoop import-all-tables (generic-args) (import-args)
$ sqoop-import-all-tables (generic-args) (import-args)

Example

Let us take an example of importing all tables from the userdb database. The list of tables that the database userdb contains is as follows.

+--------------------+
|       Tables       |
+--------------------+
| emp                |
| emp_add            |
| emp_contact        |
+--------------------+

The following command is used to import all the tables from the userdb database.

$ sqoop import-all-tables --connect jdbc:mysql://localhost/userdb --username root

Note − If you are using import-all-tables, it is mandatory that every table in that database has a primary key field.

The following command is used to verify all the table data from the userdb database in HDFS.

$ $HADOOP_HOME/bin/hadoop fs -ls

It will show you the list of table names in the userdb database as directories.

Output

drwxr-xr-x - hadoop supergroup 0 2014-12-22 22:50 _sqoop
drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:46 emp
drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:50 emp_add
drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:52 emp_contact
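When only some tables are needed, the import can be narrowed without falling back to per-table commands. The following is a hedged sketch, assuming you want to skip emp_contact and gather the results under a single HDFS warehouse directory of your choosing; --exclude-tables and --warehouse-dir are standard arguments of this tool, and -m 1 runs each table's import with a single mapper.

$ sqoop import-all-tables \
   --connect jdbc:mysql://localhost/userdb \
   --username root \
   --exclude-tables emp_contact \
   --warehouse-dir /user/hadoop/userdb \
   -m 1

Each imported table then appears as a subdirectory of /user/hadoop/userdb instead of the user's HDFS home directory.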

Sqoop – Discussion

Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export data from the Hadoop file system to relational databases. This is a brief tutorial that explains how to make use of Sqoop in the Hadoop ecosystem.

Sqoop – Quick Guide

Sqoop – Introduction

The traditional application management system, that is, the interaction of applications with relational databases using an RDBMS, is one of the sources that generate Big Data. Such Big Data, generated by the RDBMS, is stored in relational database servers in the relational database structure.

When Big Data storage and analysis tools such as MapReduce, Hive, HBase, Cassandra, and Pig from the Hadoop ecosystem came into the picture, they required a tool to interact with the relational database servers for importing and exporting the Big Data residing in them. Here, Sqoop occupies a place in the Hadoop ecosystem to provide feasible interaction between relational database servers and Hadoop's HDFS.

Sqoop − "SQL to Hadoop and Hadoop to SQL"

Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export data from the Hadoop file system to relational databases. It is provided by the Apache Software Foundation.

How Sqoop Works

The following image describes the workflow of Sqoop.

Sqoop Import

The import tool imports individual tables from an RDBMS to HDFS. Each row in a table is treated as a record in HDFS. All records are stored as text data in text files or as binary data in Avro and SequenceFiles.

Sqoop Export

The export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, which are called rows in a table. Those are read and parsed into a set of records and delimited with a user-specified delimiter.

Sqoop – Installation

As Sqoop is a sub-project of Hadoop, it can only work on the Linux operating system. Follow the steps given below to install Sqoop on your system.

Step 1: Verifying Java Installation

You need to have Java installed on your system before installing Sqoop. Let us verify the Java installation using the following command −

$ java -version

If Java is already installed on your system, you get to see the following response −

java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)

If Java is not installed on your system, then follow the steps given below.

Installing Java

Follow the simple steps given below to install Java on your system.

Step 1

Download Java (JDK <latest version> - X64.tar.gz) from the official Java downloads page. Then jdk-7u71-linux-x64.tar.gz will be downloaded onto your system.

Step 2

Generally, you can find the downloaded Java file in the Downloads folder. Verify it and extract the jdk-7u71-linux-x64.tar.gz file using the following commands.

$ cd Downloads/
$ ls
jdk-7u71-linux-x64.tar.gz
$ tar zxf jdk-7u71-linux-x64.tar.gz
$ ls
jdk1.7.0_71 jdk-7u71-linux-x64.tar.gz

Step 3

To make Java available to all users, you have to move it to the location "/usr/local/". Open root and type the following commands.

$ su
password:
# mv jdk1.7.0_71 /usr/local/java
# exit

Step 4

To set up the PATH and JAVA_HOME variables, add the following commands to the ~/.bashrc file.

export JAVA_HOME=/usr/local/java
export PATH=$PATH:$JAVA_HOME/bin

Now apply all the changes to the current running system.

$ source ~/.bashrc
Step 5

Use the following commands to configure Java alternatives −

# alternatives --install /usr/bin/java java /usr/local/java/bin/java 2
# alternatives --install /usr/bin/javac javac /usr/local/java/bin/javac 2
# alternatives --install /usr/bin/jar jar /usr/local/java/bin/jar 2
# alternatives --set java /usr/local/java/bin/java
# alternatives --set javac /usr/local/java/bin/javac
# alternatives --set jar /usr/local/java/bin/jar

Now verify the installation using the command java -version from the terminal as explained above.

Step 2: Verifying Hadoop Installation

Hadoop must be installed on your system before installing Sqoop. Let us verify the Hadoop installation using the following command −

$ hadoop version

If Hadoop is already installed on your system, then you will get the following response −

Hadoop 2.4.1
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4

If Hadoop is not installed on your system, then proceed with the following steps −

Downloading Hadoop

Download and extract Hadoop 2.4.1 from the Apache Software Foundation using the following commands.

$ su
password:
# cd /usr/local
# wget http://apache.claz.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz
# tar xzf hadoop-2.4.1.tar.gz
# mv hadoop-2.4.1/* hadoop/
# exit

Installing Hadoop in Pseudo-Distributed Mode

Follow the steps given below to install Hadoop 2.4.1 in pseudo-distributed mode.

Step 1: Setting up Hadoop

You can set the Hadoop environment variables by appending the following commands to the ~/.bashrc file.

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Now, apply all the changes to the current running system.

$ source ~/.bashrc

Step 2: Hadoop Configuration

You can find all the Hadoop configuration files in the location "$HADOOP_HOME/etc/hadoop". You need to make suitable changes in those configuration files according to your Hadoop infrastructure.

$ cd $HADOOP_HOME/etc/hadoop

In order to develop Hadoop programs using Java, you have to reset the Java environment variables in the hadoop-env.sh file by replacing the JAVA_HOME value with the location of Java on your system.

export JAVA_HOME=/usr/local/java

Given below is the list of files that you need to edit to configure Hadoop.

core-site.xml

The core-site.xml file contains information such as the port number used for the Hadoop instance, memory allocated for the file system, memory limit for storing the data, and the size of Read/Write buffers.

Open core-site.xml and add the following properties in between the <configuration> and </configuration> tags.

<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>

hdfs-site.xml

The hdfs-site.xml file contains information such as the value of the replication data, the namenode path, and the datanode path of your local file systems, that is, the place where you want to store the Hadoop infrastructure.

Let us assume the following data.

dfs.replication (data replication value) = 1

(In the following path, /hadoop/ is the user name.
hadoopinfra/hdfs/namenode is the directory created by the HDFS file system.)
namenode path = /home/hadoop/hadoopinfra/hdfs/namenode

(hadoopinfra/hdfs/datanode is the directory created by the HDFS file system.)

datanode path = /home/hadoop/hadoopinfra/hdfs/datanode

Open this file and add the following properties in between the <configuration> and </configuration> tags.

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.name.dir</name>
      <value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.data.dir</name>
      <value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
   </property>
</configuration>
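The configuration above only takes effect once HDFS has been initialized and the daemons are running. The following is a minimal sketch of that step, assuming the PATH entries set up earlier; jps should then list processes such as NameNode and DataNode.

$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh
$ jps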

Sqoop – Introduction

The traditional application management system, that is, the interaction of applications with relational databases using an RDBMS, is one of the sources that generate Big Data. Such Big Data, generated by the RDBMS, is stored in relational database servers in the relational database structure.

When Big Data storage and analysis tools such as MapReduce, Hive, HBase, Cassandra, and Pig from the Hadoop ecosystem came into the picture, they required a tool to interact with the relational database servers for importing and exporting the Big Data residing in them. Here, Sqoop occupies a place in the Hadoop ecosystem to provide feasible interaction between relational database servers and Hadoop's HDFS.

Sqoop − "SQL to Hadoop and Hadoop to SQL"

Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export data from the Hadoop file system to relational databases. It is provided by the Apache Software Foundation.

How Sqoop Works

The following image describes the workflow of Sqoop.

Sqoop Import

The import tool imports individual tables from an RDBMS to HDFS. Each row in a table is treated as a record in HDFS. All records are stored as text data in text files or as binary data in Avro and SequenceFiles.

Sqoop Export

The export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, which are called rows in a table. Those are read and parsed into a set of records and delimited with a user-specified delimiter.
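As a concrete sketch of the two directions, the commands below import a table into HDFS and export it back. The database userdb and the table emp follow the examples used elsewhere in this tutorial, while the HDFS path and the target table emp_copy (which must already exist in the database) are illustrative assumptions.

$ sqoop import --connect jdbc:mysql://localhost/userdb --username root --table emp --target-dir /user/hadoop/emp -m 1

$ sqoop export --connect jdbc:mysql://localhost/userdb --username root --table emp_copy --export-dir /user/hadoop/emp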

Sqoop – Job

This chapter describes how to create and maintain Sqoop jobs. A Sqoop job creates and saves the import and export commands. It specifies parameters to identify and recall the saved job. This re-calling or re-executing is used in incremental import, which can import the updated rows from an RDBMS table to HDFS.

Syntax

The following is the syntax for creating a Sqoop job.

$ sqoop job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
$ sqoop-job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]

Create Job (--create)

Here we are creating a job with the name myjob, which can import the table data from an RDBMS table to HDFS. The following command is used to create a job that imports data from the employee table in the db database to HDFS.

$ sqoop job --create myjob \
-- import \
--connect jdbc:mysql://localhost/db \
--username root \
--table employee \
-m 1

Verify Job (--list)

The '--list' argument is used to verify the saved jobs. The following command is used to verify the list of saved Sqoop jobs.

$ sqoop job --list

It shows the list of saved jobs.

Available jobs:
   myjob

Inspect Job (--show)

The '--show' argument is used to inspect or verify particular jobs and their details. The following command and sample output are used to verify a job called myjob.

$ sqoop job --show myjob

It shows the tools and their options, which are used in myjob.

Job: myjob
Tool: import
Options:
----------------------------
direct.import = true
codegen.input.delimiters.record = 0
hdfs.append.dir = false
db.table = employee
...
incremental.last.value = 1206
...

Execute Job (--exec)

The '--exec' option is used to execute a saved job. The following command is used to execute a saved job called myjob.

$ sqoop job --exec myjob

It shows you the following output.

10/08/19 13:08:45 INFO tool.CodeGenTool: Beginning code generation
...
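Since saved jobs exist mainly to drive incremental imports, the following hedged sketch shows that pattern; the check column id and the starting value 0 are assumptions about the employee table. On every execution, Sqoop imports only the rows whose id exceeds the stored last value and then updates incremental.last.value in the saved job automatically.

$ sqoop job --create incjob \
-- import \
--connect jdbc:mysql://localhost/db \
--username root \
--table employee \
--incremental append \
--check-column id \
--last-value 0 \
-m 1

$ sqoop job --exec incjob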

Sqoop – List Tables

This chapter describes how to list out the tables of a particular database in the MySQL database server using Sqoop. The Sqoop list-tables tool parses and executes the 'SHOW TABLES' query against a particular database and then lists the tables present in that database.

Syntax

The following syntax is used for the Sqoop list-tables command.

$ sqoop list-tables (generic-args) (list-tables-args)
$ sqoop-list-tables (generic-args) (list-tables-args)

Sample Query

The following command is used to list all the tables in the userdb database of the MySQL database server.

$ sqoop list-tables --connect jdbc:mysql://localhost/userdb --username root

If the command executes successfully, it will display the list of tables in the userdb database as follows.

...
13/05/31 16:45:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
emp
emp_add
emp_contact
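Because the table names are printed one per line, the output can be fed into a shell loop, for example to import each listed table individually. This is a rough sketch that assumes Sqoop's log messages go to stderr on your installation, so that only the table names survive the 2>/dev/null redirection.

for t in $(sqoop list-tables --connect jdbc:mysql://localhost/userdb --username root 2>/dev/null); do
   echo "Importing table $t"
   sqoop import --connect jdbc:mysql://localhost/userdb --username root --table "$t" -m 1
done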