apache Presto Archives - Page 2 of 2 - Donotsad where can learn any thing work project and make money

Aug 10

Apache Presto – JDBC Interface

Presto's JDBC interface is used to access Presto from a Java application.

Prerequisites

Install presto-jdbc-0.150.jar. You can download the JDBC jar file from the following link:

https://repo1.maven.org/maven2/com/facebook/presto/presto-jdbc/0.150/

After the jar file has been downloaded, add it to the class path of your Java application.

Create a Simple Application

Let's create a simple Java application using the JDBC interface.

Coding: PrestoJdbcSample.java

    import java.sql.*;
    import com.facebook.presto.jdbc.PrestoDriver;  // Presto JDBC driver class

    public class PrestoJdbcSample {
        public static void main(String[] args) {
            Connection connection = null;
            Statement statement = null;
            try {
                // Load the Presto JDBC driver.
                Class.forName("com.facebook.presto.jdbc.PrestoDriver");

                // Connect to the "tutorials" schema of the "mysql" catalog as user "tutorials".
                connection = DriverManager.getConnection(
                    "jdbc:presto://localhost:8080/mysql/tutorials", "tutorials", "");

                statement = connection.createStatement();

                // Select two columns from the author table.
                String sql = "select auth_id, auth_name from mysql.tutorials.author";
                ResultSet resultSet = statement.executeQuery(sql);

                while (resultSet.next()) {
                    int id = resultSet.getInt("auth_id");
                    String name = resultSet.getString("auth_name");
                    System.out.print("ID: " + id + ";\nName: " + name + "\n");
                }

                resultSet.close();
                statement.close();
                connection.close();
            } catch (SQLException sqlException) {
                sqlException.printStackTrace();
            } catch (Exception exception) {
                exception.printStackTrace();
            }
        }
    }

Save the file and quit the editor. Now start the Presto server in one terminal and open a new terminal to compile and run the program. The steps are as follows.

Compilation

    ~/Workspace/presto/presto-jdbc $ javac -cp presto-jdbc-0.150.jar PrestoJdbcSample.java

Execution

    ~/Workspace/presto/presto-jdbc $ java -cp .:presto-jdbc-0.150.jar PrestoJdbcSample

Output

    INFO: Logging initialized @146ms
    ID: 1;
    Name: Doug Cutting
    ID: 2;
    Name: James Gosling
    ID: 3;
    Name: Dennis Ritchie
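The connection string in the sample follows the pattern jdbc:presto://host:port/catalog/schema. As a minimal sketch of that pattern, the helper below assembles the URL from its parts. The class PrestoJdbcUrl and its build method are hypothetical names introduced here for illustration; they are not part of the Presto JDBC driver.

```java
// Hypothetical helper for illustration; not part of the Presto JDBC driver.
final class PrestoJdbcUrl {
    private PrestoJdbcUrl() {}

    // Assembles a URL of the form jdbc:presto://host:port/catalog/schema.
    static String build(String host, int port, String catalog, String schema) {
        return "jdbc:presto://" + host + ":" + port + "/" + catalog + "/" + schema;
    }

    public static void main(String[] args) {
        // Same values as the tutorial sample: mysql catalog, tutorials schema.
        System.out.println(build("localhost", 8080, "mysql", "tutorials"));
    }
}
```

The resulting string can be passed to DriverManager.getConnection in place of the hard-coded URL, which keeps the host, catalog, and schema configurable.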

Aug 10

Apache Presto – Installation

This chapter explains how to install Presto on your machine. The basic requirements of Presto are:

- Linux or Mac OS
- Java version 8

Now, follow the steps below to install Presto on your machine.

Verifying Java Installation

If Java version 8 is already installed on your machine, verify it using the following command.

    $ java -version

If Java is installed, the command prints the installed version. If Java is not installed, follow these steps to install Java 8 on your machine:

1. Download the JDK from the following link: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html. At the time of writing, the latest version is JDK 8u92 and the file is "jdk-8u92-linux-x64.tar.gz".
2. Extract the files and move them to the directory of your choice.
3. Set the Java alternatives.

Java is then installed on your machine.

Apache Presto Installation

Download the Presto server (version 0.149 in this tutorial) from the following link:

https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.149/

This downloads "presto-server-0.149.tar.gz" to your machine.

Extract tar Files

Extract the tar file using the following commands:

    $ tar -zxf presto-server-0.149.tar.gz
    $ cd presto-server-0.149

Configuration Settings

Create "data" directory

Create a data directory outside the installation directory. It is used for storing logs, metadata, etc., and keeping it outside the installation directory makes it easy to preserve when upgrading Presto. Create it using the following commands:

    $ cd
    $ mkdir data

To view the path where it is located, change into the directory and use the command "pwd". This location is assigned in the node.properties file in the next step.
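As a rough sketch of how the data directory path ends up in node.properties, the snippet below renders the three entries that file holds in this tutorial (node.environment, node.id, node.data-dir). The class NodePropertiesSketch and its render method are hypothetical, and placing the data directory under the user's home directory is an assumption that mirrors the "cd; mkdir data" step above.

```java
import java.util.UUID;

// Hypothetical helper for illustration; Presto itself only reads the finished file.
final class NodePropertiesSketch {
    private NodePropertiesSketch() {}

    // Renders the contents of etc/node.properties as plain key=value lines.
    static String render(String environment, String nodeId, String dataDir) {
        return "node.environment=" + environment + "\n"
             + "node.id=" + nodeId + "\n"
             + "node.data-dir=" + dataDir + "\n";
    }

    public static void main(String[] args) {
        // Assumption: the data directory was created in the user's home directory.
        String dataDir = System.getProperty("user.home") + "/data";
        // A random UUID is one way to obtain an identifier that is unique per node.
        String nodeId = UUID.randomUUID().toString();
        System.out.print(render("production", nodeId, dataDir));
    }
}
```

The printed text can be saved as etc/node.properties; the node.id should then stay the same for that node across restarts rather than being regenerated.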
Create "etc" directory

Create an etc directory inside the Presto installation directory using the following commands:

    $ cd presto-server-0.149
    $ mkdir etc

This directory will hold the configuration files. Let's create each file one by one.

Node Properties

The Presto node properties file contains environmental configuration specific to each node. Create it inside the etc directory (etc/node.properties):

    $ cd etc
    $ vi node.properties

    node.environment = production
    node.id = ffffffff-ffff-ffff-ffff-ffffffffffff
    node.data-dir = /Users/../workspace/Presto

After making all the changes, save the file and quit the editor. Here node.data-dir is the path of the data directory created above, and node.id is the unique identifier for each node.

JVM Config

Create a file "jvm.config" inside the etc directory (etc/jvm.config). This file contains a list of command line options used for launching the Java Virtual Machine, one option per line. Do not put spaces around "=" in these options, because each line is passed to the JVM as a single argument.

    $ cd etc
    $ vi jvm.config

    -server
    -Xmx16G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:OnOutOfMemoryError=kill -9 %p

After making all the changes, save the file and quit the editor.

Config Properties

Create a file "config.properties" inside the etc directory (etc/config.properties). This file contains the configuration of the Presto server. If you are setting up a single machine for testing, the same Presto server can function as both the coordinator and a worker, as defined by the following configuration:

    $ cd etc
    $ vi config.properties

    coordinator = true
    node-scheduler.include-coordinator = true
    http-server.http.port = 8080
    query.max-memory = 5GB
    query.max-memory-per-node = 1GB
    discovery-server.enabled = true
    discovery.uri = http://localhost:8080

Here,

- coordinator: Marks this node as the master (coordinator) node.
- node-scheduler.include-coordinator: Allows scheduling work on the coordinator.
- http-server.http.port: Specifies the port for the HTTP server.
- query.max-memory: The maximum amount of distributed memory (5GB in this example).
- query.max-memory-per-node: The maximum amount of memory per node (1GB in this example).
- discovery-server.enabled: Presto uses the Discovery service to find all the nodes in the cluster.
- discovery.uri: The URI of the Discovery server.

If you are setting up Presto on multiple machines, one server acts as the coordinator and the others act as workers. Use the following configuration settings to test Presto on multiple machines. In a real cluster, replace localhost in discovery.uri with the host name of the coordinator.

Configuration for Coordinator

    $ cd etc
    $ vi config.properties

    coordinator = true
    node-scheduler.include-coordinator = false
    http-server.http.port = 8080
    query.max-memory = 50GB
    query.max-memory-per-node = 1GB
    discovery-server.enabled = true
    discovery.uri = http://localhost:8080

Configuration for Worker

    $ cd etc
    $ vi config.properties

    coordinator = false
    http-server.http.port = 8080
    query.max-memory = 50GB
    query.max-memory-per-node = 1GB
    discovery.uri = http://localhost:8080

Log Properties

Create a file "log.properties" inside the etc directory (etc/log.properties). This file contains the minimum log level for named logger hierarchies. It is defined as follows:

    $ cd etc
    $ vi log.properties

    com.facebook.presto = INFO

Save the file and quit the editor. There are four log levels: DEBUG, INFO, WARN and ERROR. The default log level is INFO.

Catalog Properties

Create a directory "catalog" inside the etc directory (etc/catalog). This is used for mounting data. For example, create etc/catalog/jmx.properties with the following contents to mount the jmx connector as the jmx catalog:

    $ cd etc
    $ mkdir catalog
    $ cd catalog
    $ vi jmx.properties

    connector.name = jmx

Start Presto

Presto can be started as a background daemon from the installation directory using the following command:

    $ bin/launcher start

You will see a response similar to this:

    Started as 840

Run Presto

To launch the Presto server in the foreground instead, use the following command:

    $ bin/launcher run

After successfully launching the Presto server, you can find the log files in the "var/log" directory.
- launcher.log: This log is created by the launcher and is connected to the stdout and stderr streams of the server.
- server.log: This is the main log file used by Presto.
- http-request.log: HTTP requests received by the server.

You have now installed and configured the Presto server on your machine. Let's continue with the steps to install the Presto CLI.

Install Presto CLI

The Presto CLI provides a terminal-based interactive shell for running queries. Download the Presto CLI from the following link:

https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.149/

This downloads "presto-cli-0.149-executable.jar" to your machine.

Run CLI

After downloading the presto-cli jar, copy it to the location you want to run it from. This location may be any node that has network access to the coordinator. First rename the jar file to presto, then make it executable with chmod +x:

    $ mv presto-cli-0.149-executable.jar presto
    $ chmod +x presto

Now execute the CLI using the following command:

    $ ./presto --server localhost:8080 --catalog jmx --schema default

Here jmx (Java Management Extensions) is the catalog and default is the schema.

Aug 10

Apache Presto – SQL Operations

In this chapter, we will discuss how to create and execute queries on Presto. Let us first go through the basic data types that Presto supports.

Basic Data Types

The following table describes the basic data types of Presto.

S.No  Data type and description
1.    VARCHAR: Variable length character data.
2.    BIGINT: A 64-bit signed integer.
3.    DOUBLE: A 64-bit floating point double precision value.
4.    DECIMAL: A fixed precision decimal number. For example, in DECIMAL(10,3), 10 is the precision (the total number of digits) and 3 is the scale (the number of digits in the fractional part). The scale is optional and defaults to 0.
5.    BOOLEAN: Boolean values true and false.
6.    VARBINARY: Variable length binary data.
7.    JSON: JSON data.
8.    DATE: Date data type represented as year-month-day.
9.    TIME, TIMESTAMP, TIMESTAMP WITH TIME ZONE: TIME is the time of the day (hour-min-sec-millisecond). TIMESTAMP is the date and time of the day. TIMESTAMP WITH TIME ZONE is the date and time of the day with the time zone from the value.
10.   INTERVAL: Stretch or extend date and time data types.
11.   ARRAY: Array of the given component type. For example, ARRAY[5,7].
12.   MAP: Map between the given component types. For example, MAP(ARRAY['one','two'],ARRAY[5,7]).
13.   ROW: Row structure made up of named fields.

Presto Operators

Presto operators are listed in the following table.

S.No  Operator and description
1.    Arithmetic operator: Presto supports arithmetic operators such as +, -, *, /, %.
2.    Relational operator: <, >, <=, >=, =, <>
3.    Logical operator: AND, OR, NOT
4.    Range operator: Used to test the value in a specific range. Presto supports BETWEEN, IS NULL, IS NOT NULL, GREATEST and LEAST.
5.    Decimal operator: The binary arithmetic decimal operators perform binary arithmetic operations on the decimal type. The unary decimal operator - performs negation.
6.    String operator: The || operator performs string concatenation.
7.    Date and time operator: Performs arithmetic addition and subtraction operations on date and time data types.
8.    Array operator: The subscript operator [] accesses an element of an array. The concatenation operator || concatenates an array with an array or with an element of the same type.
9.    Map operator: The map subscript operator [] retrieves the value corresponding to a given key from a map.
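To make the precision and scale of DECIMAL(10,3) from the data type table concrete, here is a small sketch using Java's java.math.BigDecimal, which uses the same two notions. This illustrates the concept only; it is not how Presto implements DECIMAL, and the class DecimalPrecisionSketch and its fits method are hypothetical names.

```java
import java.math.BigDecimal;

// Illustration of precision and scale using Java's BigDecimal (not Presto's implementation).
final class DecimalPrecisionSketch {
    private DecimalPrecisionSketch() {}

    // Returns true if the value can be stored in DECIMAL(precision, scale) without rounding:
    // at most `scale` fractional digits and at most `precision - scale` integer digits.
    static boolean fits(BigDecimal value, int precision, int scale) {
        int integerDigits = value.precision() - value.scale();
        return value.scale() <= scale && integerDigits <= precision - scale;
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("1234567.891");
        System.out.println("precision = " + value.precision()); // 10 digits in total
        System.out.println("scale     = " + value.scale());     // 3 digits after the point
        System.out.println("fits DECIMAL(10,3): " + fits(value, 10, 3));
        // Eight integer digits do not fit, because DECIMAL(10,3) leaves room for only seven.
        System.out.println("fits DECIMAL(10,3): " + fits(new BigDecimal("12345678.9"), 10, 3));
    }
}
```

The value 1234567.891 has ten digits in total and three after the decimal point, so it matches DECIMAL(10,3) exactly.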

Aug 10

Apache Presto – Home

Apache Presto Tutorial

Apache Presto is an open source distributed SQL engine. Presto originated at Facebook for its data analytics needs and was later open sourced. Teradata has since joined the Presto community and offers support. Apache Presto is very useful for running queries even on petabytes of data. Its extensible architecture and storage plugin interfaces make it easy to interact with other file systems. Many leading companies are adopting Presto for its interactive speed and low latency performance.

This tutorial explores Presto architecture, configuration, and storage plugins. It discusses basic and advanced queries and concludes with real-time examples.

Audience

This tutorial has been prepared for professionals aspiring to make a career in Big Data Analytics. It will give you a working understanding of Apache Presto.

Prerequisites

Before proceeding with this tutorial, you must have a good understanding of Core Java, DBMS and any of the Linux operating systems.

Aug 10

Apache Presto – KAFKA Connector

The Kafka Connector for Presto allows you to access data from Apache Kafka using Presto.

Prerequisites

Download and install the latest version of the following Apache projects:

- Apache ZooKeeper
- Apache Kafka

Start ZooKeeper

Start the ZooKeeper server using the following command.

    $ bin/zookeeper-server-start.sh config/zookeeper.properties

ZooKeeper now listens on port 2181.

Start Kafka

Start Kafka in another terminal using the following command.

    $ bin/kafka-server-start.sh config/server.properties

After Kafka starts, it uses port 9092.

TPCH Data

Download tpch-kafka

    $ curl -o kafka-tpch https://repo1.maven.org/maven2/de/softwareforge/kafka_tpch_0811/1.0/kafka_tpch_0811-1.0.sh

This downloads the loader from Maven Central. You will get a response similar to the following.

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
      5 21.6M    5 1279k    0     0  83898      0  0:04:30  0:00:15  0:04:15  129k
      6 21.6M    6 1407k    0     0  86656      0  0:04:21  0:00:16  0:04:05  131k
     24 21.6M   24 5439k    0     0   124k      0  0:02:57  0:00:43  0:02:14  175k
     24 21.6M   24 5439k    0     0   124k      0  0:02:58  0:00:43  0:02:15  160k
     25 21.6M   25 5736k    0     0   128k      0  0:02:52  0:00:44  0:02:08  181k
    ...

Then make it executable using the following command:

    $ chmod 755 kafka-tpch

Run tpch-kafka

Run the kafka-tpch program to preload a number of topics with tpch data using the following command.

Query

    $ ./kafka-tpch load --brokers localhost:9092 --prefix tpch. --tpch-type tiny

Result

    2016-07-13T16:15:52.083+0530 INFO main io.airlift.log.Logging Logging to stderr
    2016-07-13T16:15:52.124+0530 INFO main de.softwareforge.kafka.LoadCommand Processing tables: [customer, orders, lineitem, part, partsupp, supplier, nation, region]
    2016-07-13T16:15:52.834+0530 INFO pool-1-thread-1 de.softwareforge.kafka.LoadCommand Loading table 'customer' into topic 'tpch.customer'...
    2016-07-13T16:15:52.834+0530 INFO pool-1-thread-2 de.softwareforge.kafka.LoadCommand Loading table 'orders' into topic 'tpch.orders'...
    2016-07-13T16:15:52.834+0530 INFO pool-1-thread-3 de.softwareforge.kafka.LoadCommand Loading table 'lineitem' into topic 'tpch.lineitem'...
    2016-07-13T16:15:52.834+0530 INFO pool-1-thread-4 de.softwareforge.kafka.LoadCommand Loading table 'part' into topic 'tpch.part'...
    ...

The Kafka tables customer, orders, supplier, etc., are now loaded with tpch data.

Add Config Settings

Add the following Kafka connector configuration settings on the Presto server. Since the catalog is named kafka, these settings typically go in etc/catalog/kafka.properties.

    connector.name = kafka
    kafka.nodes = localhost:9092
    kafka.table-names = tpch.customer,tpch.orders,tpch.lineitem,tpch.part,tpch.partsupp,tpch.supplier,tpch.nation,tpch.region
    kafka.hide-internal-columns = false

The tables named in this configuration are the ones loaded by the kafka-tpch program.

Start Presto CLI

Start the Presto CLI using the following command:

    $ ./presto --server localhost:8080 --catalog kafka --schema tpch

Here "tpch" is a schema for the Kafka connector, and you will receive the following prompt.

    presto:tpch>

List Tables

The following query lists all the tables in the "tpch" schema.

Query

    presto:tpch> show tables;

Result

      Table
    ----------
     customer
     lineitem
     nation
     orders
     part
     partsupp
     region
     supplier

Describe Customer Table

The following query describes the "customer" table.

Query

    presto:tpch> describe customer;

Result

          Column       |  Type   |                   Comment
    -------------------+---------+---------------------------------------------
     _partition_id     | bigint  | Partition Id
     _partition_offset | bigint  | Offset for the message within the partition
     _segment_start    | bigint  | Segment start offset
     _segment_end      | bigint  | Segment end offset
     _segment_count    | bigint  | Running message count per segment
     _key              | varchar | Key text
     _key_corrupt      | boolean | Key data is corrupt
     _key_length       | bigint  | Total number of key bytes
     _message          | varchar | Message text
     _message_corrupt  | boolean | Message data is corrupt
     _message_length   | bigint  | Total number of message bytes
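The kafka.table-names value in the connector configuration is the "tpch." prefix passed to the loader, prepended to each TPCH table name and joined with commas. As a minimal sketch of that construction, the snippet below derives the value from the list of tables. The class KafkaTableNamesSketch and its tableNames method are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; it only builds a configuration string.
final class KafkaTableNamesSketch {
    private KafkaTableNamesSketch() {}

    // Prepends the topic prefix to every table name and joins the results with commas.
    static String tableNames(String prefix, List<String> tables) {
        return tables.stream()
                .map(table -> prefix + table)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Table list as reported by the kafka-tpch loader output.
        List<String> tables = Arrays.asList(
                "customer", "orders", "lineitem", "part",
                "partsupp", "supplier", "nation", "region");
        System.out.println("kafka.table-names = " + tableNames("tpch.", tables));
    }
}
```

Keeping the prefix in one place makes it harder for the loader's --prefix argument and the connector configuration to drift apart.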

Aug 10

Apache Presto – Useful Resources

The following resources contain additional information on Apache Presto. Please use them to get more in-depth knowledge on this topic.

Useful Video Courses

- Apache Spark Online Training Course: 47 lectures, 3.5 hours, Tutorialspoint
- Delta Lake with Apache Spark using Scala: 53 lectures, 2 hours, Bigdata Engineer
- Apache Spark with Scala for Certified Databricks Professional: 78 lectures, 5.5 hours, Bigdata Engineer
- Apache Cassandra for Beginners: 28 lectures, 2 hours, Navdeep Kaur
- NGINX, Apache, SSL Encryption – Training Course: 60 lectures, 3.5 hours, YouAccel
- Learn Advanced Apache Kafka from Scratch (Featured): 154 lectures, 9 hours, Learnkart Technology Pvt Ltd

Aug 10

Apache Presto – Overview

Data analytics is the process of analyzing raw data to gather relevant information for better decision making. Many organizations use it primarily to make business decisions. Big data analytics involves a large amount of data and the process is quite complex, so companies use different strategies.

For example, Facebook is one of the leading data-driven companies and runs one of the largest data warehouses in the world. Facebook's warehouse data is stored in Hadoop for large scale computation. Later, when the warehouse data grew to petabytes, Facebook decided to develop a new system with low latency. In 2012, Facebook team members designed "Presto" for interactive query analytics that would operate quickly even with petabytes of data.

What is Apache Presto?

Apache Presto is a distributed parallel query execution engine, optimized for low latency and interactive query analysis. Presto runs queries easily and scales from gigabytes to petabytes without downtime.

A single Presto query can process data from multiple sources such as HDFS, MySQL, Cassandra, Hive and many more. Presto is built in Java and is easy to integrate with other data infrastructure components. Presto is powerful, and leading companies like Airbnb, DropBox, Groupon and Netflix are adopting it.

Presto Features

Presto has the following features:

- Simple and extensible architecture.
- Pluggable connectors: Presto supports pluggable connectors that provide metadata and data for queries.
- Pipelined executions: Avoids unnecessary I/O latency overhead.
- User-defined functions: Analysts can create custom user-defined functions to migrate easily.
- Vectorized columnar processing.
Presto Benefits

Here is a list of benefits that Apache Presto offers:

- Specialized SQL operations
- Easy to install and debug
- Simple storage abstraction
- Scales quickly to petabytes of data with low latency

Presto Applications

Presto is used by many leading companies. Let's take a look at some of the notable ones.

- Facebook: Facebook built Presto for its data analytics needs. Presto scales easily to a high velocity of data.
- Teradata: Teradata provides end-to-end solutions in Big Data analytics and data warehousing. Teradata's contribution to Presto makes it easier for more companies to meet all their analytical needs.
- Airbnb: Presto is an integral part of the Airbnb data infrastructure. Hundreds of employees run queries each day with the technology.

Why Presto?

Presto supports standard ANSI SQL, which makes it easy for data analysts and developers to use. Though it is built in Java, it avoids typical issues of Java code related to memory allocation and garbage collection. Presto has a connector architecture that is Hadoop friendly and makes it easy to plug in file systems.

Presto runs on multiple Hadoop distributions. In addition, Presto can reach out from a Hadoop platform to query Cassandra, relational databases, or other data stores. This cross-platform analytic capability allows Presto users to extract maximum business value from gigabytes to petabytes of data.