Apache Presto – Overview ”; Previous Next Data analytics is the process of analyzing raw data to gather relevant information for better decision making. It is primarily used in many organizations to make business decisions. Well, big data analytics involves a large amount of data and this process is quite complex, hence companies use different strategies. For example, Facebook is one of the leading data driven and largest data warehouse company in the world. Facebook warehouse data is stored in Hadoop for large scale computation. Later, when warehouse data grew to petabytes, they decided to develop a new system with low latency. In the year of 2012, Facebook team members designed “Presto” for interactive query analytics that would operate quickly even with petabytes of data. What is Apache Presto? Apache Presto is a distributed parallel query execution engine, optimized for low latency and interactive query analysis. Presto runs queries easily and scales without down time even from gigabytes to petabytes. A single Presto query can process data from multiple sources like HDFS, MySQL, Cassandra, Hive and many more data sources. Presto is built in Java and easy to integrate with other data infrastructure components. Presto is powerful, and leading companies like Airbnb, DropBox, Groupon, Netflix are adopting it. Presto − Features Presto contains the following features − Simple and extensible architecture. Pluggable connectors – Presto supports pluggable connector to provide metadata and data for queries. Pipelined executions – Avoids unnecessary I/O latency overhead. User-defined functions – Analysts can create custom user-defined functions to migrate easily. Vectorized columnar processing. Presto − Benefits Here is a list of benefits that Apache Presto offers − Specialized SQL operations Easy to install and debug Simple storage abstraction Quickly scales petabytes data with low latency Presto − Applications Presto supports most of today’s best industrial applications. Let’s take a look at some of the notable applications. Facebook − Facebook built Presto for data analytics needs. Presto easily scales large velocity of data. Teradata − Teradata provides end-to-end solutions in Big Data analytics and data warehousing. Teradata contribution to Presto makes it easier for more companies to enable all analytical needs. Airbnb − Presto is an integral part of the Airbnb data infrastructure. Well, hundreds of employees are running queries each day with the technology. Why Presto? Presto supports standard ANSI SQL which has made it very easy for data analysts and developers. Though it is built in Java, it avoids typical issues of Java code related to memory allocation and garbage collection. Presto has a connector architecture that is Hadoop friendly. It allows to easily plug in file systems. Presto runs on multiple Hadoop distributions. In addition, Presto can reach out from a Hadoop platform to query Cassandra, relational databases, or other data stores. This cross-platform analytic capability allows Presto users to extract maximum business value from gigabytes to petabytes of data. Print Page Previous Next Advertisements ”;
Category: apache Presto
Apache Presto – Administration Tools ”; Previous Next In this chapter, we will discuss the administration tools used in Presto. Let’s start with the Web Interface of Presto. Web Interface Presto provides a web interface for monitoring and managing queries. It can be accessed from the port number specified in the coordinator Config Properties. Start Presto server and Presto CLI. Then you can access the web interface from the following url − http://localhost:8080/ The output will be similar to the above screen. Here, the main page has a list of queries along with information like unique query ID, query text, query state, percentage completed, username and source from which this query is originated. Latest queries are running first, then completed or not completed queries are displayed at the bottom. Tuning the Performance on Presto If Presto cluster is having any performance-related issues, change your default configuration settings to the following settings. Config Properties task. info -refresh-max-wait − Reduces coordinator work load. task.max-worker-threads − Splits the process and assigns to each worker nodes. distributed-joins-enabled − Hash-based distributed joins. node-scheduler.network-topology − Sets network topology to scheduler. JVM Settings Change your default JVM settings to the following settings. This will be helpful for diagnosing garbage collection issues. -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCCause -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC -XX:+PrintClassHistogramAfterFullGC -XX:+PrintClassHistogramBeforeFullGC -XX:PrintFLSStatistics = 2 -XX:+PrintAdaptiveSizePolicy -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount = 1 Print Page Previous Next Advertisements ”;
Apache Presto – Configuration Settings ”; Previous Next This chapter will discuss the configuration settings for Presto. Presto Verifier The Presto Verifier can be used to test Presto against another database (such as MySQL), or to test two Presto clusters against each other. Create Database in MySQL Open MySQL server and create a database using the following command. create database test Now you have created “test” database in the server. Create the table and load it with the following query. CREATE TABLE verifier_queries( id INT NOT NULL AUTO_INCREMENT, suite VARCHAR(256) NOT NULL, name VARCHAR(256), test_catalog VARCHAR(256) NOT NULL, test_schema VARCHAR(256) NOT NULL, test_prequeries TEXT, test_query TEXT NOT NULL, test_postqueries TEXT, test_username VARCHAR(256) NOT NULL default ”verifier-test”, test_password VARCHAR(256), control_catalog VARCHAR(256) NOT NULL, control_schema VARCHAR(256) NOT NULL, control_prequeries TEXT, control_query TEXT NOT NULL, control_postqueries TEXT, control_username VARCHAR(256) NOT NULL default ”verifier-test”, control_password VARCHAR(256), session_properties_json TEXT, PRIMARY KEY (id) ); Add Config Settings Create a properties file to configure the verifier − $ vi config.properties suite = mysuite query-database = jdbc:mysql://localhost:3306/tutorials?user=root&password=pwd control.gateway = jdbc:presto://localhost:8080 test.gateway = jdbc:presto://localhost:8080 thread-count = 1 Here, in the query-database field, enter the following details − mysql database name, user name, and password. Download JAR File Download Presto-verifier jar file by visiting the following link, https://repo1.maven.org/maven2/com/facebook/presto/presto-verifier/0.149/ Now the version “presto-verifier-0.149-executable.jar” is downloaded on your machine. Execute JAR Execute the JAR file using the following command, $ mv presto-verifier-0.149-executable.jar verifier $ chmod+x verifier Run Verifier Run the verifier using the following command, $ ./verifier config.properties Create Table Let’s create a simple table in “test” database using the following query. create table product(id int not null, name varchar(50)) Insert Table After creating a table, insert two records using the following query, insert into product values(1,’Phone”) insert into product values(2,’Television’) Run Verifier Query Execute the following sample query in the verifier terminal (./verifier config.propeties) to check the verifier result. Sample Query insert into verifier_queries (suite, test_catalog, test_schema, test_query, control_catalog, control_schema, control_query) values (”mysuite”, ”mysql”, ”default”, ”select * from mysql.test.product”, ”mysql”, ”default”, ”select * from mysql.test.product”); Here, select * from mysql.test.product query refers to mysql catalog, test is database name and product is table name. In this way, you can access mysql connector using Presto server. Here, two same select queries are tested against each other to see the performance. Similarly, you can run other queries to test the performance results. You can also connect two Presto clusters to check the performance results. Print Page Previous Next Advertisements ”;
Apache Presto – MySQL Connector ”; Previous Next The MySQL connector is used to query an external MySQL database. Prerequisites MySQL server installation. Configuration Settings Hopefully you have installed mysql server on your machine. To enable mysql properties on Presto server, you must create a file “mysql.properties” in “etc/catalog” directory. Issue the following command to create a mysql.properties file. $ cd etc $ cd catalog $ vi mysql.properties connector.name = mysql connection-url = jdbc:mysql://localhost:3306 connection-user = root connection-password = pwd Save the file and quit the terminal. In the above file, you must enter your mysql password in connection-password field. Create Database in MySQL Server Open MySQL server and create a database using the following command. create database tutorials Now you have created “tutorials” database in the server. To enable database type, use the command “use tutorials” in the query window. Create Table Let’s create a simple table on “tutorials” database. create table author(auth_id int not null, auth_name varchar(50),topic varchar(100)) Insert Table After creating a table, insert three records using the following query. insert into author values(1,”Doug Cutting”,”Hadoop”) insert into author values(2,’James Gosling”,”java”) insert into author values(3,”Dennis Ritchie’,”C”) Select Records To retrieve all the records, type the following query. Query select * from author Result auth_id auth_name topic 1 Doug Cutting Hadoop 2 James Gosling java 3 Dennis Ritchie C As of now, you have queried data using MySQL server. Let’s connect Mysql storage plugin to Presto server. Connect Presto CLI Type the following command to connect MySql plugin on Presto CLI. ./presto –server localhost:8080 –catalog mysql –schema tutorials You will receive the following response. presto:tutorials> Here “tutorials” refers to schema in mysql server. List Schemas To list out all the schemas in mysql, type the following query in Presto server. Query presto:tutorials> show schemas from mysql; Result Schema ——————– information_schema performance_schema sys tutorials From this result, we can conclude the first three schemas as predefined and the last one as created by yourself. List Tables from Schema Following query lists out all the tables in tutorials schema. Query presto:tutorials> show tables from mysql.tutorials; Result Table ——– author We have created only one table in this schema. If you have created multiple tables, it will list out all the tables. Describe Table To describe the table fields, type the following query. Query presto:tutorials> describe mysql.tutorials.author; Result Column | Type | Comment ———–+————–+——— auth_id | integer | auth_name | varchar(50) | topic | varchar(100) | Show Columns from Table Query presto:tutorials> show columns from mysql.tutorials.author; Result Column | Type | Comment ———–+————–+——— auth_id | integer | auth_name | varchar(50) | topic | varchar(100) | Access Table Records To fetch all the records from mysql table, issue the following query. Query presto:tutorials> select * from mysql.tutorials.author; Result auth_id | auth_name | topic ———+—————-+——– 1 | Doug Cutting | Hadoop 2 | James Gosling | java 3 | Dennis Ritchie | C From this result, you can retrieve mysql server records in Presto. Create Table Using as Command Mysql connector doesn’t support create table query but you can create a table using as command. Query presto:tutorials> create table mysql.tutorials.sample as select * from mysql.tutorials.author; Result CREATE TABLE: 3 rows You can’t insert rows directly because this connector has some limitations. It cannot support the following queries − create insert update delete drop To view the records in the newly created table, type the following query. Query presto:tutorials> select * from mysql.tutorials.sample; Result auth_id | auth_name | topic ———+—————-+——– 1 | Doug Cutting | Hadoop 2 | James Gosling | java 3 | Dennis Ritchie | C Print Page Previous Next Advertisements ”;
Apache Presto – JMX Connector ”; Previous Next Java Management Extensions (JMX) gives information about the Java Virtual Machine and software running inside JVM. The JMX connector is used to query JMX information in Presto server. As we have already enabled “jmx.properties” file under “etc/catalog” directory. Now connect Prest CLI to enable JMX plugin. Presto CLI Query $ ./presto –server localhost:8080 –catalog jmx –schema jmx Result You will receive the following response. presto:jmx> JMX Schema To list out all the schemas in “jmx”, type the following query. Query presto:jmx> show schemas from jmx; Result Schema ——————– information_schema current Show Tables To view the tables in the “current” schema, use the following command. Query 1 presto:jmx> show tables from jmx.current; Result Table —————————————————————————— com.facebook.presto.execution.scheduler:name = nodescheduler com.facebook.presto.execution:name = queryexecution com.facebook.presto.execution:name = querymanager com.facebook.presto.execution:name = remotetaskfactory com.facebook.presto.execution:name = taskexecutor com.facebook.presto.execution:name = taskmanager com.facebook.presto.execution:type = queryqueue,name = global,expansion = global ……………… ………………. Query 2 presto:jmx> select * from jmx.current.”java.lang:type = compilation”; Result node | compilationtimemonitoringsupported | name | objectname | totalcompilationti ————————————–+————————————+——————————–+—————————-+——————- ffffffff-ffff-ffff-ffff-ffffffffffff | true | HotSpot 64-Bit Tiered Compilers | java.lang:type=Compilation | 1276 Query 3 presto:jmx> select * from jmx.current.”com.facebook.presto.server:name = taskresource”; Result node | readfromoutputbuffertime.alltime.count | readfromoutputbuffertime.alltime.max | readfromoutputbuffertime.alltime.maxer ————————————–+—————————————+————————————–+————————————— ffffffff-ffff-ffff-ffff-ffffffffffff | 92.0 | 1.009106149 | Print Page Previous Next Advertisements ”;
Apache Presto – Architecture
Apache Presto – Architecture ”; Previous Next The architecture of Presto is almost similar to classic MPP (massively parallel processing) DBMS architecture. The following diagram illustrates the architecture of Presto. The above diagram consists of different components. Following table describes each of the component in detail. S.No Component & Description 1. Client Client (Presto CLI) submits SQL statements to a coordinator to get the result. 2. Coordinator Coordinator is a master daemon. The coordinator initially parses the SQL queries then analyzes and plans for the query execution. Scheduler performs pipeline execution, assigns work to the closest node and monitors progress. 3. Connector Storage plugins are called as connectors. Hive, HBase, MySQL, Cassandra and many more act as a connector; otherwise you can also implement a custom one. The connector provides metadata and data for queries. The coordinator uses the connector to get metadata for building a query plan. 4. Worker The coordinator assigns task to worker nodes. The workers get actual data from the connector. Finally, the worker node delivers result to the client. Presto − Workflow Presto is a distributed system that runs on a cluster of nodes. Presto’s distributed query engine is optimized for interactive analysis and supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions. Presto architecture is simple and extensible. Presto client (CLI) submits SQL statements to a master daemon coordinator. The scheduler connects through execution pipeline. The scheduler assigns work to nodes which is closest to the data and monitors progress. The coordinator assigns task to multiple worker nodes and finally the worker node delivers the result back to the client. The client pulls data from the output process. Extensibility is the key design. Pluggable connectors like Hive, HBase, MySQL, etc., provides metadata and data for queries. Presto was designed with a “simple storage abstraction” that makes it easy to provide SQL query capability against these different kind of data sources. Execution Model Presto supports custom query and execution engine with operators designed to support SQL semantics. In addition to improved scheduling, all processing is in memory and pipelined across the network between different stages. This avoids unnecessary I/O latency overhead. Print Page Previous Next Advertisements ”;
Apache Presto – JDBC Interface ”; Previous Next Presto’s JDBC interface is used to access Java application. Prerequisites Install presto-jdbc-0.150.jar You can download the JDBC jar file by visiting the following link, https://repo1.maven.org/maven2/com/facebook/presto/presto-jdbc/0.150/ After the jar file has been downloaded, add it to the class path of your Java application. Create a Simple Application Let’s create a simple java application using JDBC interface. Coding − PrestoJdbcSample.java import java.sql.*; import com.facebook.presto.jdbc.PrestoDriver; //import presto jdbc driver packages here. public class PrestoJdbcSample { public static void main(String[] args) { Connection connection = null; Statement statement = null; try { Class.forName(“com.facebook.presto.jdbc.PrestoDriver”); connection = DriverManager.getConnection( “jdbc:presto://localhost:8080/mysql/tutorials”, “tutorials”, “”); //connect mysql server tutorials database here statement = connection.createStatement(); String sql; sql = “select auth_id, auth_name from mysql.tutorials.author”; //select mysql table author table two columns ResultSet resultSet = statement.executeQuery(sql); while(resultSet.next()){ int id = resultSet.getInt(“auth_id”); String name = resultSet.getString(“auth_name”); System.out.print(“ID: ” + id + “;nName: ” + name + “n”); } resultSet.close(); statement.close(); connection.close(); }catch(SQLException sqlException){ sqlException.printStackTrace(); }catch(Exception exception){ exception.printStackTrace(); } } } Save the file and quit the application. Now, start Presto server in one terminal and open a new terminal to compile and execute the result. Following are the steps − Compilation ~/Workspace/presto/presto-jdbc $ javac -cp presto-jdbc-0.149.jar PrestoJdbcSample.java Execution ~/Workspace/presto/presto-jdbc $ java -cp .:presto-jdbc-0.149.jar PrestoJdbcSample Output INFO: Logging initialized @146ms ID: 1; Name: Doug Cutting ID: 2; Name: James Gosling ID: 3; Name: Dennis Ritchie Print Page Previous Next Advertisements ”;