Tableau – Editing Metadata

After connecting to a data source, Tableau captures the metadata of the source, such as the columns and their data types. This metadata is used to create the dimensions, measures, and calculated fields used in views. You can browse the metadata and change some of its properties to suit specific requirements.

Checking the Metadata

After connecting to a data source, Tableau presents all the tables and columns present in the source. Consider the source ‘Sample – Coffee shop’ for checking the metadata. Click the Data menu and choose to connect to a data source. Browse for the MS Access file named ‘Sample – Coffee shop’. Drag the table named Product to the data canvas. On choosing the file, you get the following screen, which shows the column names and their data types. String data types are shown as Abc and numeric data types are shown as #.

Changing the Data Type

You can change the data type of a field if required. Depending on the nature of the source data, Tableau may sometimes fail to recognize the data type from the source. In such scenarios, you can edit the data type manually. The following screenshot shows the option.

Renaming and Hiding

Column names can be changed by using the renaming option. You can also hide a column so that it does not appear in the data view that you create. These options are available by clicking the data type icon in the metadata grid, as shown in the following screenshot.

Column Alias

Each column of the data source can be assigned an alias, which helps you better understand the nature of the column. Choose the aliases option from the above step, and the following screen comes up, which is used to create or edit aliases.
Tableau – Quick Guide
Tableau – Overview

As a leading data visualization tool, Tableau has many desirable and unique features. Its powerful data discovery and exploration application allows you to answer important questions in seconds. You can use Tableau's drag and drop interface to visualize any data, explore different views, and even combine multiple databases easily. It does not require any complex scripting. Anyone who understands the business problems can address them with a visualization of the relevant data. After analysis, sharing with others is as easy as publishing to Tableau Server.

Tableau Features

Tableau provides solutions for all kinds of industries, departments, and data environments. Following are some unique features which enable Tableau to handle diverse scenarios.

Speed of Analysis − As it does not require a high level of programming expertise, any user with access to data can start using it to derive value from the data.

Self-Reliant − Tableau does not need a complex software setup. The desktop version, which is used by most users, is easily installed and contains all the features needed to start and complete data analysis.

Visual Discovery − The user explores and analyzes the data by using visual tools like colors, trend lines, charts, and graphs. There is very little script to be written, as nearly everything is done by drag and drop.

Blend Diverse Data Sets − Tableau allows you to blend different relational, semistructured, and raw data sources in real time, without expensive up-front integration costs. The users don't need to know the details of how data is stored.

Architecture Agnostic − Tableau works on all kinds of devices where data flows. Hence, the user need not worry about specific hardware or software requirements to use Tableau.

Real-Time Collaboration − With Tableau, you can filter, sort, and discuss data on the fly and embed a live dashboard in portals like a SharePoint site or Salesforce. You can save your view of data and allow colleagues to subscribe to your interactive dashboards so they see the very latest data just by refreshing their web browser.

Centralized Data − Tableau Server provides a centralized location to manage all of the organization's published data sources. You can delete, change permissions, add tags, and manage schedules in one convenient location. It's easy to schedule extract refreshes and manage them in the data server. Administrators can centrally define a schedule for extracts on the server for both incremental and full refreshes.

Tableau – Environment Setup

In this chapter, you will learn about the environment setup of Tableau.

Download Tableau Desktop

The Free Personal Edition of Tableau Desktop can be downloaded from the Tableau Desktop download page. You need to register with your details to be able to download. After downloading, the installation is a straightforward process in which you accept the license agreement and provide the target folder for installation. The following steps and screenshots describe the entire setup process.

Start the Installation Wizard

Double-click TableauDesktop-64bit-9-2-2.exe. A screen asks whether to allow the installation program to run. Click “Run”.

Accept the License Agreement

Read the license agreement and, if you agree, choose the “I have read and accept the terms of this license agreement” option. Then, click “Install”.

Start Trial

On completion of the installation, the screen prompts you with the option to start the trial now or later. You may choose to start it now. If you have purchased Tableau, you may enter the license key instead.

Provide Your Details

Provide your name and organization details. Then, click “Next”.

Registration Complete

The registration completion screen appears. Click “Continue”.

Verify the Installation

You can verify the installation by going to the Windows Start menu. Click the Tableau icon. The following screen appears. You are now ready to learn Tableau.

Tableau – Get Started

In this chapter, you will learn some basic operations in Tableau to get acquainted with its interface. There are three basic steps involved in creating any Tableau data analysis report. These three steps are −

Connect to a data source − It involves locating the data and using an appropriate type of connection to read the data.

Choose dimensions and measures − This involves selecting the required columns from the source data for analysis.

Apply visualization technique − This involves applying the required visualization method, such as a specific chart or graph type, to the data being analyzed.

For convenience, let's use the sample data set that comes with the Tableau installation, named Sample – Superstore.xls. Locate the installation folder of Tableau and go to My Tableau Repository. Under it, you will find the above file at Datasources\9.2\en_US-US.

Connect to a Data Source

On opening Tableau, you get the start page showing various data sources. Under the header “Connect”, you have options to choose a file, a server, or a saved data source. Under Files, choose Excel. Then navigate to the file “Sample – Superstore.xls” as mentioned above. The Excel file has three sheets named Orders, People, and Returns. Choose Orders.

Choose the Dimensions and Measures

Next, choose the data to be analyzed by deciding on the dimensions and measures. Dimensions are descriptive data, while measures are numeric data. Put together, they let you visualize how each dimension value performs with respect to the measures. Choose Category and Region as the dimensions and Sales as the measure. Drag and drop them as shown in the following screenshot. The result shows the total sales in each category for each region.

Apply Visualization Technique

In the previous step, you can see that the data is available only as numbers. You have to read and compare each of the values to judge the performance. However, you can see them as graphs or charts with different colors to make a quicker judgment. Drag and drop the SUM(Sales) column from the Marks tab to the Columns shelf. The table showing the numeric values of sales now turns into a bar chart automatically. You can apply a
Zookeeper – Useful Resources
The following resources contain additional information on Zookeeper. Please use them to get more in-depth knowledge on this topic.

Useful Video Courses

Building Application Ecosystem with Docker Compose − 15 Lectures, 40 mins, Prashant Hardikar
Web Apps with ReactJS and Redux – The Complete Course − 64 Lectures, 9.5 hours, TELCOMA Global
Learn Big Data Hadoop: Hands-On for Beginner − 256 Lectures, 13.5 hours, Bigdata Engineer
Learn Advanced Apache Kafka from Scratch − 154 Lectures, 9 hours, Learnkart Technology Pvt Ltd
Apache Kafka for Beginners – Learn Kafka by Hands-On − 54 Lectures, 4.5 hours, Packt Publishing
Apache Storm Course − 21 Lectures, 1.5 hours, Corporate Bridge Consultancy Private Limited
Sum of Squares
In statistical data analysis, the total sum of squares (TSS or SST) is a quantity that appears as part of a standard way of presenting the results of such analyses. It is defined as the sum, over all observations, of the squared differences of each observation from the overall mean.

The Total Sum of Squares is given by the following function:

Formula

${\text{Sum of Squares} = \sum (x_i - \bar x)^2}$

Where −

${x_i}$ = value of the i-th observation.
${\bar x}$ = mean.

Example

Problem Statement:

Calculate the sum of squares for the heights of 9 children, which are 100, 100, 102, 98, 77, 99, 70, 105, 98 and whose mean is 94.3.

Solution:

Given mean = 94.3. To find the Sum of Squares:

Calculation of Sum of Squares.

+--------------------+-------------------------------+-------------------------------------+
| Column A           | Column B                      | Column C                            |
| Value or Score     | Deviation Score               | (Deviation Score)^2                 |
| ${x_i}$            | ${(x_i - \bar x)}$            | ${(x_i - \bar x)^2}$                |
+--------------------+-------------------------------+-------------------------------------+
| 100                | 100 - 94.3 = 5.7              | (5.7)^2 = 32.49                     |
| 100                | 100 - 94.3 = 5.7              | (5.7)^2 = 32.49                     |
| 102                | 102 - 94.3 = 7.7              | (7.7)^2 = 59.29                     |
| 98                 | 98 - 94.3 = 3.7               | (3.7)^2 = 13.69                     |
| 77                 | 77 - 94.3 = -17.3             | (-17.3)^2 = 299.29                  |
| 99                 | 99 - 94.3 = 4.7               | (4.7)^2 = 22.09                     |
| 70                 | 70 - 94.3 = -24.3             | (-24.3)^2 = 590.49                  |
| 105                | 105 - 94.3 = 10.7             | (10.7)^2 = 114.49                   |
| 98                 | 98 - 94.3 = 3.7               | (3.7)^2 = 13.69                     |
+--------------------+-------------------------------+-------------------------------------+
| ${\sum x_i = 849}$ | ${\sum (x_i - \bar x) = 0.3}$ | ${\sum (x_i - \bar x)^2 = 1178.01}$ |
|                    | First Moment                  | Sum of Squares                      |
+--------------------+-------------------------------+-------------------------------------+

The first moment is 0.3 rather than exactly 0 because the mean has been rounded to 94.3. The sum of squares of the nine heights is 1178.01.
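The calculation in the table can be cross-checked with a few lines of Python. This is only a minimal sketch: the function name `sum_of_squares` and its optional `mean` argument are illustrative choices, not part of the tutorial. It computes the result once with the rounded mean 94.3 from the problem statement and once with the unrounded mean 849 / 9.

```python
def sum_of_squares(values, mean=None):
    """Return the sum of squared deviations of `values` from `mean`.

    If `mean` is None, the arithmetic mean of `values` is used.
    """
    if mean is None:
        mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


heights = [100, 100, 102, 98, 77, 99, 70, 105, 98]

# Rounded mean, as given in the problem statement.
ss_rounded = sum_of_squares(heights, mean=94.3)

# Unrounded mean (849 / 9), computed inside the function.
ss_exact = sum_of_squares(heights)

print(round(ss_rounded, 2))
print(round(ss_exact, 2))
```

With the rounded mean the result is about 1178.01, and with the unrounded mean it is about 1178, so rounding the mean to one decimal place changes the answer only slightly here.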
Standard Deviation
Standard deviation is the square root of the average of the squared deviations of the items from their mean. Symbolically, it is represented by ${\sigma}$. We're going to discuss methods to compute the standard deviation for three types of series:

Individual Data Series
Discrete Data Series
Continuous Data Series

Individual Data Series

When data is given on an individual basis. Following is an example of an individual series:

Items      5   10   20   30   40   50   60   70

Discrete Data Series

When data is given along with their frequencies. Following is an example of a discrete series:

Items      5   10   20   30   40   50   60   70
Frequency  2    5    1    3   12    0    5    7

Continuous Data Series

When data is given based on ranges along with their frequencies. Following is an example of a continuous series:

Items      0-5   5-10   10-20   20-30   30-40
Frequency    2      5       1       3      12
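The definition above (square root of the average squared deviation from the mean) can be sketched in Python for the individual and discrete series shown. The function name `population_std` and its optional `frequencies` argument are hypothetical, chosen only for this illustration; the sketch divides by the total number of items, following the "average of squared deviations" wording.

```python
import math


def population_std(values, frequencies=None):
    """Square root of the average squared deviation from the mean.

    `frequencies` gives how often each value occurs; when omitted,
    every value is counted once (an individual series).
    """
    if frequencies is None:
        frequencies = [1] * len(values)
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(values, frequencies)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / n
    return math.sqrt(variance)


items = [5, 10, 20, 30, 40, 50, 60, 70]
frequencies = [2, 5, 1, 3, 12, 0, 5, 7]

print(population_std(items))               # individual series
print(population_std(items, frequencies))  # discrete series
```

For a continuous series, one common approach (an assumption here, since the text above does not describe it) is to pass the midpoint of each range as the value, together with the range frequencies.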
Weak Law of Large Numbers
The weak law of large numbers is a result in probability theory, also known as Bernoulli's theorem. Let X be a sequence of independent and identically distributed random variables, each having a mean and standard deviation.

Formula

$${ 0 = \lim_{n \to \infty} P \{ \lvert X - \mu \rvert \gt \frac{1}{n} \} \\[7pt] = P \{ \lim_{n \to \infty} \{ \lvert X - \mu \rvert \gt \frac{1}{n} \} \} \\[7pt] = P \{ X \ne \mu \} }$$

Where −

${n}$ = Number of samples
${X}$ = Sample value
${\mu}$ = Sample mean

Example

Problem Statement:

A six-sided die is rolled a large number of times. Find the sample mean of the values.

Solution:

Sample Mean Calculation

${\text{Sample Mean} = \frac{1+2+3+4+5+6}{6} \\[7pt] \, = \frac{21}{6} \\[7pt] \, = 3.5}$
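The die example can be explored with a small simulation. The sketch below is illustrative only: it assumes Python's `random.randint(1, 6)` as a stand-in for a fair die, uses an arbitrary seed, and the helper name `sample_mean_of_rolls` is made up for this example. It reflects the usual textbook reading of the weak law, namely that for any fixed tolerance the probability of the sample mean lying further than that tolerance from 3.5 shrinks as the number of rolls grows.

```python
import random


def sample_mean_of_rolls(n_rolls, rng):
    """Roll a simulated fair six-sided die `n_rolls` times and return the mean."""
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls


rng = random.Random(42)  # fixed seed so that reruns produce the same rolls

for n in (10, 100, 1_000, 10_000, 100_000):
    mean = sample_mean_of_rolls(n, rng)
    print(f"n = {n:>7}: sample mean = {mean:.4f}, distance from 3.5 = {abs(mean - 3.5):.4f}")
```

For small n the sample mean can be noticeably off, while for large n it typically sits very close to 3.5. Individual runs may still fluctuate, since the law is a statement about probabilities rather than a guarantee for any single sequence of rolls.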
Sqoop – Import-All-Tables
This chapter describes how to import all the tables from an RDBMS database server into HDFS. The data of each table is stored in a separate directory, and the directory name is the same as the table name.

Syntax

The following syntax is used to import all tables.

$ sqoop import-all-tables (generic-args) (import-args)
$ sqoop-import-all-tables (generic-args) (import-args)

Example

Let us take an example of importing all tables from the userdb database. The list of tables that the database userdb contains is as follows.

+--------------------+
|      Tables        |
+--------------------+
|      emp           |
|      emp_add       |
|      emp_contact   |
+--------------------+

The following command is used to import all the tables from the userdb database.

$ sqoop import-all-tables --connect jdbc:mysql://localhost/userdb --username root

Note − If you are using import-all-tables, it is mandatory that every table in that database has a primary key field.

The following command is used to verify that all the tables of the userdb database have been imported into HDFS.

$ $HADOOP_HOME/bin/hadoop fs -ls

It shows the table names of the userdb database as directories.

Output

drwxr-xr-x - hadoop supergroup 0 2014-12-22 22:50 _sqoop
drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:46 emp
drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:50 emp_add
drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:52 emp_contact
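When the import is launched from a script rather than typed by hand, the command line can be assembled programmatically. The following Python sketch is an illustration only: the helper name `build_import_all_tables_command` is hypothetical, and it uses nothing beyond the `--connect` and `--username` arguments shown in the example above. It prints the command instead of running it, so it works on a machine without Sqoop installed.

```python
import shlex


def build_import_all_tables_command(jdbc_url, username):
    """Return the argument list for a `sqoop import-all-tables` call.

    Only the two arguments used in the example above are included.
    """
    return [
        "sqoop", "import-all-tables",
        "--connect", jdbc_url,
        "--username", username,
    ]


command = build_import_all_tables_command("jdbc:mysql://localhost/userdb", "root")

# Print the shell-quoted command line for inspection.
print(shlex.join(command))
```

On a host where Sqoop is actually installed, the same argument list could be passed to `subprocess.run(command, check=True)`. Keeping the arguments as a list avoids shell-quoting problems if the connection string contains special characters.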
Statistics – Discussion
An operating system (OS) is a collection of software that manages computer hardware resources and provides common services for computer programs. The operating system is a vital component of the system software in a computer system. This tutorial will take you through a step-by-step approach to learning Operating System concepts.
Sqoop – Discussion
Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export data from the Hadoop file system to relational databases. This is a brief tutorial that explains how to make use of Sqoop in the Hadoop ecosystem.
Trimmed Mean
The trimmed mean is a method of averaging that removes a small percentage of the largest and smallest values before calculating the mean. The trimmed mean can be calculated using the following formula.

Formula

$\mu = \frac{\sum {X_i}}{n}$

Where −

$\sum {X_i}$ = Sum of your Trimmed Set.
${n}$ = Total Numbers in Trimmed set.
${\mu}$ = Trimmed Mean.

Example

Problem Statement:

Figure out the 20% trimmed mean for the number set {8, 3, 7, 1, 3, 9}.

Solution:

Trimmed Mean Percent = $\frac{20}{100} = 0.2$; Sample Size = 6

Let us first determine the trimmed count (g), where g is the number of values to be trimmed from each end of the given series.

g = Floor (Trimmed Mean Percent x Sample Size)
g = Floor (0.2 x 6)
g = Floor (1.2)
Trimmed count (g) = 1

Write the given set of numbers {8, 3, 7, 1, 3, 9} in ascending order:

= 1, 3, 3, 7, 8, 9

As the trimmed count is 1, we remove one number from the beginning and one from the end. Thus, we remove the first number (1) and the last number (9) from the above series:

= 3, 3, 7, 8

Now the trimmed mean can be computed as:

$\mu = \frac{\sum {X_i}}{n} \\[7pt] \, = \frac{\text{Sum of your Trimmed Set}}{\text{Total Numbers in Trimmed set}} \\[7pt] \, = \frac{(3 + 3 + 7 + 8)}{4} \, = \frac{21}{4} \\[7pt] \, = {5.25}$

The Trimmed Mean of the given numbers is 5.25.
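The steps of the worked example (compute g, sort, drop g values from each end, average the rest) can be sketched in Python as follows. The function name `trimmed_mean` is an illustrative choice, and the sketch assumes the trim percentage is given as a whole number so that the floor can be taken with integer arithmetic, which avoids floating-point rounding surprises.

```python
def trimmed_mean(values, percent):
    """Mean of `values` after trimming `percent` percent from each end.

    `percent` is assumed to be a whole number such as 20.
    """
    n = len(values)
    # g = Floor(percent / 100 * n), computed with integers only.
    g = (percent * n) // 100
    ordered = sorted(values)
    trimmed = ordered[g:n - g]
    if not trimmed:
        raise ValueError("trimming removed every value")
    return sum(trimmed) / len(trimmed)


numbers = [8, 3, 7, 1, 3, 9]
print(trimmed_mean(numbers, 20))  # 5.25
```

For the number set above, g is 1, the trimmed set is 3, 3, 7, 8, and the function returns 5.25, matching the manual calculation.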