Tableau – Get Started
In this chapter, you will learn some basic operations in Tableau to get acquainted with its interface. There are three basic steps involved in creating any Tableau data analysis report. These three steps are −
Connect to a data source − It involves locating the data and using an appropriate type of connection to read the data.
Choose dimensions and measures − This involves selecting the required columns from the source data for analysis.
Apply visualization technique − This involves applying the required visualization methods, such as a specific chart or graph type, to the data being analyzed.
For convenience, let's use the sample data set named Sample – Superstore.xls that comes with the Tableau installation. Locate the installation folder of Tableau and go to My Tableau Repository. Under it, you will find the above file at Datasources\9.2\en_US-US.
Connect to a Data Source
On opening Tableau, you will get the start page showing various data sources. Under the header "Connect", you have options to choose a file, a server, or a saved data source. Under Files, choose Excel. Then navigate to the file "Sample – Superstore.xls" as mentioned above. The Excel file has three sheets named Orders, People and Returns. Choose Orders.
Choose the Dimensions and Measures
Next, choose the data to be analyzed by deciding on the dimensions and measures. Dimensions are descriptive data, while measures are numeric data. Together, they let you visualize the performance of the dimension members with respect to the measures. Choose Category and Region as the dimensions and Sales as the measure. Drag and drop them as shown in the following screenshot. The result shows the total sales in each category for each region.
Apply Visualization Technique
In the previous step, you can see that the data is available only as numbers. You have to read and calculate each of the values to judge the performance.
However, you can see them as graphs or charts with different colors to make a quicker judgment. Drag and drop the SUM(Sales) field from the Marks card to the Columns shelf. The table showing the numeric values of sales now turns into a bar chart automatically. You can also add another dimension to the existing view; this adds more colors to the bar chart, as shown in the following screenshot.
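Conceptually, the table built in the Choose the Dimensions and Measures step is a grouped aggregation: the dimensions (Category, Region) define the groups, and the measure (Sales) is summed within each group. The following Python sketch illustrates that idea only; the rows are made-up stand-ins for the Orders sheet, and `sum_by_dimensions` is a hypothetical helper, not part of Tableau.

```python
from collections import defaultdict

# Hypothetical rows standing in for the Orders sheet; the real
# Sample - Superstore file has many more rows and columns.
orders = [
    {"Category": "Furniture", "Region": "East", "Sales": 120.0},
    {"Category": "Furniture", "Region": "West", "Sales": 80.0},
    {"Category": "Technology", "Region": "East", "Sales": 200.0},
    {"Category": "Furniture", "Region": "East", "Sales": 30.0},
]

def sum_by_dimensions(rows, dimensions, measure):
    """Group rows by the given dimension fields and SUM the measure field."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)  # dimension members form the group key
        totals[key] += row[measure]              # the measure is aggregated (summed)
    return dict(totals)

print(sum_by_dimensions(orders, ["Category", "Region"], "Sales"))
```

Each key in the result corresponds to one cell of the Category × Region table; Tableau performs this aggregation for you when fields are dropped on the shelves.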
Simple random sampling
A simple random sample is defined as one in which each element of the population has an equal and independent chance of being selected. For a population of N units, there are ${^N C_n}$ possible samples of n units, and each of them has the same probability, ${1/^N C_n}$, of being chosen. For example, if we have a population of five elements (A, B, C, D, E), i.e. N = 5, and we want a sample of size n = 3, then there are ${^5 C_3 = 10}$ possible samples, and the probability of selecting any one particular sample is 1/10. (Each individual unit appears in 6 of these 10 samples, so its probability of being included in the sample is 6/10 = 3/5.)
Simple random sampling can be done in two different ways, i.e. "with replacement" or "without replacement". When each selected unit is replaced in the population before the next draw, it is a simple random sample with replacement. If the selected units are not replaced, and each successive unit is drawn only from the remaining units of the population, it is termed a simple random sample without replacement. Thus in the former method a unit once selected may be repeated, whereas in the latter a unit once selected is not repeated. Because a simple random sample without replacement is statistically more efficient, it is the preferred method. A simple random sample can be drawn through either of two procedures, i.e. the lottery method or random number tables.
Lottery Method – Under this method, units are selected on the basis of random draws. First, each member or element of the population is assigned a unique number. Next, these numbers are written on separate slips which are physically similar in shape, size, color, etc. Then they are placed in a basket and thoroughly mixed. Finally, the slips are taken out randomly without looking at them. The number of slips drawn is equal to the sample size required. The lottery method suffers from a few drawbacks.
Writing N slips is cumbersome, and shuffling a large number of slips is difficult when the population is very large. Also, human bias may enter while choosing the slips. Hence the other alternative, i.e. random number tables, can be used.
Random Number Tables Method – These tables consist of columns of numbers which have been randomly prepared. Several random number tables are available, e.g. Fisher and Yates tables, Tippett's random numbers, etc. Listed below is a sequence of two-digit random numbers from the Fisher & Yates table: 61, 44, 65, 22, 01, 67, 76, 23, 57, 58, 54, 11, 33, 86, 07, 26, 75, 76, 64, 22, 19, 35, 74, 49, 86, 58, 69, 52, 27, 34, 91, 25, 34, 67, 76, 73, 27, 16, 53, 18, 19, 69, 32, 52, 38, 72, 38, 64, 81, 79 and 38.
The first step involves assigning a unique number to each member of the population, e.g. if the population comprises 20 people, then all individuals are numbered from 01 to 20. If we are to collect a sample of 5 units, then 5 two-digit numbers are chosen by reading the random number table in order and keeping only numbers between 01 and 20. E.g. using the above table, the units having the following five numbers will form the sample: 01, 11, 07, 19 and 16. If the sampling is without replacement and a particular random number repeats itself, it is not taken again, and the next number that fits our criteria is chosen. Thus a simple random sample can be drawn using either of the two procedures. However, in practice, simple random sampling has been found to involve a lot of time and effort and to be impractical.
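The counting argument and both drawing procedures can be sketched in Python with the standard `math` and `random` modules. This is an illustration under stated assumptions: `random.sample` draws without replacement and `random.choices` draws with replacement, while the hypothetical helper `table_method` mimics reading the Fisher & Yates sequence quoted above for a population numbered 01 to 20 (written as plain integers, since Python does not allow leading zeros in integer literals).

```python
import math
import random

population = ["A", "B", "C", "D", "E"]  # N = 5
n = 3

# Number of distinct samples of size n drawn without replacement: NCn
num_samples = math.comb(len(population), n)  # 10
prob_each_sample = 1 / num_samples           # probability of any one particular sample

rng = random.Random(42)                   # fixed seed only to make runs reproducible
without_repl = rng.sample(population, n)  # a unit can appear at most once
with_repl = rng.choices(population, k=n)  # a unit may appear more than once

def table_method(random_numbers, population_size, sample_size):
    """Read random numbers in order, keep those in 1..population_size,
    and skip repeats (sampling without replacement)."""
    chosen = []
    for number in random_numbers:
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == sample_size:
                break
    return chosen

# The Fisher & Yates sequence quoted in the text
fisher_yates = [61, 44, 65, 22, 1, 67, 76, 23, 57, 58, 54, 11, 33, 86, 7, 26,
                75, 76, 64, 22, 19, 35, 74, 49, 86, 58, 69, 52, 27, 34, 91, 25,
                34, 67, 76, 73, 27, 16, 53, 18, 19, 69, 32, 52, 38, 72, 38, 64,
                81, 79, 38]

print(num_samples, prob_each_sample)
print(table_method(fisher_yates, 20, 5))  # [1, 11, 7, 19, 16]
```

The table-based selection reproduces the sample 01, 11, 07, 19 and 16 given in the text.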
Tableau – Data Terminology
As a powerful data visualization tool, Tableau has many unique terms and definitions. You need to get acquainted with their meaning before you start using the features in Tableau. The following list explains the terms most frequently used.
1. Alias − An alternative name that you can assign to a field or to a dimension member.
2. Bin − A user-defined grouping of measures in the data source.
3. Bookmark − A .tbm file in the Bookmarks folder in the Tableau repository that contains a single worksheet. Much like web browser bookmarks, .tbm files are a convenient way to quickly display different analyses.
4. Calculated Field − A new field that you create by using a formula to modify the existing fields in your data source.
5. Crosstab − A text table view. Use text tables to display the numbers associated with dimension members.
6. Dashboard − A combination of several views arranged on a single page. Use dashboards to compare and monitor a variety of data simultaneously.
7. Data Pane − A pane on the left side of the workbook that displays the fields of the data sources to which Tableau is connected. The fields are divided into dimensions and measures. The Data pane also displays custom fields such as calculations, binned fields, and groups. You build views of your data by dragging fields from the Data pane onto the various shelves that are a part of every worksheet.
8. Data Source Page − A page where you can set up your data source. The data source page generally consists of four main areas: the left pane, join area, preview area, and metadata area.
9. Dimension − A field of categorical data. Dimensions typically hold discrete data such as hierarchies and members that cannot be aggregated. Examples of dimensions include dates, customer names, and customer segments.
10. Extract − A saved subset of a data source that you can use to improve performance and analyze offline. You can create an extract by defining filters and limits that include the data you want in the extract.
11. Filters Shelf − A shelf on the left of the workbook that you can use to exclude data from a view by filtering it using measures and dimensions.
12. Format Pane − A pane that contains formatting settings that control the entire worksheet, as well as individual fields in the view. When open, the Format pane appears on the left side of the workbook.
13. Level of Detail (LOD) Expression − A syntax that supports aggregation at dimensionalities other than the view level. With level of detail expressions, you can attach one or more dimensions to any aggregate expression.
14. Marks − A part of the view that visually represents one or more rows in a data source. A mark can be, for example, a bar, line, or square. You can control the type, color, and size of marks.
15. Marks Card − A card to the left of the view, where you can drag fields to control mark properties such as type, color, size, shape, label, tooltip, and detail.
16. Pages Shelf − A shelf to the left of the view that you can use to split a view into a sequence of pages based on the members and values in a discrete or continuous field. Adding a field to the Pages shelf is like adding a field to the Rows shelf, except that a new page is created for each new row.
17. Rows Shelf − A shelf at the top of the workbook that you can use to create the rows of a data table. The shelf accepts any number of dimensions and measures. When you place a dimension on the Rows shelf, Tableau creates headers for the members of that dimension. When you place a measure on the Rows shelf, Tableau creates quantitative axes for that measure.
18. Shelves − Named areas to the left and top of the view. You build views by placing fields onto the shelves. Some shelves are available only when you select certain mark types. For example, the Shape shelf is available only when you select the Shape mark type.
19. Workbook − A file with a .twb extension that contains one or more worksheets (and possibly also dashboards and stories).
20. Worksheet − A sheet where you build views of your data by dragging fields onto shelves.
Standard Error ( SE )
The standard deviation of a sampling distribution is called the standard error. In sampling, the three most important characteristics are accuracy, bias and precision. It can be said that:
The estimate derived from any one sample is accurate to the extent that it does not differ from the population parameter. Since the population parameters can only be determined by a complete enumeration (census) of the population, they are generally unknown, and the actual difference between the sample estimate and the population parameter cannot be measured.
The estimator is unbiased if the mean of the estimates derived from all the possible samples equals the population parameter.
Even if the estimator is unbiased, an individual sample is most likely going to yield an inaccurate estimate and, as stated earlier, inaccuracy cannot be measured. However, it is possible to measure the precision, i.e. the range within which the true value of the population parameter is expected to lie, using the concept of standard error.
Formula
$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$
Where −
${s}$ = Standard Deviation
${n}$ = No. of observations
Example
Problem Statement:
Calculate the Standard Error for the following individual data:
Items: 14, 36, 45, 70, 105
Solution:
Let's first compute the Arithmetic Mean $\bar{x}$
$\bar{x} = \frac{14 + 36 + 45 + 70 + 105}{5} \\[7pt] = \frac{270}{5} \\[7pt] = 54$
Let's now compute the Standard Deviation ${s}$
$s = \sqrt{\frac{1}{n-1}\left((x_{1}-\bar{x})^{2}+(x_{2}-\bar{x})^{2}+\dots+(x_{n}-\bar{x})^{2}\right)} \\[7pt] = \sqrt{\frac{1}{5-1}\left((14-54)^{2}+(36-54)^{2}+(45-54)^{2}+(70-54)^{2}+(105-54)^{2}\right)} \\[7pt] = \sqrt{\frac{1}{4}(1600+324+81+256+2601)} \\[7pt] = \sqrt{\frac{4862}{4}} = \sqrt{1215.5} \\[7pt] \approx 34.86$
Thus the Standard Error $SE_{\bar{x}}$
$SE_{\bar{x}} = \frac{s}{\sqrt{n}} \\[7pt] = \frac{34.86}{\sqrt{5}} \\[7pt] = \frac{34.86}{2.236} \\[7pt] \approx 15.59$
The Standard Error of the given numbers is approximately 15.59.
When sampling without replacement from a finite population of size N, the standard error above is multiplied by the finite population correction factor (the finite multiplier) $\sqrt{\frac{N-n}{N-1}}$. The smaller the proportion of the population that is sampled, the smaller the effect of this multiplier, because the finite multiplier is then close to one and affects the standard error negligibly. Hence, if the sample size is less than 5% of the population, the finite multiplier is usually ignored.
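The calculation above, together with the optional finite multiplier, can be checked with a short Python sketch. It assumes the sample standard deviation (divisor n − 1), which is what `statistics.stdev` computes; `standard_error` and its `population_size` parameter are hypothetical names used only for illustration.

```python
import math
import statistics

def standard_error(values, population_size=None):
    """Standard error of the mean, s / sqrt(n), with s the sample standard
    deviation (n - 1 divisor). If a finite population size N is given,
    multiply by the finite population correction sqrt((N - n) / (N - 1))."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    if population_size is not None:
        N = population_size
        se *= math.sqrt((N - n) / (N - 1))
    return se

items = [14, 36, 45, 70, 105]
print(round(standard_error(items), 2))  # 15.59
```

Passing, say, `population_size=1000` leaves the result almost unchanged, which is why the multiplier is usually ignored when the sample is a small fraction of the population.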
Tableau – Forecasting
Forecasting is about predicting the future value of a measure. There are many mathematical models for forecasting. Tableau uses the model known as exponential smoothing. In exponential smoothing, recent observations are given relatively more weight than older observations. These models capture the evolving trend or seasonality of the data and extrapolate them into the future. The result of a forecast can also become a field in the visualization created. Tableau takes a time dimension and a measure field to create a forecast.
Creating a Forecast
Using Sample – Superstore, forecast the value of the measure Sales for next year. To achieve this objective, the following are the steps.
Step 1 − Create a line chart with Order Date (Year) in the Columns shelf and Sales in the Rows shelf. Go to the Analytics pane as shown in the following screenshot and click Forecast under the Model category.
Step 2 − On completing the above step, you will find options to set various properties of the forecast. Choose the Forecast Length as 2 years and leave the Forecast Model as Automatic, as shown in the following screenshot.
Click OK, and you will get the final forecast result as shown in the following screenshot.
Describe Forecast
You can also get detailed information about the forecast model by choosing the option Describe Forecast. To get this option, right-click on the forecast chart as shown in the following screenshot.
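To show what giving recent observations more weight means, here is a minimal Python sketch of simple exponential smoothing. It is an illustration under assumptions, not Tableau's implementation: Tableau can fit several exponential smoothing models, with or without trend and seasonal components, and picks one when the Forecast Model is Automatic, whereas this sketch has neither component, so its forecast is flat. The smoothing parameter `alpha`, the function name and the sales figures are hypothetical.

```python
def simple_exponential_smoothing(series, alpha, horizon):
    """level_t = alpha * y_t + (1 - alpha) * level_{t-1}.
    Unrolled, observation y_{t-k} receives weight alpha * (1 - alpha)**k,
    so recent points count more. Returns `horizon` forecast values."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]                  # initialise the level with the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon           # no trend/season terms, so the forecast is flat

# Hypothetical yearly sales totals (not the Superstore figures)
yearly_sales = [100.0, 120.0, 110.0, 130.0]
print(simple_exponential_smoothing(yearly_sales, alpha=0.5, horizon=2))  # [120.0, 120.0]
```

A larger `alpha` makes the level react faster to the latest observations; models with trend or seasonal terms extend the same idea to sloped or cyclical forecasts.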
Probability Bayes Theorem
One of the most significant developments in the probability field has been the development of Bayesian decision theory, which has proved to be of immense help in making decisions under uncertain conditions. Bayes' Theorem was developed by the British mathematician Rev. Thomas Bayes. The probability given by Bayes' theorem is also known as inverse probability, posterior probability or revised probability. This theorem finds the probability of an event by taking the given sample information into account; hence the name posterior probability. Bayes' theorem is based on the formula of conditional probability.
The conditional probability of event ${A_1}$ given event ${B}$ is
${P(A_1 \mid B) = \frac{P(A_1 \cap B)}{P(B)}}$
Similarly, the conditional probability of event ${A_2}$ given event ${B}$ is
${P(A_2 \mid B) = \frac{P(A_2 \cap B)}{P(B)}}$
Where, for mutually exclusive and exhaustive events ${A_1}$ and ${A_2}$,
${P(B) = P(A_1 \cap B) + P(A_2 \cap B)}$
${P(B) = P(A_1) \times P(B \mid A_1) + P(A_2) \times P(B \mid A_2)}$
${P(A_1 \mid B)}$ can therefore be rewritten as
${P(A_1 \mid B) = \frac{P(A_1) \times P(B \mid A_1)}{P(A_1) \times P(B \mid A_1) + P(A_2) \times P(B \mid A_2)}}$
Hence the general form of Bayes' Theorem is
${P(A_i \mid B) = \frac{P(A_i) \times P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j) \times P(B \mid A_j)}}$
Where ${A_1}$, ${A_2}$, …, ${A_i}$, …, ${A_n}$ are a set of n mutually exclusive and exhaustive events.
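The general form of Bayes' Theorem translates directly into a few lines of Python. The sketch below assumes, as stated above, that the events are mutually exclusive and exhaustive; the function name `bayes_posterior` and the machine/defect numbers in the usage line are hypothetical.

```python
def bayes_posterior(priors, likelihoods):
    """P(A_i | B) = P(A_i) P(B | A_i) / sum_j P(A_j) P(B | A_j)
    for mutually exclusive and exhaustive events A_1..A_n."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood P(B | A_i) per prior P(A_i)")
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A_i and B)
    p_b = sum(joint)                                       # total probability P(B)
    return [j / p_b for j in joint]

# Hypothetical: machines A_1 and A_2 produce 60% and 40% of the output,
# with defect rates 2% and 5%; B = "a randomly chosen item is defective".
print(bayes_posterior([0.6, 0.4], [0.02, 0.05]))  # approximately [0.375, 0.625]
```

The denominator is exactly the total probability P(B) from the derivation, so the posterior probabilities always sum to one.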
Tableau – Tree Map
The tree map displays data in nested rectangles. The dimensions define the structure of the tree map, and the measures define the size or color of the individual rectangles. The rectangles are easy to interpret, as both the size and the shade of color of each rectangle reflect the value of the measure. A tree map is created using one or more dimensions with one or two measures.
Creating a Tree Map
Using Sample – Superstore, plan to find the size of profits for each Ship Mode value. To achieve this objective, the following are the steps.
Step 1 − Drag and drop the measure Profit two times to the Marks card: once to the Size shelf and again to the Color shelf.
Step 2 − Drag and drop the dimension Ship Mode to the Label shelf. Choose the chart type Tree Map from Show Me. The following chart appears.
Tree Map with Two Dimensions
You can add the dimension Region to the above tree map chart. Drag and drop it twice: once to the Color shelf and again to the Label shelf. The chart that appears will show four outer boxes for the four regions, with the boxes for the ship modes nested inside them. All the different regions will now have different colors.
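As a rough illustration of how dimensions set the nesting and a measure sets the rectangle sizes, the Python sketch below implements the simple slice-and-dice layout: each level splits its rectangle into strips whose areas are proportional to the summed measure, alternating the split direction at each level. This is an assumption for illustration only; the tutorial does not say which layout algorithm Tableau uses, and the sketch assumes positive sizes (negative profits would need separate handling). The data in the usage example is made up.

```python
def sum_leaves(node):
    """Total measure under a node: a number, or a dict of child nodes."""
    if isinstance(node, dict):
        return sum(sum_leaves(child) for child in node.values())
    return node

def slice_and_dice(node, x, y, w, h, split_horizontally=True, path=()):
    """Lay out {label: number or nested dict} inside rectangle (x, y, w, h).
    Returns {path_tuple: (x, y, w, h)} for every leaf; each leaf's area is
    proportional to its value. The split direction alternates per level."""
    total = sum_leaves(node)
    rects = {}
    offset = 0.0  # fraction of the parent rectangle already used
    for label, child in node.items():
        frac = sum_leaves(child) / total
        if split_horizontally:   # side-by-side strips
            rect = (x + offset * w, y, frac * w, h)
        else:                    # stacked strips
            rect = (x, y + offset * h, w, frac * h)
        offset += frac
        if isinstance(child, dict):
            rects.update(slice_and_dice(child, *rect, not split_horizontally,
                                        path + (label,)))
        else:
            rects[path + (label,)] = rect
    return rects

# Made-up sizes: Region (outer level) -> Ship Mode (inner level)
data = {"East": {"First Class": 1, "Standard": 3},
        "West": {"First Class": 2, "Standard": 2}}
for leaf, rect in slice_and_dice(data, 0, 0, 8, 1).items():
    print(leaf, rect)
```

Production treemaps usually prefer "squarified" layouts, which keep rectangles closer to square and are easier to compare; slice-and-dice is shown here only because it is the shortest correct layout to write.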
Residual sum of squares
In statistics, the residual sum of squares (RSS), also known as the sum of squared residuals (SSR) or the sum of squared errors of prediction (SSE), is the sum of the squares of the residuals (the deviations of the predicted values from the actual empirical values of the data). The Residual Sum of Squares (RSS) is defined by the following function:
Formula
${RSS = \sum_{i=1}^{n}(\epsilon_i)^2 = \sum_{i=1}^{n}(y_i - (\alpha + \beta x_i))^2}$
Where −
${x_i, y_i}$ = the observed values of X and Y.
${\alpha, \beta}$ = constants (the intercept and slope of the fitted line).
${n}$ = number of observations.
Example
Problem Statement:
Consider two sets of values, X = 1, 2, 3, 4 and Y = 4, 5, 6, 7, with constant values ${\alpha}$ = 1 and ${\beta}$ = 2. Find the Residual Sum of Squares (RSS) of the two sets of values.
Solution:
Given,
${X = 1, 2, 3, 4 \\ Y = 4, 5, 6, 7 \\ \alpha = 1 \\ \beta = 2}$
Substitute the given values in the Residual Sum of Squares formula:
${RSS = \sum_{i=1}^{n}(y_i - (\alpha + \beta x_i))^2 \\[7pt] = (4-(1+2 \times 1))^2 + (5-(1+2 \times 2))^2 + (6-(1+2 \times 3))^2 + (7-(1+2 \times 4))^2 \\[7pt] = (1)^2 + (0)^2 + (-1)^2 + (-2)^2 \\[7pt] = 6}$
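The worked example can be reproduced with a short Python function. `residual_sum_of_squares` is a hypothetical name; it simply evaluates the formula above for given constants α and β.

```python
def residual_sum_of_squares(xs, ys, alpha, beta):
    """RSS = sum over i of (y_i - (alpha + beta * x_i)) ** 2."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))

print(residual_sum_of_squares([1, 2, 3, 4], [4, 5, 6, 7], alpha=1, beta=2))  # 6
```

The individual residuals here are 1, 0, −1 and −2, matching the hand calculation.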
Tableau – Home
Tableau is a Business Intelligence tool for visually analyzing data. Users can create and distribute interactive and shareable dashboards, which depict the trends, variations, and density of the data in the form of graphs and charts. Tableau can connect to files, relational databases, and Big Data sources to acquire and process data. The software allows data blending and real-time collaboration, which makes it unique. It is used by businesses, academic researchers, and many government organizations for visual data analysis. It has also been positioned as a Leader in the Gartner Magic Quadrant for Business Intelligence and Analytics Platforms.
Audience
This tutorial is designed for all those readers who want to create, read, write, and modify Business Intelligence reports using Tableau. In addition, it will also be quite useful for readers who would like to become a Data Analyst or a Data Scientist.
Prerequisites
Before proceeding with this tutorial, you should have a basic understanding of computer programming terminology and data analysis. You should also have some knowledge of various types of graphs and charts. Familiarity with SQL will be an added advantage.
Statistics – Regression Intercept Confidence Interval
The Regression Intercept Confidence Interval is a range of values that, at a chosen confidence level, is expected to contain the true intercept ${\beta_0}$ of a regression model; it is used to check the reliability of the estimated intercept.
Formula
${R = \beta_0 \pm t(1 - \frac{\alpha}{2}, n-k-1) \times SE_{\beta_0}}$
Where −
${\beta_0}$ = Regression intercept.
${k}$ = Number of predictors.
${n}$ = Sample size.
${SE_{\beta_0}}$ = Standard error of the intercept.
${\alpha}$ = Significance level (1 − confidence level), e.g. 0.01 for a 99% confidence interval.
${t}$ = Critical value of the t-distribution with n − k − 1 degrees of freedom.
Example
Problem Statement:
Compute the Regression Intercept Confidence Interval for the following data. The total number of predictors (k) is 1, the regression intercept ${\beta_0}$ is 5, the sample size (n) is 10 and the standard error ${SE_{\beta_0}}$ is 0.15.
Solution:
Let us consider the case of a 99% Confidence Interval.
Step 1: Compute the t-value, where ${\alpha = 1 - 0.99 = 0.01}$.
${= t(1 - \frac{\alpha}{2}, n-k-1) \\[7pt] = t(1 - \frac{0.01}{2}, 10-1-1) \\[7pt] = t(0.995, 8) \\[7pt] = 3.3554}$
Step 2: Lower limit of the regression intercept:
${= \beta_0 - t(1 - \frac{\alpha}{2}, n-k-1) \times SE_{\beta_0} \\[7pt] = 5 - (3.3554 \times 0.15) \\[7pt] = 5 - 0.50331 \\[7pt] = 4.49669}$
Step 3: Upper limit of the regression intercept:
${= \beta_0 + t(1 - \frac{\alpha}{2}, n-k-1) \times SE_{\beta_0} \\[7pt] = 5 + (3.3554 \times 0.15) \\[7pt] = 5 + 0.50331 \\[7pt] = 5.50331}$
As a result, the 99% Regression Intercept Confidence Interval ranges from ${4.49669}$ to ${5.50331}$.
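Once the critical t-value is known, the interval is simple arithmetic. In the Python sketch below, `t_critical` is passed in rather than computed, to keep the example free of third-party packages. It could be looked up in a t table as in Step 1, or, assuming SciPy is available, obtained with `scipy.stats.t.ppf(1 - alpha / 2, n - k - 1)`. The function name is hypothetical.

```python
def intercept_confidence_interval(beta0, se_beta0, t_critical):
    """Return (lower, upper) = beta0 -/+ t_critical * se_beta0, where
    t_critical = t(1 - alpha/2, n - k - 1) is supplied by the caller."""
    margin = t_critical * se_beta0
    return beta0 - margin, beta0 + margin

# Values from the example: beta0 = 5, SE = 0.15, t(0.995, 8) = 3.3554
lower, upper = intercept_confidence_interval(5, 0.15, 3.3554)
print(round(lower, 5), round(upper, 5))  # 4.49669 5.50331
```

The interval is symmetric about ${\beta_0}$ because the t-distribution is symmetric, so only one critical value is needed.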