Tableau – Condition Filters

One of the important filtering options in Tableau is to apply conditions to existing filters. A condition can be simple, such as keeping only those sales that are higher than a certain amount, or complex, based on a formula. Conditions can also be used to create a range filter.

Creating a Condition Filter

Using the Sample-Superstore data, let's find the sub-categories of products whose total sales, across all segments, are at least 100,000. The steps are as follows.

Step 1 − Drag the dimension Segment and the measure Sales to the Columns shelf. Next, drag the dimension Sub-Category to the Rows shelf. Choose the horizontal bar chart option. The result is a horizontal bar chart of sales by sub-category for each segment.

Step 2 − Drag the dimension Sub-Category to the Filters shelf. Right-click it to edit the filter and go to the Condition tab. Choose the radio option By field. From the drop-downs, select Sales, Sum and the greater-than-or-equal-to symbol (>=), and enter the value 100000.

After these two steps, the chart shows only those sub-categories whose sales meet the required amount. The result is shown for every available segment in which the condition is met.
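The logic behind the condition filter can be sketched outside Tableau. The Python snippet below is only an illustration: the rows are hypothetical values, not taken from Sample-Superstore, and it assumes that the condition is evaluated on the sum of Sales per Sub-Category before the individual rows are passed through to the view.

```python
from collections import defaultdict

# Hypothetical rows (segment, sub_category, sales); not Sample-Superstore data.
rows = [
    ("Consumer", "Phones", 70_000.0),
    ("Corporate", "Phones", 45_000.0),
    ("Consumer", "Labels", 3_000.0),
    ("Home Office", "Labels", 1_500.0),
    ("Corporate", "Chairs", 101_000.0),
]

THRESHOLD = 100_000  # the value entered in the Condition tab

# SUM(Sales) per sub-category, across all segments.
totals = defaultdict(float)
for _segment, sub_category, sales in rows:
    totals[sub_category] += sales

# Keep a sub-category only if its total satisfies the condition (>=).
kept = {name for name, total in totals.items() if total >= THRESHOLD}

# Every row of a kept sub-category stays in the view, whatever its segment.
filtered_rows = [row for row in rows if row[1] in kept]

print(sorted(kept))
for row in filtered_rows:
    print(row)
```

In this sketch Phones passes because its combined sales reach the threshold, even though neither segment does so on its own.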
Probability
Probability implies "likelihood" or "chance". When an event is certain to happen, the probability of that event is 1, and when it is certain that the event cannot happen, its probability is 0. Hence the value of a probability ranges from 0 to 1. Probability has been defined in different ways by various schools of thought, one of which is discussed below.

Classical Definition of Probability

As the name suggests, the classical approach is the oldest approach to defining probability. It states that if there are n exhaustive, mutually exclusive and equally likely cases, of which m cases are favourable to the happening of event A, then the probability of event A is given by the following probability function:

Formula

${P(A) = \frac{\text{Number of favourable cases}}{\text{Total number of equally likely cases}} = \frac{m}{n}}$

Thus, to calculate the probability we need the number of favourable cases and the total number of equally likely cases. This can be explained using the following example.

Example

Problem Statement: A coin is tossed. What is the probability of getting a head?

Solution:

Total number of equally likely outcomes (n) = 2 (i.e. head or tail)
Number of outcomes favourable to head (m) = 1

${P(head) = \frac{1}{2}}$
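The classical definition translates directly into a few lines of Python. This is a minimal sketch; the function name `classical_probability` is an illustrative choice, and exact fractions are used so that m/n is not rounded.

```python
from fractions import Fraction


def classical_probability(favourable, total):
    """Return m/n for exhaustive, mutually exclusive, equally likely cases."""
    if total <= 0 or not 0 <= favourable <= total:
        raise ValueError("need total > 0 and 0 <= favourable <= total")
    return Fraction(favourable, total)


# Coin toss from the example: two equally likely outcomes, one favourable.
outcomes = ["head", "tail"]
m = sum(1 for outcome in outcomes if outcome == "head")
n = len(outcomes)

p_head = classical_probability(m, n)
print(p_head)  # 1/2
```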
Tableau – Bar Chart
A bar chart represents data as rectangular bars, with the length of each bar proportional to the value of the variable. Tableau automatically produces a bar chart when you drag a dimension to the Rows shelf and a measure to the Columns shelf. You can also use the bar chart option in the Show Me button. If the data is not appropriate for a bar chart, this option is automatically greyed out.

In Tableau, various types of bar charts can be created by using a dimension and a measure.

Simple Bar Chart

From the Sample-Superstore data, drag the measure Profit to the Columns shelf and the dimension Sub-Category to the Rows shelf. This automatically produces a horizontal bar chart. If it does not, choose the chart type from the Show Me tool.

Bar Chart with Color Range

You can apply colors to the bars based on their ranges. Longer bars get darker shades and shorter bars get lighter shades. To do this, drag the Profit field to Color under the Marks pane. Note that negative bars get a different color.

Stacked Bar Chart

You can add another dimension to the above bar chart to produce a stacked bar chart, which shows different colors within each bar. Drag the dimension Segment to the Marks pane and drop it on Color. The resulting chart shows the distribution of each segment within each bar.
Sample planning
Sample planning refers to a detailed outline of the measurements to be taken:

At what time – Decide when the survey is to be conducted. For example, collecting people's views on newspaper outreach before the launch of a new newspaper in the area.
On which material – Decide the medium on which the survey is to be conducted. It could be an online poll or a paper-based checklist.
In what manner – Decide the sampling methods that will be used to choose the people to be surveyed.
By whom – Decide the person(s) who will collect the observations.

A sampling plan should be prepared in such a way that the results correctly represent the population of interest and allow all questions to be answered.

Steps

Following are the steps involved in sample planning.

Identification of parameters – Identify the attributes/parameters to be measured. Identify the ranges, possible values and required resolution.
Choose Sampling Method – Choose a sampling method, with details such as how and when samples are to be identified.
Select Sample Size – Select an appropriate sample size to represent the population correctly. Samples that are too small are generally more prone to invalid conclusions.
Select storage formats – Choose a data storage format in which the sampled data is to be kept.
Assign Roles – Assign roles and responsibilities to each person involved in the collecting, processing and statistical testing steps.
Verify and execute – The sampling plan should be verifiable. Once verified, pass it to the related parties to execute it.
Tableau – Pie Chart
A pie chart represents data as slices of a circle with different sizes and colors. The slices are labeled, and the number corresponding to each slice is also shown in the chart. You can select the pie chart option from the Marks card to create a pie chart.

Simple Pie Chart

Choose one dimension and one measure to create a simple pie chart. For example, take the dimension Region with the measure Profit. Drop the Region dimension on the Color and Label marks. Drop the Profit measure on the Size mark. Choose Pie as the chart type. The resulting chart shows the 4 regions in different colors.

Drill-Down Pie Chart

You can choose a dimension with a hierarchy; as you go deeper into the hierarchy, the chart changes to reflect the level of the dimension chosen. For example, take the dimension Sub-Category, which has two more levels: Manufacturer and Product Name. Take the measure Profit and drop it on the Label mark. The resulting pie chart shows the value for each slice.

Going one more level into the hierarchy, the manufacturer becomes the label and the pie chart changes accordingly.
Statistical Significance
Statistical significance signifies that the result of a statistical experiment or test is unlikely to have occurred by chance alone and is attributable to a specific cause. The statistical significance of a result can be strong or weak, and it is very important for sectors that depend heavily on research, such as insurance, pharma, finance and physics. Statistical significance helps in judging whether the outcome of a test on sample data is realistic rather than produced by a random cause.

Statisticians generally express the degree of statistical significance in terms of the probability of error they are willing to accept; a level of 5% is commonly considered acceptable. Sample size is also important: the sample should be representative of the population rather than merely very large, because a large but unrepresentative sample can still lead to wrong conclusions.

Significance Level

The level at which an event is considered statistically significant is termed the significance level. Statisticians use a quantity called the p-value to assess statistical significance. If the p-value of a result falls below the significance level, the result is considered statistically significant. The p-value is computed from a test statistic, which is typically a function of the means and standard deviations of the data samples.

The p-value is the probability of obtaining a result at least as extreme as the one observed if only chance or sampling error were at work. In other words, it measures how easily the result could be explained without any real effect. The complement of the significance level is the confidence level, which is 1 – significance level. If the significance level is 5%, the confidence level is 95%.
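As an illustration of comparing a p-value with a significance level, the sketch below assumes a two-sided z-test, where the test statistic z follows a standard normal distribution when only chance is at work. The text above does not prescribe a particular test, so the choice of test and the function names are assumptions made for this example.

```python
import math


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z (assumed z-test)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_significant(z, alpha=0.05):
    """True if the p-value falls below the significance level alpha."""
    return two_sided_p_value(z) < alpha


for z in (1.0, 1.96, 2.5):
    p = two_sided_p_value(z)
    print(f"z = {z:4.2f}  p-value = {p:.4f}  significant at 5%: {is_significant(z)}")
```

With a 5% significance level, a statistic of about 1.96 sits right at the boundary, which is why that value appears so often in practice.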
Residual analysis
Residual analysis is used to assess the appropriateness of a linear regression model by defining residuals and examining residual plots.

Residual

The residual ($ e $) is the difference between the observed value ($ y $) and the predicted value ($ \hat y $). Every data point has one residual.

${ residual = observedValue - predictedValue \\[7pt] e = y - \hat y }$

Residual Plot

A residual plot is a graph in which the residuals are on the vertical axis and the independent variable is on the horizontal axis. If the dots are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, choose a non-linear model.

Types of Residual Plot

Residual plots typically show one of a few patterns. In the first case, the dots are randomly dispersed, so a linear regression model is preferred. In the second and third cases, the dots are non-randomly dispersed, which suggests that a non-linear regression method is preferred.

Example

Problem Statement: Check whether a linear regression model is appropriate for the following data.

$ x $                          60      70      80      85      95
$ y $ (Actual Value)           70      65      70      95      85
$ \hat y $ (Predicted Value)   65.411  71.849  78.288  81.507  87.945

Solution:

Step 1: Compute the residual for each data point.

$ x $                          60      70      80      85      95
$ y $ (Actual Value)           70      65      70      95      85
$ \hat y $ (Predicted Value)   65.411  71.849  78.288  81.507  87.945
$ e $ (Residual)               4.589   -6.849  -8.288  13.493  -2.945

Step 2: Draw the residual plot.

Step 3: Check the randomness of the residuals.

Here the residual plot exhibits a random pattern: the first residual is positive, the following two are negative, the fourth is positive and the last is negative. Since the pattern is quite random, a linear regression model is appropriate for the above data.
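Step 1 of the example can be reproduced with a short Python sketch. It takes the actual and predicted values from the table as given and only computes e = y - ŷ and the sign of each residual; the variable names are illustrative.

```python
# Values copied from the example table.
x = [60, 70, 80, 85, 95]
y = [70, 65, 70, 95, 85]                            # actual values
y_hat = [65.411, 71.849, 78.288, 81.507, 87.945]    # predicted values

# Residual for each data point: observed minus predicted, to 3 decimals.
residuals = [round(obs - pred, 3) for obs, pred in zip(y, y_hat)]

# Sign pattern used in Step 3 to judge randomness by eye.
signs = ["+" if e > 0 else "-" for e in residuals]

for xi, e, s in zip(x, residuals, signs):
    print(f"x = {xi:3d}  residual = {e:8.3f}  ({s})")
```

Plotting `residuals` against `x` with any charting tool gives the residual plot described in Step 2.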
Standard normal table
Z is the standard normal random variable. The table value for Z is the value of the cumulative normal distribution at z. This is the left-tailed normal table: as the z-value increases, the table value also increases. The row gives z to one decimal place and the column gives the second decimal place. For example, the value for Z = 1.96 is P(Z < 1.96) = .9750.

z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
0.3   .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
0.4   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
0.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
0.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
0.7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
0.8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
0.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
1.0   .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
1.1   .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
1.2   .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
1.3   .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
1.4   .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
1.5   .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
1.6   .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
1.7   .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
1.8   .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
1.9   .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
2.0   .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
2.1   .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
2.2   .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
2.3   .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
2.4   .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
2.5   .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7   .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8   .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
2.9   .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
3.0   .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
3.1   .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
3.2   .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9995  .9995  .9995
3.3   .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
3.4   .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998
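Left-tailed values like those in the table can be computed rather than looked up. The sketch below uses the identity Φ(z) = (1 + erf(z/√2)) / 2 with Python's standard `math.erf`; the names `phi` and `table_value` are illustrative choices.

```python
import math


def phi(z):
    """Left-tailed standard normal probability P(Z < z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def table_value(row, col):
    """Mimic a table lookup: row is z to one decimal, col is the second decimal."""
    return round(phi(row + col), 4)


print(table_value(1.9, 0.06))  # lookup for z = 1.96
print(table_value(0.0, 0.00))  # lookup for z = 0.00
```

Because the normal curve is symmetric, P(Z < -z) = 1 - P(Z < z), so the same function also covers negative z-values.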
Mean Deviation
Also referred to as average deviation, the mean deviation is defined as the sum of the deviations (ignoring signs) from an average, divided by the number of items in a distribution. The average can be the mean, median or mode. Theoretically, the median is the best average to choose because the sum of the deviations from the median is a minimum, provided signs are ignored. In practice, however, the arithmetic mean is the most commonly used average for calculating the mean deviation, which is denoted by the symbol ${MD}$.

We're going to discuss methods to compute the mean deviation for three types of series:

Individual Data Series
Discrete Data Series
Continuous Data Series

Individual Data Series

When data is given on an individual basis. Following is an example of an individual series:

Items   5   10   20   30   40   50   60   70

Discrete Data Series

When data is given along with its frequencies. Following is an example of a discrete series:

Items       5   10   20   30   40   50   60   70
Frequency   2    5    1    3   12    0    5    7

Continuous Data Series

When data is given as ranges along with their frequencies. Following is an example of a continuous series:

Items       0-5   5-10   10-20   20-30   30-40
Frequency     2      5       1       3      12
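As a minimal sketch, the definition can be applied to the individual and discrete series shown above, taking deviations from the arithmetic mean. The frequency-weighted form used for the discrete series is the usual textbook extension and is an assumption here, since this page does not spell it out; the function names are illustrative.

```python
def mean_deviation(items):
    """Mean deviation about the arithmetic mean for an individual series."""
    mean = sum(items) / len(items)
    return sum(abs(x - mean) for x in items) / len(items)


def mean_deviation_discrete(items, freqs):
    """Frequency-weighted mean deviation about the mean for a discrete series."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(items, freqs)) / total
    return sum(f * abs(x - mean) for x, f in zip(items, freqs)) / total


items = [5, 10, 20, 30, 40, 50, 60, 70]
freqs = [2, 5, 1, 3, 12, 0, 5, 7]

md_individual = mean_deviation(items)
md_discrete = mean_deviation_discrete(items, freqs)

print(md_individual)          # 19.375
print(round(md_discrete, 4))  # 16.9306
```

For a continuous series the same weighted form is commonly applied to the mid-point of each range.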
Statistics – Quadratic Regression Equation

Quadratic regression is used to find the equation of the parabola that best fits a given set of data. The equation has the following form:

${ y = ax^2 + bx + c \ \text{where} \ a \ne 0}$

The least squares method can be used to find the quadratic regression equation. In this method, we find the values of a, b and c such that the sum of the squared vertical distances between each given point (${x_i, y_i}$) and the parabola (${ y = ax^2 + bx + c}$) is minimal. The matrix equation for the parabolic curve is given by:

$ {\begin{bmatrix} \sum {x_i}^4 & \sum {x_i}^3 & \sum {x_i}^2 \\ \sum {x_i}^3 & \sum {x_i}^2 & \sum x_i \\ \sum {x_i}^2 & \sum x_i & n \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} \sum {x_i}^2{y_i} \\ \sum x_iy_i \\ \sum y_i \end{bmatrix} }$

Correlation Coefficient, r

The correlation coefficient, r, determines how well a quadratic equation fits the given data. If r is close to 1, the fit is good. r can be computed by the following formula; the square root is needed to reproduce the value quoted in the example below.

${ r = \sqrt{1 - \frac{SSE}{SST}} \ \text{where} \\[7pt] SSE = \sum (y_i - a{x_i}^2 - bx_i - c)^2 \\[7pt] SST = \sum (y_i - \bar y)^2 }$

Generally, quadratic regression calculators are used to compute the quadratic regression equation.

Example

Problem Statement: Compute the quadratic regression equation of the following data and check its goodness of fit.

x   -3    -2   -1    0   1   2   3
y   7.5    3   0.5   1   3   6   14

Solution:

Compute a quadratic regression on a calculator by entering the x and y values. The best-fit quadratic equation for the above points is

${ y = 1.1071x^2 + x + 0.5714 }$

To check the goodness of fit, plot the graph and compute the correlation coefficient. The value of r for the data is approximately 0.994, which is close to 1. Hence the quadratic regression equation is a good fit.
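Instead of a calculator, the matrix equation can be solved directly. The following Python sketch builds the 3x3 system from the sums of powers of x and solves it with a small Gauss-Jordan elimination. The helper names `solve3` and `quadratic_fit` are illustrative, and r is taken as the square root of 1 - SSE/SST, the form that reproduces the value of about 0.994 quoted in the example.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def quadratic_fit(xs, ys):
    """Return (a, b, c) of the least squares parabola y = ax^2 + bx + c."""
    def s(power):
        return sum(x ** power for x in xs)

    A = [[s(4), s(3), s(2)],
         [s(3), s(2), s(1)],
         [s(2), s(1), len(xs)]]
    rhs = [sum(x * x * y for x, y in zip(xs, ys)),
           sum(x * y for x, y in zip(xs, ys)),
           sum(ys)]
    return solve3(A, rhs)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [7.5, 3, 0.5, 1, 3, 6, 14]

a, b, c = quadratic_fit(xs, ys)

sse = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
y_bar = sum(ys) / len(ys)
sst = sum((y - y_bar) ** 2 for y in ys)
r = math.sqrt(1 - sse / sst)

print(f"y = {a:.4f}x^2 + {b:.4f}x + {c:.4f}")
print(f"r = {r:.4f}")
```

The elimination routine does not guard against a singular system, which would arise with fewer than three distinct x-values.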