Big Data & Analytics Archives - Page 12 of 75 - Donotsad where can learn any thing work project and make money

Aug 10

Probability Bayes Theorem

Statistics – Probability Bayes Theorem

One of the most significant developments in probability theory has been Bayesian decision theory, which has proved immensely helpful for making decisions under uncertainty. Bayes' theorem was developed by the British mathematician Rev. Thomas Bayes. The probability it gives is also known as inverse probability, posterior probability, or revised probability. The theorem finds the probability of an event by taking the given sample information into account; hence the name posterior probability.

Bayes' theorem is based on the formula for conditional probability. The conditional probability of event ${A_1}$ given event ${B}$ is

${P(A_1/B) = \frac{P(A_1 \text{ and } B)}{P(B)}}$

Similarly, the conditional probability of event ${A_2}$ given event ${B}$ is

${P(A_2/B) = \frac{P(A_2 \text{ and } B)}{P(B)}}$

Where, for two mutually exclusive and exhaustive events ${A_1}$ and ${A_2}$,

${P(B) = P(A_1 \text{ and } B) + P(A_2 \text{ and } B) \\[7pt] P(B) = P(A_1) \times P(B/A_1) + P(A_2) \times P(B/A_2)}$

Substituting ${P(A_1 \text{ and } B) = P(A_1) \times P(B/A_1)}$ and this expression for ${P(B)}$, ${P(A_1/B)}$ can be rewritten as

${P(A_1/B) = \frac{P(A_1) \times P(B/A_1)}{P(A_1) \times P(B/A_1) + P(A_2) \times P(B/A_2)}}$

Hence the general form of Bayes' theorem is

${P(A_i/B) = \frac{P(A_i) \times P(B/A_i)}{\sum_{j=1}^{n} P(A_j) \times P(B/A_j)}}$

Where ${A_1}$, ${A_2}$, …, ${A_i}$, …, ${A_n}$ are a set of n mutually exclusive and exhaustive events.
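The general form of the theorem can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `bayes_posterior` and the two-event numbers are hypothetical and chosen only to show the arithmetic. It assumes the events are mutually exclusive and exhaustive, so the priors sum to 1.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(A_i/B) for every i, given P(A_i) and P(B/A_i).

    Assumes the events A_1..A_n are mutually exclusive and exhaustive.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Numerators: P(A_i) * P(B/A_i) for each event.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: P(B), the sum of all the joint terms.
    p_b = sum(joint)
    if p_b == 0:
        raise ValueError("P(B) is zero, so the posterior is undefined")
    return [j / p_b for j in joint]


# Hypothetical inputs: P(A_1) = 0.6, P(A_2) = 0.4,
# P(B/A_1) = 0.02, P(B/A_2) = 0.05.
priors = [0.6, 0.4]
likelihoods = [0.02, 0.05]
posteriors = bayes_posterior(priors, likelihoods)
print(posteriors)  # approximately [0.375, 0.625]
```

Because the denominator is the same for every event, the posteriors always sum to 1, which is a quick sanity check on any hand calculation.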

Aug 10

Tableau – Tree Map

The tree map displays data in nested rectangles. The dimensions define the structure of the tree map, and the measures define the size or color of each rectangle. The rectangles are easy to read because both the size and the shade of the color reflect the value of the measure. A tree map is created using one or more dimensions with one or two measures.

Creating a Tree Map

Using the Sample-superstore, plan to find the size of profits for each Ship Mode value. The steps are as follows.

Step 1 − Drag and drop the measure Profit two times to the Marks card: once to the Size shelf and again to the Color shelf.

Step 2 − Drag and drop the dimension Ship Mode to the Label shelf. Choose the chart type Tree Map from Show Me. Tableau then displays the tree map.

Tree Map with Two Dimensions

You can add the dimension Region to the above tree map. Drag and drop it twice: once to the Color shelf and again to the Label shelf. The resulting chart shows four outer boxes for the four regions, with the boxes for the ship modes nested inside them. Each region now has a different color.

Aug 10

Residual sum of squares

Statistics – Residual Sum of Squares

In statistics, the residual sum of squares (RSS), also known as the sum of squared residuals (SSR) or the sum of squared errors of prediction (SSE), is the sum of the squares of the residuals (the deviations of the predicted values from the actual empirical values of the data). The residual sum of squares is defined by the following function:

Formula

${RSS = \sum_{i=1}^n (\epsilon_i)^2 = \sum_{i=1}^n (y_i - (\alpha + \beta x_i))^2}$

Where −
${x_i, y_i}$ = the i-th pair of values from the sets X and Y.
${\alpha, \beta}$ = constants (the intercept and slope of the line).
${n}$ = number of value pairs.

Example

Problem Statement: Consider two data sets, X = 1, 2, 3, 4 and Y = 4, 5, 6, 7, with constants ${\alpha}$ = 1 and ${\beta}$ = 2. Find the residual sum of squares (RSS) of the two data sets.

Solution: Given,

${X = 1, 2, 3, 4 \quad Y = 4, 5, 6, 7 \quad \alpha = 1 \quad \beta = 2}$

Substitute the given values into the residual sum of squares formula:

${RSS = \sum_{i=1}^n (\epsilon_i)^2 = \sum_{i=1}^n (y_i - (\alpha + \beta x_i))^2 \\[7pt] = (4-(1+2 \times 1))^2 + (5-(1+2 \times 2))^2 + (6-(1+2 \times 3))^2 + (7-(1+2 \times 4))^2 \\[7pt] = (1)^2 + (0)^2 + (-1)^2 + (-2)^2 \\[7pt] = 6}$
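The worked example can be checked with a short Python sketch. The function name `residual_sum_of_squares` is an illustrative choice; the data and constants are the ones given in the problem statement.

```python
def residual_sum_of_squares(xs, ys, alpha, beta):
    """Sum of squared residuals y_i - (alpha + beta * x_i)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))


# Values from the example: X = 1,2,3,4; Y = 4,5,6,7; alpha = 1; beta = 2.
xs = [1, 2, 3, 4]
ys = [4, 5, 6, 7]
rss = residual_sum_of_squares(xs, ys, alpha=1, beta=2)
print(rss)  # 6
```

The individual residuals are 1, 0, -1 and -2, so the squares add up to 1 + 0 + 1 + 4 = 6.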

Aug 10

Tableau – Home

Tableau Tutorial

Tableau is a Business Intelligence tool for visually analyzing data. Users can create and distribute interactive, shareable dashboards that depict the trends, variations, and density of the data in the form of graphs and charts. Tableau can connect to files, relational databases, and Big Data sources to acquire and process data. The software allows data blending and real-time collaboration, which sets it apart. It is used by businesses, academic researchers, and many government organizations for visual data analysis. It has also been positioned as a Leader in the Gartner Magic Quadrant for Business Intelligence and Analytics Platforms.

Audience

This tutorial is designed for readers who want to create, read, write, and modify Business Intelligence reports using Tableau. It will also be useful for readers who would like to become a Data Analyst or a Data Scientist.

Prerequisites

Before proceeding with this tutorial, you should have a basic understanding of computer programming terminology and data analysis. You should also have some knowledge of the various types of graphs and charts. Familiarity with SQL will be an added advantage.

Aug 10

Regression Intercept Confidence Interval

Statistics – Regression Intercept Confidence Interval

A regression intercept confidence interval gives a range of plausible values for the intercept of a regression line and is used to check the reliability of the estimate.

Formula

${R = \beta_0 \pm t(1 - \frac{\alpha}{2}, n-k-1) \times SE_{\beta_0}}$

Where −
${\beta_0}$ = Regression intercept.
${k}$ = Number of predictors.
${n}$ = Sample size.
${SE_{\beta_0}}$ = Standard error of the intercept.
${\alpha}$ = Significance level (1 minus the confidence level).
${t}$ = t-value.

Example

Problem Statement: Compute the regression intercept confidence interval for the following data. The number of predictors (k) is 1, the regression intercept ${\beta_0}$ is 5, the sample size (n) is 10, and the standard error ${SE_{\beta_0}}$ is 0.15.

Solution: Let us consider the case of a 99% confidence interval.

Step 1: Compute the t-value, where ${\alpha = 1 - 0.99 = 0.01}$.

${t(1 - \frac{\alpha}{2}, n-k-1) \\[7pt] = t(1 - \frac{0.01}{2}, 10-1-1) \\[7pt] = t(0.995, 8) \\[7pt] = 3.3554}$

Step 2: Lower limit of the regression intercept:

${\beta_0 - t(1 - \frac{\alpha}{2}, n-k-1) \times SE_{\beta_0} \\[7pt] = 5 - (3.3554 \times 0.15) \\[7pt] = 5 - 0.50331 \\[7pt] = 4.49669}$

Step 3: Upper limit of the regression intercept:

${\beta_0 + t(1 - \frac{\alpha}{2}, n-k-1) \times SE_{\beta_0} \\[7pt] = 5 + (3.3554 \times 0.15) \\[7pt] = 5 + 0.50331 \\[7pt] = 5.50331}$

As a result, the 99% confidence interval for the regression intercept runs from ${4.49669}$ to ${5.50331}$.
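The arithmetic of the example can be reproduced with a small Python sketch. The function name `intercept_confidence_interval` is hypothetical, and the critical t-value 3.3554 is taken directly from the worked example rather than computed; if SciPy is available, `scipy.stats.t.ppf(0.995, 8)` should give approximately the same value, but that dependency is an assumption and is not used here.

```python
def intercept_confidence_interval(beta0, se_beta0, t_value):
    """Return (lower, upper) limits: beta0 -/+ t_value * se_beta0."""
    if se_beta0 < 0 or t_value < 0:
        raise ValueError("standard error and t-value must be non-negative")
    margin = t_value * se_beta0
    return beta0 - margin, beta0 + margin


# Values from the example: intercept 5, standard error 0.15,
# and the critical t-value 3.3554 for 8 degrees of freedom.
lower, upper = intercept_confidence_interval(beta0=5.0, se_beta0=0.15, t_value=3.3554)
print(f"{lower:.5f} to {upper:.5f}")  # 4.49669 to 5.50331
```

The lower limit subtracts the margin and the upper limit adds it, so the interval is always symmetric around the estimated intercept.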

Aug 10

Tableau – Condition Filters

One of the important filtering options in Tableau is to apply conditions to existing filters. A condition can be simple, such as keeping only those sales that are higher than a certain amount, or complex, based on a formula. Conditions can also be applied to create a range filter.

Creating a Condition Filter

Using the Sample-superstore, let's find the sub-categories of products, across all segments, whose sales are at least 100,000. The steps are as follows.

Step 1 − Drag the dimension Segment and the measure Sales to the Columns shelf. Next, drag the dimension Sub-Category to the Rows shelf. Choose the horizontal bar chart option.

Step 2 − Drag the dimension Sub-Category to the Filters shelf. Right-click it to edit the filter and go to the Condition tab. Here, choose the radio option By field. From the drop-downs, select Sales, Sum, and the greater than or equal to symbol (>=), and specify the value 100000.

On completion of these two steps, the chart shows only those sub-categories of products that have the required amount of sales. The result is shown for all the available segments where the condition is met.
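For readers who think in code, the logic of this filter can be sketched in plain Python. This is not Tableau's implementation, and it assumes the condition is evaluated on the total sales of each sub-category. The rows below are made-up illustrative figures, not values from the Sample-superstore data, and `condition_filter` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical rows: (sub_category, segment, sales).
rows = [
    ("Phones", "Consumer", 70000.0),
    ("Phones", "Corporate", 45000.0),
    ("Labels", "Consumer", 6000.0),
    ("Labels", "Corporate", 4000.0),
    ("Chairs", "Consumer", 100000.0),
]


def condition_filter(rows, threshold):
    """Keep rows whose sub-category has SUM(sales) >= threshold."""
    totals = defaultdict(float)
    for sub_category, _segment, sales in rows:
        totals[sub_category] += sales
    keep = {name for name, total in totals.items() if total >= threshold}
    # Rows from every segment survive if their sub-category passes.
    return [row for row in rows if row[0] in keep]


filtered = condition_filter(rows, threshold=100000)
print(sorted({row[0] for row in filtered}))  # ['Chairs', 'Phones']
```

In this sketch "Labels" is dropped because its total is 10,000, while "Chairs" is kept because the comparison is greater than or equal to the threshold.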

Aug 10

Probability

Statistics – Probability

Probability implies 'likelihood' or 'chance'. When an event is certain to happen, the probability of its occurrence is 1, and when it is certain that the event cannot happen, its probability is 0. Hence the value of a probability ranges from 0 to 1. Probability has been defined in different ways by various schools of thought, one of which is discussed below.

Classical Definition of Probability

As the name suggests, the classical approach to defining probability is the oldest approach. It states that if there are n exhaustive, mutually exclusive, and equally likely cases, of which m cases are favourable to the happening of event A, then the probability of event A is given by the following probability function:

Formula

${P(A) = \frac{\text{Number of favourable cases}}{\text{Total number of equally likely cases}} = \frac{m}{n}}$

Thus, to calculate the probability we need the number of favourable cases and the total number of equally likely cases. This can be explained using the following example.

Example

Problem Statement: A coin is tossed. What is the probability of getting a head?

Solution:

Total number of equally likely outcomes (n) = 2 (i.e. head or tail)

Number of outcomes favourable to head (m) = 1

${P(head) = \frac{1}{2}}$
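The classical definition translates directly into code. The sketch below uses Python's standard `fractions.Fraction` to keep the result exact; the function name `classical_probability` is an illustrative choice, and the approach only applies when all outcomes are equally likely.

```python
from fractions import Fraction


def classical_probability(favourable, total):
    """Return m / n for m favourable cases out of n equally likely cases."""
    if total <= 0:
        raise ValueError("total number of cases must be positive")
    if not 0 <= favourable <= total:
        raise ValueError("favourable cases must lie between 0 and total")
    return Fraction(favourable, total)


# Coin toss from the example: two equally likely outcomes, one favourable.
outcomes = ["head", "tail"]
m = sum(1 for outcome in outcomes if outcome == "head")
p_head = classical_probability(m, len(outcomes))
print(p_head)  # 1/2
```

The range checks mirror the statement that a probability always lies between 0 (impossible) and 1 (certain).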

Aug 10

Tableau – Bar Chart

A bar chart represents data in rectangular bars, with the length of each bar proportional to the value of the variable. Tableau automatically produces a bar chart when you drag a dimension to the Rows shelf and a measure to the Columns shelf. You can also use the bar chart option in the Show Me button. If the data is not appropriate for a bar chart, this option is automatically greyed out. In Tableau, various types of bar charts can be created by using a dimension and a measure.

Simple Bar Chart

From the Sample-Superstore, drag the measure Profit to the Columns shelf and the dimension Sub-Category to the Rows shelf. Tableau automatically produces a horizontal bar chart. If it does not, choose the chart type from the Show Me tool.

Bar Chart with Color Range

You can apply colors to the bars based on their ranges. The longer bars get darker shades and the shorter bars get lighter shades. To do this, drag the Profit field to Color under the Marks pane. Note that it also produces a different color for negative bars.

Stacked Bar Chart

You can add another dimension to the above bar chart to produce a stacked bar chart, which shows different colors in each bar. Drag the dimension Segment to the Marks pane and drop it on Color. The resulting chart shows the distribution of each segment within each bar.

Aug 10

Qualitative Data Vs Quantitative Data

Statistics – Qualitative Data Vs Quantitative Data

Qualitative Data

Qualitative data is a set of information that cannot be measured using numbers. It generally consists of words and subjective narratives. The result of a qualitative data analysis can take the form of highlighted key words, extracted information, and elaborated concepts. For example, consider a study of parents' perception of the current education system for their kids. The information collected from them might be in narrative form, and you need to deduce from the analysis whether they are satisfied, unsatisfied, or want improvement in certain areas.

Strength

Better understanding – Qualitative data gives a better understanding of the perspectives and needs of participants.
Provides explanation – Qualitative data, along with quantitative data, can explain the result of the survey and can be used to check the correctness of the quantitative data.
Better identification of behavior patterns – Qualitative data can provide detailed information that is useful in identifying behavioral patterns.

Weakness

Lesser reachability – Being subjective in nature, a small population is generally covered to represent the large population.
Time consuming – Qualitative data is time consuming, as a large amount of data has to be understood.
Possibility of bias – Because the analysis is subjective, evaluator bias is quite possible.

Quantitative Data

Quantitative data is a set of numbers collected from a group of people and involves statistical analysis. For example, suppose you conduct a satisfaction survey and ask participants to rate their experience on a scale of 1 to 5. You can collect the ratings and, because they are numerical, use statistical techniques to draw conclusions about participants' satisfaction.

Strength

Specific – Quantitative data is clear and specific to the survey conducted.
High reliability – If collected properly, quantitative data is normally accurate and hence highly reliable.
Easy communication – Quantitative data is easy to communicate and elaborate using charts, graphs, etc.
Existing support – Many large datasets may already be present that can be analyzed to check the relevance of the survey.

Weakness

Limited options – Respondents are required to choose from limited options.
High complexity – Quantitative data may need complex procedures to get a correct sample.
Requires expertise – Analysis of quantitative data requires certain expertise in statistical analysis.

Aug 10

Process Capability (Cp) & Process Performance (Pp)

Statistics – Process Capability (Cp) & Process Performance (Pp)

Process Capability

Process capability can be defined as a measurable property of a process relative to its specification. It is expressed as a process capability index ${C_p}$. The process capability index is used to check the variability of the output generated by the process and to compare that variability with the product tolerance. ${C_p}$ is governed by the following formula:

Formula

${C_p = \min[\frac{USL - \mu}{3 \times \sigma}, \frac{\mu - LSL}{3 \times \sigma}]}$

Where −
${USL}$ = Upper Specification Limit.
${LSL}$ = Lower Specification Limit.
${\mu}$ = estimated mean of the process.
${\sigma}$ = estimated variability of the process, standard deviation.

This min form, which also accounts for how far the mean is from the nearer specification limit, is commonly denoted ${C_{pk}}$; when the mean lies midway between the two limits it reduces to ${\frac{USL - LSL}{6 \times \sigma}}$. The higher the value of the process capability index ${C_p}$, the better the process.

Example

Consider the case of a car and its parking garage. The garage size sets the specification limits and the car defines the process output. Here, process capability describes the relationship between the car size, the garage size, and how far from the middle of the garage you can park the car. If the car is a little smaller than the garage, you can fit it in easily. If the car is very small compared to the garage, it fits at any distance from the center. In process control terms, a process with little variation lets you park the car easily in the garage and meets the customer's requirement. Let's look at this example in terms of the process capability index ${C_p}$.

${C_p = \frac{1}{2}}$ – the garage is smaller than the car and cannot accommodate it.
${C_p = 1}$ – the garage is just large enough for the car and can accommodate only your car.
${C_p = 2}$ – the garage is twice the size of your car and can accommodate two cars at a time.
${C_p = 3}$ – the garage is three times the size of your car and can accommodate three cars at a time.

Process Performance

Process performance checks the conformance of the sample generated by the process. It is expressed as a process performance index ${P_p}$, which checks whether the process is meeting the customer requirement. It differs from process capability in that process performance applies to a particular batch of material. The sampling may need to be quite substantial to capture the variation in the batch. Process performance is to be used only when process control cannot be evaluated. ${P_p}$ is governed by the following formula:

Formula

${P_p = \frac{USL - LSL}{6 \times \sigma}}$

Where −
${USL}$ = Upper Specification Limit.
${LSL}$ = Lower Specification Limit.
${\sigma}$ = estimated variability of the process, standard deviation.

The higher the value of the process performance index ${P_p}$, the better the process.
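The two index formulas above can be sketched in Python as follows. The function names and the specification limits, mean, and standard deviation are hypothetical values chosen only to show the calculation; they do not come from a real process.

```python
def capability_index(usl, lsl, mu, sigma):
    """min[(USL - mu) / (3 * sigma), (mu - LSL) / (3 * sigma)]."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))


def performance_index(usl, lsl, sigma):
    """(USL - LSL) / (6 * sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (usl - lsl) / (6 * sigma)


# Hypothetical process: limits 4 and 16, mean 11, standard deviation 1.
c_p = capability_index(usl=16, lsl=4, mu=11, sigma=1)
p_p = performance_index(usl=16, lsl=4, sigma=1)
print(round(c_p, 4), p_p)  # 1.6667 2.0
```

In this example the mean sits closer to the upper limit, so the min formula gives a smaller value than the width-based formula; with the mean exactly midway (mu = 10) both would equal 2.0.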