Data Collection – Case Study Method

Case study research is a qualitative research method used to examine contemporary real-life situations and apply the findings of the case to the problem under study. Case studies involve a detailed contextual analysis of a limited number of events or conditions and their relationships. They provide a basis for applying ideas and extending methods, and they help a researcher understand a complex issue or object and add strength to what is already known through previous research.

STEPS OF CASE STUDY METHOD

To ensure objectivity and clarity, a researcher should adopt a methodical approach to case study research. The following steps can be followed:

Identify and define the research questions – The researcher starts by establishing the focus of the study, identifying the research object and the problem surrounding it. The research object may be a person, a program, an event or an entity.

Select the cases – In this step the researcher decides on the number of cases to choose (single or multiple), the type of cases to choose (unique or typical) and the approach to collect, store and analyze the data. This is the design phase of the case study method.

Collect the data – The researcher now collects data with the objective of gathering multiple sources of evidence with reference to the problem under study. This evidence is stored comprehensively and systematically, in a format that can be referenced and sorted easily, so that converging lines of inquiry and patterns can be uncovered.

Evaluate and analyze the data – The researcher uses varied methods to analyze qualitative as well as quantitative data. The data are categorized, tabulated and cross-checked to address the initial propositions or purpose of the study. Graphic techniques such as placing information into arrays, creating matrices of categories and drawing flow charts help the investigators approach the data from different perspectives and avoid premature conclusions. Multiple investigators may also examine the data so that a wide variety of insights into the available data can be developed.

Presentation of results – The results are presented in a manner that allows the reader to evaluate the findings in the light of the evidence given in the report. The results are corroborated with sufficient evidence showing that all aspects of the problem have been adequately explored. The newer insights gained and the conflicting propositions that have emerged are suitably highlighted in the report.
Data Patterns
Data patterns are most useful when they are drawn graphically. They are commonly described in terms of features such as center, spread, shape and other unusual properties. Special descriptive labels include symmetric, bell-shaped, skewed, and so on.

Center – Graphically, the center of a distribution is located at the median of the distribution: roughly half of the observations lie on either side of it. In a frequency chart, the height of each column indicates the frequency of observations.

Spread – The spread of a distribution refers to the variation of the data. If the set of observations covers a wide range, the spread is larger. If the observations are centered around a single value, the spread is smaller.

Shape – The shape of a distribution can be described using the following characteristics:

Symmetry – In a symmetric distribution, the graph can be divided at the center in such a way that each half is a mirror image of the other.

Number of peaks – Distributions can have one or more peaks. A distribution with one clear peak is known as unimodal, and a distribution with two clear peaks is called bimodal. A symmetric distribution with a single peak at the center is referred to as bell-shaped.

Skewness – Some distributions have more observations on one side of the graph than the other. Distributions with fewer observations towards the higher values (a longer right tail) are said to be skewed right, and distributions with fewer observations towards the lower values (a longer left tail) are said to be skewed left.

Uniform – When the set of observations has no peak and the data are spread equally across the range of the distribution, the distribution is called a uniform distribution.

Unusual Features – Common unusual features of data patterns are gaps and outliers.

Gaps – Gaps are regions of a distribution containing no observations; for example, a histogram with no observations in the middle of its range shows a gap.

Outliers – Distributions may contain extreme values that differ greatly from the rest of the observations. These extreme values are referred to as outliers.
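As a minimal illustration (not part of the original tutorial, and assuming NumPy and SciPy are available), the features described above can be computed for a small hypothetical sample like this:

```python
import numpy as np
from scipy import stats

data = np.array([2, 3, 3, 4, 4, 4, 5, 5, 6, 9, 21])  # hypothetical observations

center = np.median(data)                 # center of the distribution
spread = data.max() - data.min()         # range as a simple measure of spread
std    = data.std(ddof=1)                # sample standard deviation
skew   = stats.skew(data)                # > 0 suggests skewed right, < 0 skewed left

# Simple outlier rule: points more than 1.5 * IQR beyond the quartiles
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]

print(center, spread, std, skew, outliers)
```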
Permutation
A permutation is an arrangement of all or part of a set of objects, with regard to the order of the arrangement. For example, suppose we have a set of three letters: A, B and C. We might ask how many ways we can arrange 2 letters from that set. The number of permutations is defined and given by the following formula:

Formula

${^nP_r = \frac{n!}{(n-r)!}}$

Where −

${n}$ = size of the set from which elements are permuted.

${r}$ = size of each permutation.

${n, r}$ are non-negative integers.

Example

Problem Statement:

A computer scientist is trying to discover the keyword for a financial account. If the keyword consists of only 10 lower-case characters (i.e., 10 characters from the set a, b, c, ..., x, y, z) and no character can be repeated, how many different unique arrangements of characters exist?

Solution:

Step 1: Determine whether the question pertains to permutations or combinations. Since changing the order of the potential keywords (e.g., ajk vs. kja) would create a new possibility, this is a permutations problem.

Step 2: Determine n and r. Here n = 26, since the computer scientist is choosing from 26 possibilities (a, b, c, ..., x, y, z), and r = 10, since the computer scientist is choosing 10 characters.

Step 3: Apply the formula

${^{26}P_{10} = \frac{26!}{(26-10)!} \\[7pt]
= \frac{26!}{16!} \\[7pt]
= \frac{26(25)(24)\dots(11)(10)(9)\dots(1)}{(16)(15)\dots(1)} \\[7pt]
= 26(25)(24)\dots(17) \\[7pt]
= 19275223968000}$
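The formula can also be evaluated directly in code. The following sketch (an assumption, not part of the original page) reproduces the keyword example:

```python
from math import factorial

def permutations(n, r):
    """Number of ordered arrangements of r items chosen from n distinct items."""
    return factorial(n) // factorial(n - r)

# Keyword example: 10 distinct lower-case letters chosen from 26
print(permutations(26, 10))  # 19275223968000
```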
Chebyshev's Theorem
Chebyshev's theorem states that the fraction of any set of numbers lying within k standard deviations of the mean of those numbers is at least

${1-\frac{1}{k^2}}$

Where −

${k = \frac{\text{within number}}{\text{standard deviation}}}$

and ${k}$ must be greater than 1.

Example

Problem Statement −

Use Chebyshev's theorem to find what percent of the values will fall between 123 and 179 for a data set with mean of 151 and standard deviation of 14.

Solution −

We subtract 151 - 123 and get 28, which tells us that 123 is 28 units below the mean. We subtract 179 - 151 and also get 28, which tells us that 179 is 28 units above the mean. Together these tell us that all values between 123 and 179 are within 28 units of the mean, so the "within number" is 28.

We find the number of standard deviations, k, that the "within number" of 28 amounts to by dividing it by the standard deviation −

${k = \frac{\text{within number}}{\text{standard deviation}} = \frac{28}{14} = 2}$

So the values between 123 and 179 are all within 28 units of the mean, which is the same as within k = 2 standard deviations of the mean. Since k > 1, we can use Chebyshev's formula to find the fraction of the data that are within k = 2 standard deviations of the mean. Substituting k = 2 we have −

${1-\frac{1}{k^2} = 1-\frac{1}{2^2} = 1-\frac{1}{4} = \frac{3}{4}}$

So ${\frac{3}{4}}$ of the data lie between 123 and 179, and since ${\frac{3}{4} = 75\%}$, at least 75% of the data values are between 123 and 179.
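A minimal sketch (assumed, not part of the original page) that applies the theorem to the worked example:

```python
def chebyshev_fraction(lower, upper, mean, std_dev):
    """Return the minimum fraction of values guaranteed to lie in [lower, upper]."""
    within = min(mean - lower, upper - mean)   # the "within number"
    k = within / std_dev
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

# Example from the text: mean 151, standard deviation 14, interval 123..179
print(chebyshev_fraction(123, 179, 151, 14))   # 0.75, i.e. at least 75%
```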
Arithmetic Mode
The arithmetic mode is the most frequently occurring value in a data set; in other words, the modal value has the highest frequency associated with it. It is denoted by the symbol ${M_o}$ or Mode. We're going to discuss methods to compute the arithmetic mode for three types of series:

Individual Data Series
Discrete Data Series
Continuous Data Series

Individual Data Series – The data are given on an individual basis. Following is an example of an individual series:

Items: 5, 10, 20, 30, 40, 50, 60, 70

Discrete Data Series – The data are given along with their frequencies. Following is an example of a discrete series:

Items:     5  10  20  30  40  50  60  70
Frequency: 2   5   1   3  12   0   5   7

Continuous Data Series – The data are given as ranges along with their frequencies. Following is an example of a continuous series:

Items:     0-5  5-10  10-20  20-30  30-40
Frequency:  2     5      1      3     12
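As an illustrative sketch (assumed, not from the original page), the mode of the discrete series above can be picked out as the item with the highest frequency, and the same idea identifies the modal class of the continuous series:

```python
# Discrete series: the item with the largest frequency is the mode
items       = [5, 10, 20, 30, 40, 50, 60, 70]
frequencies = [2,  5,  1,  3, 12,  0,  5,  7]
print(items[frequencies.index(max(frequencies))])   # 40 (frequency 12)

# Continuous series: the class interval with the largest frequency is the modal class
classes = ["0-5", "5-10", "10-20", "20-30", "30-40"]
freqs   = [2, 5, 1, 3, 12]
print(classes[freqs.index(max(freqs))])              # '30-40'
```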
Chi-squared Distribution
The chi-squared distribution (chi-square or ${\chi^2}$-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. It is one of the most widely used probability distributions in statistics and is a special case of the gamma distribution. The chi-squared distribution is widely used by statisticians to:

Estimate a confidence interval for the population standard deviation of a normal distribution from a sample standard deviation.

Check the independence of two criteria of classification of qualitative variables.

Check the relationships between categorical variables.

Study the sample variance where the underlying distribution is normal.

Test deviations of differences between expected and observed frequencies.

Conduct the chi-square test (a goodness-of-fit test).

Probability density function

The probability density function of the chi-squared distribution is given as:

Formula

${ f(x; k) = \begin{cases} \frac{x^{\frac{k}{2}-1} e^{-\frac{x}{2}}}{2^{\frac{k}{2}}\Gamma(\frac{k}{2})}, & \text{if } x \gt 0 \\[7pt] 0, & \text{if } x \le 0 \end{cases} }$

Where −

${\Gamma(\frac{k}{2})}$ = Gamma function, which has closed-form values for integer parameter k.

${x}$ = random variable.

${k}$ = integer parameter (degrees of freedom).

Cumulative distribution function

The cumulative distribution function of the chi-squared distribution is given as:

Formula

${ F(x; k) = \frac{\gamma(\frac{k}{2}, \frac{x}{2})}{\Gamma(\frac{k}{2})} = P(\frac{k}{2}, \frac{x}{2}) }$

Where −

${\gamma(s,t)}$ = lower incomplete gamma function.

${P(s,t)}$ = regularized gamma function.

${x}$ = random variable.

${k}$ = integer parameter (degrees of freedom).
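A minimal sketch (assuming SciPy is available; not part of the original page) evaluating the pdf and cdf defined above for chosen values of x and k:

```python
from scipy.stats import chi2

k = 5          # degrees of freedom
x = 7.0        # point at which to evaluate the distribution

print(chi2.pdf(x, df=k))   # density f(x; k)
print(chi2.cdf(x, df=k))   # cumulative probability F(x; k)
```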
F Test Table
The F-test is named after the eminent statistician R. A. Fisher. It is used to test whether two independent estimates of population variance differ significantly, or whether two samples may be regarded as drawn from normal populations having the same variance. To carry out the test, we calculate the F-statistic, defined as:

Formula

${F = \frac{\text{Larger estimate of population variance}}{\text{Smaller estimate of population variance}} = \frac{S_1^2}{S_2^2}, \text{ where } S_1^2 \gt S_2^2}$

Procedure

The testing procedure is as follows:

Set up the null hypothesis that the two population variances are equal, i.e. ${H_0: \sigma_1^2 = \sigma_2^2}$.

Calculate the variances of the random samples using the formulas:

${S_1^2 = \frac{\sum(X_1 - \bar X_1)^2}{n_1-1}, \\[7pt] S_2^2 = \frac{\sum(X_2 - \bar X_2)^2}{n_2-1}}$

Compute the variance ratio F as:

${F = \frac{S_1^2}{S_2^2}, \text{ where } S_1^2 \gt S_2^2}$

Compute the degrees of freedom. The degrees of freedom of the larger estimate of the population variance are denoted by v1 and those of the smaller estimate by v2. That is,

${v_1}$ = degrees of freedom for the sample having the larger variance = ${n_1-1}$

${v_2}$ = degrees of freedom for the sample having the smaller variance = ${n_2-1}$

From the F-table, find the value of ${F}$ for ${v_1}$ and ${v_2}$ at the 5% level of significance.

Compare the calculated value of ${F}$ with the table value ${F_{.05}}$ for ${v_1}$ and ${v_2}$ degrees of freedom. If the calculated value of ${F}$ exceeds the table value, we reject the null hypothesis and conclude that the difference between the two variances is significant. On the other hand, if the calculated value of ${F}$ is less than the table value, the null hypothesis is accepted and we conclude that the difference between the two variances is not significant.

Example

Problem Statement:

In a sample of 8 observations, the sum of squared deviations of items from the mean was 94.5. In another sample of 10 observations, the value was found to be 101.7. Test whether the difference is significant at the 5% level. (You are given that at the 5% level of significance, the critical value of ${F}$ for ${v_1}$ = 7 and ${v_2}$ = 9 is ${F_{.05}}$ = 3.29.)

Solution:

Let us take the hypothesis that the difference in the variances of the two samples is not significant, i.e. ${H_0: \sigma_1^2 = \sigma_2^2}$. We are given the following:

${n_1 = 8, \sum(X_1 - \bar X_1)^2 = 94.5, n_2 = 10, \sum(X_2 - \bar X_2)^2 = 101.7, \\[7pt] S_1^2 = \frac{\sum(X_1 - \bar X_1)^2}{n_1-1} = \frac{94.5}{8-1} = \frac{94.5}{7} = 13.5, \\[7pt] S_2^2 = \frac{\sum(X_2 - \bar X_2)^2}{n_2-1} = \frac{101.7}{10-1} = \frac{101.7}{9} = 11.3}$

Applying the F-test:

${F = \frac{S_1^2}{S_2^2} = \frac{13.5}{11.3} = 1.195}$

For ${v_1}$ = 8 - 1 = 7 and ${v_2}$ = 10 - 1 = 9, ${F_{.05}}$ = 3.29. The calculated value of ${F}$ is less than the table value, so we accept the null hypothesis and conclude that the difference in the variances of the two samples is not significant at the 5% level.
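A minimal sketch (assumed, not from the original page) that reproduces the worked example, using SciPy to obtain the critical value instead of a printed table:

```python
from scipy.stats import f

s1_sq, s2_sq = 13.5, 11.3          # larger and smaller variance estimates
v1, v2 = 7, 9                      # their degrees of freedom

F = s1_sq / s2_sq                  # calculated F statistic
F_crit = f.ppf(0.95, v1, v2)       # table value F_.05 for v1, v2 (about 3.29)

print(round(F, 3), round(F_crit, 2))
if F > F_crit:
    print("Reject H0: the difference in variances is significant")
else:
    print("Accept H0: the difference in variances is not significant")
```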
Chi-squared Table
The numbers in the table are values of the ${\chi^2}$ statistic. The column index A is the cumulative probability, i.e. the area under the curve to the left of the tabulated value. You can also use the chi-squared distribution to compute critical and p values exactly.

df  A=0.005  0.010  0.025  0.05  0.10  0.25  0.50  0.75  0.90  0.95  0.975  0.99  0.995
1  0.000039  0.00016  0.00098  0.0039  0.0158  0.102  0.455  1.32  2.71  3.84  5.02  6.63  7.88
2  0.0100  0.0201  0.0506  0.103  0.211  0.575  1.39  2.77  4.61  5.99  7.38  9.21  10.6
3  0.0717  0.115  0.216  0.352  0.584  1.21  2.37  4.11  6.25  7.81  9.35  11.3  12.8
4  0.207  0.297  0.484  0.711  1.06  1.92  3.36  5.39  7.78  9.49  11.1  13.3  14.9
5  0.412  0.554  0.831  1.15  1.61  2.67  4.35  6.63  9.24  11.1  12.8  15.1  16.7
6  0.676  0.872  1.24  1.64  2.20  3.45  5.35  7.84  10.6  12.6  14.4  16.8  18.5
7  0.989  1.24  1.69  2.17  2.83  4.25  6.35  9.04  12.0  14.1  16.0  18.5  20.3
8  1.34  1.65  2.18  2.73  3.49  5.07  7.34  10.2  13.4  15.5  17.5  20.1  22.0
9  1.73  2.09  2.70  3.33  4.17  5.9  8.34  11.4  14.7  16.9  19.0  21.7  23.6
10  2.16  2.56  3.25  3.94  4.87  6.74  9.34  12.5  16.0  18.3  20.5  23.2  25.2
11  2.60  3.05  3.82  4.57  5.58  7.58  10.3  13.7  17.3  19.7  21.9  24.7  26.8
12  3.07  3.57  4.40  5.23  6.30  8.44  11.3  14.8  18.5  21.0  23.3  26.2  28.3
13  3.57  4.11  5.01  5.89  7.04  9.3  12.3  16.0  19.8  22.4  24.7  27.7  29.8
14  4.07  4.66  5.63  6.57  7.79  10.2  13.3  17.1  21.1  23.7  26.1  29.1  31.3
15  4.60  5.23  6.26  7.26  8.55  11.0  14.3  18.2  22.3  25.0  27.5  30.6  32.8
16  5.14  5.81  6.91  7.96  9.31  11.9  15.3  19.4  23.5  26.3  28.8  32.0  34.3
17  5.70  6.41  7.56  8.67  10.1  12.8  16.3  20.5  24.8  27.6  30.2  33.4  35.7
18  6.26  7.01  8.23  9.39  10.9  13.7  17.3  21.6  26.0  28.9  31.5  34.8  37.2
19  6.84  7.63  8.91  10.1  11.7  14.6  18.3  22.7  27.2  30.1  32.9  36.2  38.6
20  7.43  8.26  9.59  10.9  12.4  15.5  19.3  23.8  28.4  31.4  34.2  37.6  40.0
21  8.03  8.90  10.3  11.6  13.2  16.3  20.3  24.9  29.6  32.7  35.5  38.9  41.4
22  8.64  9.54  11.0  12.3  14.0  17.2  21.3  26.0  30.8  33.9  36.8  40.3  42.8
23  9.26  10.2  11.7  13.1  14.8  18.1  22.3  27.1  32.0  35.2  38.1  41.6  44.2
24  9.89  10.9  12.4  13.8  15.7  19.0  23.3  28.2  33.2  36.4  39.4  43.0  45.6
25  10.5  11.5  13.1  14.6  16.5  19.9  24.3  29.3  34.4  37.7  40.6  44.3  46.9
26  11.2  12.2  13.8  15.4  17.3  20.8  25.3  30.4  35.6  38.9  41.9  45.6  48.3
27  11.8  12.9  14.6  16.2  18.1  21.7  26.3  31.5  36.7  40.1  43.2  47.0  49.6
28  12.5  13.6  15.3  16.9  18.9  22.7  27.3  32.6  37.9  41.3  44.5  48.3  51.0
29  13.1  14.3  16.0  17.7  19.8  23.6  28.3  33.7  39.1  42.6  45.7  49.6  52.3
30  13.8  15.0  16.8  18.5  20.6  24.5  29.3  34.8  40.3  43.8  47.0  50.9  53.7
31  14.5  15.7  17.5  19.3  21.4  25.4  30.3  35.9  41.4  45.0  48.2  52.2  55.0
32  15.1  16.4  18.3  20.1  22.3  26.3  31.3  37.0  42.6  46.2  49.5  53.5  56.3
33  15.8  17.1  19.0  20.9  23.1  27.2  32.3  38.1  43.7  47.4  50.7  54.8  57.6
34  16.5  17.8  19.8  21.7  24.0  28.1  33.3  39.1  44.9  48.6  52.0  56.1  59.0
35  17.2  18.5  20.6  22.5  24.8  29.1  34.3  40.2  46.1  49.8  53.2  57.3  60.3
36  17.9  19.2  21.3  23.3  25.6  30.0  35.3  41.3  47.2  51.0  54.4  58.6  61.6
37  18.6  20.0  22.1  24.1  26.5  30.9  36.3  42.4  48.4  52.2  55.7  59.9  62.9
38  19.3  20.7  22.9  24.9  27.3  31.8  37.3  43.5  49.5  53.4  56.9  61.2  64.2
39  20.0  21.4  23.7  25.7  28.2  32.7  38.3  44.5  50.7  54.6  58.1  62.4  65.5
40  20.7  22.2  24.4  26.5  29.1  33.7  39.3  45.6  51.8  55.8  59.3  63.7  66.8
41  21.4  22.9  25.2  27.3  29.9  34.6  40.3  46.7  52.9  56.9  60.6  65.0  68.1
42  22.1  23.7  26.0  28.1  30.8  35.5  41.3  47.8  54.1  58.1  61.8  66.2  69.3
43  22.9  24.4  26.8  29.0  31.6  36.4  42.3  48.8  55.2  59.3  63.0  67.5  70.6
44  23.6  25.1  27.6  29.8  32.5  37.4  43.3  49.9  56.4  60.5  64.2  68.7  71.9
45  24.3  25.9  28.4  30.6  33.4  38.3  44.3  51.0  57.5  61.7  65.4  70.0  73.2
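A minimal sketch (assuming SciPy is available; not part of the original table) showing how the same critical values, and p values, can be computed exactly instead of read from the table:

```python
from scipy.stats import chi2

df = 10
A = 0.95                       # cumulative probability (the column index A)

critical = chi2.ppf(A, df)     # table entry: about 18.3 for df = 10, A = 0.95
p_right  = chi2.sf(18.3, df)   # right-tail p value for an observed statistic of 18.3

print(round(critical, 2), round(p_right, 3))
```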
Combination with replacement
Each of the possible ways in which a set or number of things can be ordered or arranged is called a permutation; combination with replacement, by contrast, is selecting objects from an unordered list where the same object may be chosen more than once. The number of combinations with replacement is defined and given by the following formula:

Formula

${^nC_r = \frac{(n+r-1)!}{r!(n-1)!}}$

Where −

${n}$ = number of kinds of items from which a selection can be made.

${r}$ = number of items selected.

${^nC_r}$ = number of unordered selections (combinations with replacement).

Example

Problem Statement −

There are five kinds of frozen yogurt: banana, chocolate, lemon, strawberry and vanilla. You can have three scoops. How many variations will there be?

Solution −

Here n = 5 and r = 3. Substituting the values in the formula,

${^nC_r = \frac{(n+r-1)!}{r!(n-1)!} \\[7pt]
= \frac{(5+3-1)!}{3!(5-1)!} \\[7pt]
= \frac{7!}{3!4!} \\[7pt]
= \frac{5040}{6 \times 24} \\[7pt]
= 35}$
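A minimal sketch (assumed, not part of the original page) that evaluates the formula for the frozen-yogurt example:

```python
from math import factorial

def combinations_with_replacement(n, r):
    """Number of ways to choose r items from n kinds when repeats are allowed."""
    return factorial(n + r - 1) // (factorial(r) * factorial(n - 1))

# Frozen yogurt example: 5 flavours, 3 scoops
print(combinations_with_replacement(5, 3))   # 35
```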
Central limit theorem
If the population from which a sample has been drawn is normal, then the mean of the sampling distribution of the sample mean equals the population mean and the sampling distribution itself is normal. Even when the population is skewed, the sampling distribution tends to move closer to the normal distribution, provided the sample is large (i.e., greater than 30). According to the central limit theorem, for sufficiently large samples (size greater than 30), the shape of the sampling distribution becomes more and more like a normal distribution, irrespective of the shape of the parent population. The theorem thus explains the relationship between the population distribution and the sampling distribution: if the samples are large enough, the sampling distribution of the mean approaches a normal distribution. The importance of the central limit theorem has been summed up by Richard I. Levin in the following words: "The significance of the central limit theorem lies in the fact that it permits us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the frequency distribution of that population other than what we can get from the sample."
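A minimal simulation sketch (assumed, not from the original page; sample size and population choice are illustrative): sample means drawn from a skewed exponential population behave approximately normally once the sample size exceeds 30, with mean close to the population mean and standard deviation close to sigma divided by the square root of n.

```python
import numpy as np

rng = np.random.default_rng(0)
population_mean = 1.0                      # mean of an Exponential(1) population (skewed)
sample_size, n_samples = 40, 5000          # sample size > 30, many repeated samples

# Each row is one sample of 40 observations; take the mean of every sample
sample_means = rng.exponential(population_mean, (n_samples, sample_size)).mean(axis=1)

print(round(sample_means.mean(), 3))       # close to the population mean (1.0)
print(round(sample_means.std(ddof=1), 3))  # close to 1 / sqrt(40), about 0.158
```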