Statistics – Cluster sampling

In cluster sampling, the population is divided into groups (clusters) that are, ideally, heterogeneous within each group, and whole clusters are then chosen at random. In stratified sampling, by contrast, the groups are homogeneous and a few elements are randomly chosen from each group. In cluster sampling, groups with intra-group heterogeneity are formed and all the elements within each chosen group become part of the sample. Whereas stratified sampling has intra-group homogeneity and inter-group heterogeneity, cluster sampling has intra-group heterogeneity.

Examples

One stage cluster sampling

A committee made up of members from different departments has a high degree of heterogeneity. When a few such committees are chosen randomly from a larger number of committees, it is a case of one stage cluster sampling.

Two stage cluster sampling

If, from each randomly chosen cluster, a few elements are chosen randomly using simple random sampling or any other probability method, then it is two stage cluster sampling.

Multi-stage cluster sampling

A cluster sample can be a multiple stage sample when the choice of elements involves selection at several stages. For example, if a sample of insurance companies is to be drawn for a national survey on insurance products, clusters must be developed at multiple stages. In the first stage, the clusters are formed on the basis of public and private companies. In the next stage, a group of companies is chosen randomly from each cluster developed earlier. In the third stage, the office location of each chosen company from which data is to be collected is chosen randomly. Thus, in multistage sampling, a probability sample of primary units is drawn first, then a sample of secondary sampling units is drawn from each primary unit, then a sample at the third level, and so on until the final stage of breakdown for the sample units is reached.
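The one stage and two stage procedures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the committee names and members are hypothetical, and the helper names `one_stage_cluster_sample` and `two_stage_cluster_sample` are invented for this example.

```python
import random


def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random; every member of a chosen cluster is sampled."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Pick clusters at random, then draw a simple random sample inside each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(list(clusters[name]), min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical committees, each mixing members from several departments.
committees = {
    "committee_a": ["finance_1", "hr_1", "it_1", "sales_1"],
    "committee_b": ["finance_2", "hr_2", "it_2", "sales_2"],
    "committee_c": ["finance_3", "hr_3", "it_3", "sales_3"],
    "committee_d": ["finance_4", "hr_4", "it_4", "sales_4"],
}

one_stage = one_stage_cluster_sample(committees, n_clusters=2, seed=1)
two_stage = two_stage_cluster_sample(committees, n_clusters=2, n_per_cluster=2, seed=1)
print(one_stage)
print(two_stage)
```

In the one stage result every member of each chosen committee appears, while the two stage result keeps only a random subset of each chosen committee.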
Statistics – Data collection – Case Study Method

Case study research is a qualitative research method used to examine contemporary real-life situations and apply the findings of the case to the problem under study. Case studies involve a detailed contextual analysis of a limited number of events or conditions and their relationships. The method provides a basis for the application of ideas and the extension of methods. It helps a researcher understand a complex issue or object and adds strength to what is already known through previous research.

STEPS OF CASE STUDY METHOD

To ensure objectivity and clarity, a researcher should adopt a methodical approach to case study research. The following steps can be followed:

Identify and define the research questions – The researcher starts by establishing the focus of the study, identifying the research object and the problem surrounding it. The research object may be a person, a program, an event or an entity.

Select the cases – In this step the researcher decides on the number of cases to choose (single or multiple), the type of cases to choose (unique or typical) and the approach to collecting, storing and analyzing the data. This is the design phase of the case study method.

Collect the data – The researcher now collects the data with the objective of gathering multiple sources of evidence relating to the problem under study. This evidence is stored comprehensively and systematically in a format that can be referenced and sorted easily, so that converging lines of inquiry and patterns can be uncovered.

Evaluate and analyze the data – In this step the researcher uses varied methods to analyze qualitative as well as quantitative data. The data is categorized, tabulated and cross-checked to address the initial propositions or purpose of the study. Graphic techniques such as placing information into arrays, creating matrices of categories and creating flow charts help the investigators approach the data in different ways and thus avoid premature conclusions. Multiple investigators may also examine the data so that a wide variety of insights into the available data can be developed.

Presentation of Results – The results are presented in a manner that allows the reader to evaluate the findings in the light of the evidence presented in the report. The results are corroborated with sufficient evidence showing that all aspects of the problem have been adequately explored. The newer insights gained and the conflicting propositions that have emerged are suitably highlighted in the report.
Data Patterns
Data patterns are most useful when they are drawn graphically. They are commonly described in terms of features such as center, spread, shape, and other unusual properties. Other descriptive labels include symmetric, bell-shaped, skewed, etc.

Center

The center of a distribution, graphically, is located at the median of the distribution. In such a chart, roughly half of the observations lie on either side of the center. The height of each column indicates the frequency of observations.

Spread

The spread of a distribution refers to the variation of the data. If the set of observations covers a wide range, the spread is larger. If the observations are clustered around a single value, the spread is smaller.

Shape

The shape of a distribution can be described using the following characteristics.

Symmetry – In a symmetric distribution, the graph can be divided at the center in such a way that each half is a mirror image of the other.

Number of peaks – Distributions can have one or multiple peaks. A distribution with one clear peak is known as unimodal, and a distribution with two clear peaks is called bimodal. A symmetric distribution with a single peak at the center is referred to as bell-shaped.

Skewness – Some distributions have more observations on one side of the graph than on the other. Distributions having fewer observations towards higher values are said to be skewed right, and distributions with fewer observations towards lower values are said to be skewed left.

Uniform – When the set of observations has no peak and the data are spread equally across the range of the distribution, the distribution is called a uniform distribution.

Unusual Features

Common unusual features of data patterns are gaps and outliers.

Gaps – Gaps are areas of a distribution that have no observations. The following figure has a gap, as there are no observations in the middle of the distribution.

Outliers – Distributions may be characterized by extreme values that differ greatly from the other observations. These extreme values are referred to as outliers. The following figure illustrates a distribution with an outlier.
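The features of center, spread and skewness can also be checked numerically. The sketch below is a minimal illustration, assuming a small made-up data set and a hypothetical helper `describe_pattern`; it uses the moment-based skewness coefficient, which is one common choice among several.

```python
import statistics


def describe_pattern(values):
    """Summarize center, spread and skew direction for a list of numbers."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumes the values are not all identical
    # Moment-based skewness: positive means a longer tail towards high values
    # (skewed right), negative means a longer tail towards low values.
    skewness = sum((x - mean) ** 3 for x in values) / (len(values) * sd ** 3)
    return {
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "std_dev": sd,
        "skewness": skewness,
    }


# Hypothetical observations with one extreme value (20) acting as an outlier.
observations = [1, 2, 2, 3, 3, 3, 4, 4, 20]
summary = describe_pattern(observations)
print(summary)
```

Here the single large value stretches the range and makes the skewness positive, which matches the description of a right-skewed distribution with an outlier.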
Adjusted R-Squared
R-squared measures the proportion of the variation in your dependent variable (Y) explained by your independent variables (X) for a linear regression model. Adjusted R-squared adjusts the statistic based on the number of independent variables in the model. ${R^2}$ shows how well terms (data points) fit a curve or line. Adjusted ${R^2}$ also indicates how well terms fit a curve or line, but adjusts for the number of terms in a model. If you add more and more useless variables to a model, adjusted R-squared will decrease. If you add more useful variables, adjusted R-squared will increase.

${R_{adj}^2}$ will always be less than or equal to ${R^2}$. You only need ${R^2}$ when working with samples. In other words, ${R^2}$ isn't necessary when you have data from an entire population.

Formula

${R_{adj}^2 = 1 - \left[\frac{(1-R^2)(n-1)}{n-k-1}\right]}$

Where −

${n}$ = the number of points in your data sample.

${k}$ = the number of independent regressors, i.e. the number of variables in your model, excluding the constant.

Example

Problem Statement − A fund has a sample R-squared value close to 0.5 and appears to offer higher risk-adjusted returns, with a sample size of 50 and 5 predictors. Find the adjusted R-squared value.

Solution − Sample size = 50, number of predictors = 5, sample R-squared = 0.5. Substitute the values in the equation:

${R_{adj}^2 = 1 - \left[\frac{(1-0.5)(50-1)}{50-5-1}\right] \\[7pt] \, = 1 - (0.5) \times \frac{49}{44} \\[7pt] \, = 1 - 0.5568 \\[7pt] \, = 0.4432 }$
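The formula is easy to evaluate directly. The following is a minimal sketch, assuming a hypothetical helper name `adjusted_r_squared`; it takes 0.5 as the value of ${R^2}$ itself, with n = 50 and k = 5 as in the example.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for n data points and k regressors (constant excluded)."""
    if n - k - 1 <= 0:
        raise ValueError("the formula needs n > k + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


result = adjusted_r_squared(0.5, n=50, k=5)
print(round(result, 4))  # 0.4432
```

Because the factor (n - 1) / (n - k - 1) is at least 1 whenever k is at least 1, the adjusted value never exceeds the plain ${R^2}$.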
Chebyshev's Theorem
For any set of numbers, the fraction of the numbers lying within k standard deviations of their mean is at least

${1-\frac{1}{k^2}}$

Where −

${k = \frac{\text{the within number}}{\text{the standard deviation}}}$

and ${k}$ must be greater than 1.

Example

Problem Statement − Use Chebyshev's theorem to find what percent of the values will fall between 123 and 179 for a data set with a mean of 151 and a standard deviation of 14.

Solution − We subtract 151-123 and get 28, which tells us that 123 is 28 units below the mean. We subtract 179-151 and also get 28, which tells us that 179 is 28 units above the mean. Together these tell us that the values between 123 and 179 are all within 28 units of the mean. Therefore the "within number" is 28.

We find the number of standard deviations, k, that the "within number" of 28 amounts to by dividing it by the standard deviation −

${k = \frac{\text{the within number}}{\text{the standard deviation}} = \frac{28}{14} = 2}$

So the values between 123 and 179 are all within 28 units of the mean, which is the same as within k=2 standard deviations of the mean. Since k > 1, we can use Chebyshev's formula to find the fraction of the data that lies within k=2 standard deviations of the mean. Substituting k=2 we have −

${1-\frac{1}{k^2} = 1-\frac{1}{2^2} = 1-\frac{1}{4} = \frac{3}{4}}$

So at least ${\frac{3}{4}}$ of the data lie between 123 and 179. Since ${\frac{3}{4} = 75\%}$, at least 75% of the data values are between 123 and 179.
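The same calculation can be written as a small function. This is a sketch only; the name `chebyshev_lower_bound` is hypothetical, and for an interval that is not centered on the mean it conservatively uses the shorter of the two distances as the "within number".

```python
def chebyshev_lower_bound(lower, upper, mean, std_dev):
    """Minimum fraction of values guaranteed to lie between lower and upper."""
    # Use the shorter distance so the interval mean +/- within fits inside [lower, upper].
    within = min(mean - lower, upper - mean)
    k = within / std_dev
    if k <= 1:
        raise ValueError("Chebyshev's theorem is only informative for k > 1")
    return 1 - 1 / k ** 2


bound = chebyshev_lower_bound(123, 179, mean=151, std_dev=14)
print(bound)  # 0.75
```

The result is a lower bound: the theorem guarantees at least this fraction for any data set, and the true fraction can be higher.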
Arithmetic Mode
Arithmetic Mode refers to the most frequently occurring value in the data set. In other words, the modal value has the highest frequency associated with it. It is denoted by the symbol ${M_o}$ or Mode.

We're going to discuss methods to compute the Arithmetic Mode for three types of series:

Individual Data Series
Discrete Data Series
Continuous Data Series

Individual Data Series

When data is given on an individual basis. Following is an example of an individual series:

| Items | 5 | 10 | 20 | 30 | 40 | 50 | 60 | 70 |
|---|---|---|---|---|---|---|---|---|

Discrete Data Series

When data is given along with their frequencies. Following is an example of a discrete series:

| Items | 5 | 10 | 20 | 30 | 40 | 50 | 60 | 70 |
|---|---|---|---|---|---|---|---|---|
| Frequency | 2 | 5 | 1 | 3 | 12 | 0 | 5 | 7 |

Continuous Data Series

When data is given based on ranges along with their frequencies. Following is an example of a continuous series:

| Items | 0-5 | 5-10 | 10-20 | 20-30 | 30-40 |
|---|---|---|---|---|---|
| Frequency | 2 | 5 | 1 | 3 | 12 |
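As a rough illustration, the mode of the discrete and continuous example series can be computed as follows. The helper names are hypothetical. For the continuous series the sketch uses the standard grouped-data interpolation formula Mo = L + (f1 - f0) / (2 f1 - f0 - f2) × h, which is not stated on this page and assumes that the modal class and its neighbours have equal width.

```python
def discrete_mode(items, frequencies):
    """Return the item with the highest frequency (the first one on ties)."""
    best = max(range(len(items)), key=lambda i: frequencies[i])
    return items[best]


def grouped_mode(classes, frequencies):
    """Return the modal class and an interpolated mode inside it."""
    m = max(range(len(classes)), key=lambda i: frequencies[i])
    lower, upper = classes[m]
    f1 = frequencies[m]
    f0 = frequencies[m - 1] if m > 0 else 0                # class before the modal class
    f2 = frequencies[m + 1] if m + 1 < len(classes) else 0  # class after the modal class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        # Flat neighbourhood: fall back to the class midpoint (an assumption).
        return classes[m], (lower + upper) / 2
    return classes[m], lower + (f1 - f0) / denominator * (upper - lower)


# Discrete series from the example above.
items = [5, 10, 20, 30, 40, 50, 60, 70]
item_frequencies = [2, 5, 1, 3, 12, 0, 5, 7]
mode_discrete = discrete_mode(items, item_frequencies)

# Continuous series from the example above.
classes = [(0, 5), (5, 10), (10, 20), (20, 30), (30, 40)]
class_frequencies = [2, 5, 1, 3, 12]
modal_class, mode_grouped = grouped_mode(classes, class_frequencies)

print(mode_discrete, modal_class, round(mode_grouped, 2))
```

In the individual series every item occurs exactly once, so no single value stands out as the mode.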
Chi-squared Distribution
The chi-squared distribution (chi-square or ${\chi^2}$-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. It is one of the most widely used probability distributions in statistics. It is a special case of the gamma distribution.

The chi-squared distribution is widely used by statisticians for the following:

Estimating a confidence interval for the population standard deviation of a normal distribution using a sample standard deviation.
Checking the independence of two criteria of classification of multiple qualitative variables.
Checking the relationships between categorical variables.
Studying the sample variance where the underlying distribution is normal.
Testing deviations between expected and observed frequencies.
Conducting the chi-square test (a goodness of fit test).

Probability density function

The probability density function of the chi-square distribution is given as:

Formula

${ f(x; k ) = } $ $ \begin{cases} \frac{x^{ \frac{k}{2} - 1} e^{-\frac{x}{2}}}{2^{\frac{k}{2}}\Gamma(\frac{k}{2})}, & \text{if } x \gt 0 \\[7pt] 0, & \text{if } x \le 0 \end{cases} $

Where −

${\Gamma(\frac{k}{2})}$ = Gamma function having closed form values for integer parameter k.

${x}$ = random variable.

${k}$ = integer parameter.

Cumulative distribution function

The cumulative distribution function of the chi-square distribution is given as:

Formula

${ F(x; k) = \frac{\gamma(\frac{k}{2}, \frac{x}{2})}{\Gamma(\frac{k}{2})} \\[7pt] = P (\frac{k}{2}, \frac{x}{2}) }$

Where −

${\gamma(s,t)}$ = lower incomplete gamma function.

${P(s,t)}$ = regularized gamma function.

${x}$ = random variable.

${k}$ = integer parameter.
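The density can be evaluated with the standard library alone, since `math.gamma` provides the Gamma function. The sketch below uses hypothetical helper names and approximates the cumulative distribution function by simple midpoint-rule integration of the density rather than by an incomplete gamma routine; that shortcut is an assumption of this example and is only reasonable for k of 2 or more, where the density is bounded near zero.

```python
import math


def chi2_pdf(x, k):
    """Chi-squared density f(x; k) for k degrees of freedom."""
    if x <= 0:
        return 0.0
    half_k = k / 2
    return x ** (half_k - 1) * math.exp(-x / 2) / (2 ** half_k * math.gamma(half_k))


def chi2_cdf(x, k, steps=20000):
    """Approximate F(x; k) by midpoint-rule integration of the density."""
    if x <= 0:
        return 0.0
    width = x / steps
    return sum(chi2_pdf((i + 0.5) * width, k) for i in range(steps)) * width


print(chi2_pdf(2.0, 2))    # density at x = 2 with 2 degrees of freedom
print(chi2_cdf(5.99, 2))   # close to 0.95
print(chi2_cdf(9.49, 4))   # close to 0.95
```

For production work a library routine for the regularized incomplete gamma function would normally be preferred over numerical integration.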
F Test Table
The F-test is named after the prominent statistician R.A. Fisher. It is used to test whether two independent estimates of population variance differ significantly, or whether the two samples may be regarded as drawn from normal populations having the same variance. To carry out the test, we calculate the F-statistic, defined as:

Formula

${F} = \frac{\text{Larger estimate of population variance}}{\text{Smaller estimate of population variance}} = \frac{{S_1}^2}{{S_2}^2}$ where ${{S_1}^2} \gt {{S_2}^2}$

Procedure

The testing procedure is as follows:

Set up the null hypothesis that the two population variances are equal, i.e. ${H_0: {\sigma_1}^2 = {\sigma_2}^2}$

Calculate the variances of the random samples using the formula:

${S_1^2} = \frac{\sum(X_1- \bar X_1)^2}{n_1-1}, \\[7pt] {S_2^2} = \frac{\sum(X_2- \bar X_2)^2}{n_2-1}$

Compute the variance ratio F as:

${F} = \frac{{S_1}^2}{{S_2}^2}$ where ${{S_1}^2} \gt {{S_2}^2}$

Compute the degrees of freedom. The degrees of freedom of the larger estimate of the population variance are denoted by ${v_1}$ and those of the smaller estimate by ${v_2}$. That is,

${v_1}$ = degrees of freedom for the sample having the larger variance = ${n_1-1}$

${v_2}$ = degrees of freedom for the sample having the smaller variance = ${n_2-1}$

Then, from the F-table given at the end of the book, find the value of ${F}$ for ${v_1}$ and ${v_2}$ at the 5% level of significance.

Compare the calculated value of ${F}$ with the table value of ${F_{.05}}$ for ${v_1}$ and ${v_2}$ degrees of freedom. If the calculated value of ${F}$ exceeds the table value, we reject the null hypothesis and conclude that the difference between the two variances is significant. If the calculated value of ${F}$ is less than the table value, the null hypothesis is accepted and we conclude that the difference between the two variances is not significant.
Example

Problem Statement: In a sample of 8 observations, the sum of squared deviations of items from the mean was 94.5. In another sample of 10 observations, the value was found to be 101.7. Test whether the difference is significant at the 5% level. (You are given that at the 5% level of significance, the critical value of ${F}$ for ${v_1}$ = 7 and ${v_2}$ = 9, ${F_{.05}}$, is 3.29.)

Solution: Let us take the hypothesis that the difference in the variances of the two samples is not significant, i.e. ${H_0: {\sigma_1}^2 = {\sigma_2}^2}$

We are given the following:

${n_1 = 8, \; \sum {(X_1 - \bar X_1)}^2 = 94.5, \; n_2 = 10, \; \sum {(X_2 - \bar X_2)}^2 = 101.7, \\[7pt] S_1^2 = \frac{\sum(X_1- \bar X_1)^2}{n_1-1} = \frac {94.5}{8-1} = \frac {94.5}{7} = 13.5, \\[7pt] S_2^2 = \frac{\sum(X_2- \bar X_2)^2}{n_2-1} = \frac {101.7}{10-1} = \frac {101.7}{9} = 11.3}$

Applying the F-test:

${F} = \frac{{S_1}^2}{{S_2}^2} = \frac {13.5}{11.3} = {1.195}$

For ${v_1}$ = 8-1 = 7 and ${v_2}$ = 10-1 = 9, ${F_{.05}}$ = 3.29. The calculated value of ${F}$ is less than the table value. Hence, we accept the null hypothesis and conclude that the difference in the variances of the two samples is not significant at the 5% level.
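The worked example can be reproduced with a short function. This is a sketch under stated assumptions: the helper name `variance_ratio_f` is hypothetical, and the critical value 3.29 is taken from the problem statement rather than computed.

```python
def variance_ratio_f(sum_sq_1, n1, sum_sq_2, n2):
    """Return (F, v1, v2) with the larger variance estimate in the numerator."""
    var_1 = sum_sq_1 / (n1 - 1)
    var_2 = sum_sq_2 / (n2 - 1)
    if var_1 >= var_2:
        return var_1 / var_2, n1 - 1, n2 - 1
    return var_2 / var_1, n2 - 1, n1 - 1


f_value, v1, v2 = variance_ratio_f(94.5, 8, 101.7, 10)
critical_value = 3.29  # F at the 5% level for v1 = 7, v2 = 9, as given in the problem
reject_null = f_value > critical_value

print(round(f_value, 3), v1, v2, reject_null)
```

Swapping the two samples gives the same result, because the function always places the larger variance estimate in the numerator and reports the degrees of freedom in the matching order.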
Chi Squared table
The numbers in the table represent the values of the ${\chi^2}$ statistic. Areas of the shaded region (A) are the column indexes. Here A is the cumulative area to the left of the tabulated value; for example, with 1 degree of freedom, 95% of the distribution lies below 3.84. You can also use the Chi-Square Distribution to compute critical and p values exactly.

| df | A=0.005 | 0.010 | 0.025 | 0.05 | 0.10 | 0.25 | 0.50 | 0.75 | 0.90 | 0.95 | 0.975 | 0.99 | 0.995 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.000039 | 0.00016 | 0.00098 | 0.0039 | 0.0158 | 0.102 | 0.455 | 1.32 | 2.71 | 3.84 | 5.02 | 6.63 | 7.88 |
| 2 | 0.0100 | 0.0201 | 0.0506 | 0.103 | 0.211 | 0.575 | 1.39 | 2.77 | 4.61 | 5.99 | 7.38 | 9.21 | 10.6 |
| 3 | 0.0717 | 0.115 | 0.216 | 0.352 | 0.584 | 1.21 | 2.37 | 4.11 | 6.25 | 7.81 | 9.35 | 11.3 | 12.8 |
| 4 | 0.207 | 0.297 | 0.484 | 0.711 | 1.06 | 1.92 | 3.36 | 5.39 | 7.78 | 9.49 | 11.1 | 13.3 | 14.9 |
| 5 | 0.412 | 0.554 | 0.831 | 1.15 | 1.61 | 2.67 | 4.35 | 6.63 | 9.24 | 11.1 | 12.8 | 15.1 | 16.7 |
| 6 | 0.676 | 0.872 | 1.24 | 1.64 | 2.20 | 3.45 | 5.35 | 7.84 | 10.6 | 12.6 | 14.4 | 16.8 | 18.5 |
| 7 | 0.989 | 1.24 | 1.69 | 2.17 | 2.83 | 4.25 | 6.35 | 9.04 | 12.0 | 14.1 | 16.0 | 18.5 | 20.3 |
| 8 | 1.34 | 1.65 | 2.18 | 2.73 | 3.49 | 5.07 | 7.34 | 10.2 | 13.4 | 15.5 | 17.5 | 20.1 | 22.0 |
| 9 | 1.73 | 2.09 | 2.70 | 3.33 | 4.17 | 5.9 | 8.34 | 11.4 | 14.7 | 16.9 | 19.0 | 21.7 | 23.6 |
| 10 | 2.16 | 2.56 | 3.25 | 3.94 | 4.87 | 6.74 | 9.34 | 12.5 | 16.0 | 18.3 | 20.5 | 23.2 | 25.2 |
| 11 | 2.60 | 3.05 | 3.82 | 4.57 | 5.58 | 7.58 | 10.3 | 13.7 | 17.3 | 19.7 | 21.9 | 24.7 | 26.8 |
| 12 | 3.07 | 3.57 | 4.40 | 5.23 | 6.30 | 8.44 | 11.3 | 14.8 | 18.5 | 21.0 | 23.3 | 26.2 | 28.3 |
| 13 | 3.57 | 4.11 | 5.01 | 5.89 | 7.04 | 9.3 | 12.3 | 16.0 | 19.8 | 22.4 | 24.7 | 27.7 | 29.8 |
| 14 | 4.07 | 4.66 | 5.63 | 6.57 | 7.79 | 10.2 | 13.3 | 17.1 | 21.1 | 23.7 | 26.1 | 29.1 | 31.3 |
| 15 | 4.60 | 5.23 | 6.26 | 7.26 | 8.55 | 11.0 | 14.3 | 18.2 | 22.3 | 25.0 | 27.5 | 30.6 | 32.8 |
| 16 | 5.14 | 5.81 | 6.91 | 7.96 | 9.31 | 11.9 | 15.3 | 19.4 | 23.5 | 26.3 | 28.8 | 32.0 | 34.3 |
| 17 | 5.70 | 6.41 | 7.56 | 8.67 | 10.1 | 12.8 | 16.3 | 20.5 | 24.8 | 27.6 | 30.2 | 33.4 | 35.7 |
| 18 | 6.26 | 7.01 | 8.23 | 9.39 | 10.9 | 13.7 | 17.3 | 21.6 | 26.0 | 28.9 | 31.5 | 34.8 | 37.2 |
| 19 | 6.84 | 7.63 | 8.91 | 10.1 | 11.7 | 14.6 | 18.3 | 22.7 | 27.2 | 30.1 | 32.9 | 36.2 | 38.6 |
| 20 | 7.43 | 8.26 | 9.59 | 10.9 | 12.4 | 15.5 | 19.3 | 23.8 | 28.4 | 31.4 | 34.2 | 37.6 | 40.0 |
| 21 | 8.03 | 8.90 | 10.3 | 11.6 | 13.2 | 16.3 | 20.3 | 24.9 | 29.6 | 32.7 | 35.5 | 38.9 | 41.4 |
| 22 | 8.64 | 9.54 | 11.0 | 12.3 | 14.0 | 17.2 | 21.3 | 26.0 | 30.8 | 33.9 | 36.8 | 40.3 | 42.8 |
| 23 | 9.26 | 10.2 | 11.7 | 13.1 | 14.8 | 18.1 | 22.3 | 27.1 | 32.0 | 35.2 | 38.1 | 41.6 | 44.2 |
| 24 | 9.89 | 10.9 | 12.4 | 13.8 | 15.7 | 19.0 | 23.3 | 28.2 | 33.2 | 36.4 | 39.4 | 43.0 | 45.6 |
| 25 | 10.5 | 11.5 | 13.1 | 14.6 | 16.5 | 19.9 | 24.3 | 29.3 | 34.4 | 37.7 | 40.6 | 44.3 | 46.9 |
| 26 | 11.2 | 12.2 | 13.8 | 15.4 | 17.3 | 20.8 | 25.3 | 30.4 | 35.6 | 38.9 | 41.9 | 45.6 | 48.3 |
| 27 | 11.8 | 12.9 | 14.6 | 16.2 | 18.1 | 21.7 | 26.3 | 31.5 | 36.7 | 40.1 | 43.2 | 47.0 | 49.6 |
| 28 | 12.5 | 13.6 | 15.3 | 16.9 | 18.9 | 22.7 | 27.3 | 32.6 | 37.9 | 41.3 | 44.5 | 48.3 | 51.0 |
| 29 | 13.1 | 14.3 | 16.0 | 17.7 | 19.8 | 23.6 | 28.3 | 33.7 | 39.1 | 42.6 | 45.7 | 49.6 | 52.3 |
| 30 | 13.8 | 15.0 | 16.8 | 18.5 | 20.6 | 24.5 | 29.3 | 34.8 | 40.3 | 43.8 | 47.0 | 50.9 | 53.7 |
| 31 | 14.5 | 15.7 | 17.5 | 19.3 | 21.4 | 25.4 | 30.3 | 35.9 | 41.4 | 45.0 | 48.2 | 52.2 | 55.0 |
| 32 | 15.1 | 16.4 | 18.3 | 20.1 | 22.3 | 26.3 | 31.3 | 37.0 | 42.6 | 46.2 | 49.5 | 53.5 | 56.3 |
| 33 | 15.8 | 17.1 | 19.0 | 20.9 | 23.1 | 27.2 | 32.3 | 38.1 | 43.7 | 47.4 | 50.7 | 54.8 | 57.6 |
| 34 | 16.5 | 17.8 | 19.8 | 21.7 | 24.0 | 28.1 | 33.3 | 39.1 | 44.9 | 48.6 | 52.0 | 56.1 | 59.0 |
| 35 | 17.2 | 18.5 | 20.6 | 22.5 | 24.8 | 29.1 | 34.3 | 40.2 | 46.1 | 49.8 | 53.2 | 57.3 | 60.3 |
| 36 | 17.9 | 19.2 | 21.3 | 23.3 | 25.6 | 30.0 | 35.3 | 41.3 | 47.2 | 51.0 | 54.4 | 58.6 | 61.6 |
| 37 | 18.6 | 20.0 | 22.1 | 24.1 | 26.5 | 30.9 | 36.3 | 42.4 | 48.4 | 52.2 | 55.7 | 59.9 | 62.9 |
| 38 | 19.3 | 20.7 | 22.9 | 24.9 | 27.3 | 31.8 | 37.3 | 43.5 | 49.5 | 53.4 | 56.9 | 61.2 | 64.2 |
| 39 | 20.0 | 21.4 | 23.7 | 25.7 | 28.2 | 32.7 | 38.3 | 44.5 | 50.7 | 54.6 | 58.1 | 62.4 | 65.5 |
| 40 | 20.7 | 22.2 | 24.4 | 26.5 | 29.1 | 33.7 | 39.3 | 45.6 | 51.8 | 55.8 | 59.3 | 63.7 | 66.8 |
| 41 | 21.4 | 22.9 | 25.2 | 27.3 | 29.9 | 34.6 | 40.3 | 46.7 | 52.9 | 56.9 | 60.6 | 65.0 | 68.1 |
| 42 | 22.1 | 23.7 | 26.0 | 28.1 | 30.8 | 35.5 | 41.3 | 47.8 | 54.1 | 58.1 | 61.8 | 66.2 | 69.3 |
| 43 | 22.9 | 24.4 | 26.8 | 29.0 | 31.6 | 36.4 | 42.3 | 48.8 | 55.2 | 59.3 | 63.0 | 67.5 | 70.6 |
| 44 | 23.6 | 25.1 | 27.6 | 29.8 | 32.5 | 37.4 | 43.3 | 49.9 | 56.4 | 60.5 | 64.2 | 68.7 | 71.9 |
| 45 | 24.3 | 25.9 | 28.4 | 30.6 | 33.4 | 38.3 | 44.3 | 51.0 | 57.5 | 61.7 | 65.4 | 70.0 | 73.2 |
Combination with replacement
Each of several possible ways in which a set or number of things can be ordered or arranged is called a permutation. A combination with replacement, by contrast, is a selection of objects in which order does not matter and the same object may be chosen more than once. The number of combinations with replacement is given by the following formula −

Formula

${^nC_r = \frac{(n+r-1)!}{r!(n-1)!} }$

Where −

${n}$ = number of items which can be selected.

${r}$ = number of items which are selected.

${^nC_r}$ = number of unordered lists of items, or combinations.

Example

Problem Statement − There are five kinds of frozen yogurt: banana, chocolate, lemon, strawberry and vanilla. You can have three scoops. How many different selections are possible?

Solution − Here n = 5 and r = 3. Substitute the values in the formula:

${^nC_r = \frac{(n+r-1)!}{r!(n-1)!} \\[7pt] = \frac{(5+3-1)!}{3!(5-1)!} \\[7pt] = \frac{7!}{3!4!} \\[7pt] = \frac{5040}{6 \times 24} \\[7pt] = 35}$
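The count can be cross-checked in Python, both from the formula and by listing every selection. This is a small sketch; `count_with_replacement` is a hypothetical helper name, while `math.comb` and `itertools.combinations_with_replacement` are standard library functions.

```python
import math
from itertools import combinations_with_replacement


def count_with_replacement(n, r):
    """Number of ways to choose r items from n kinds when repeats are allowed."""
    # (n + r - 1)! / (r! (n - 1)!) is the binomial coefficient C(n + r - 1, r).
    return math.comb(n + r - 1, r)


flavours = ["banana", "chocolate", "lemon", "strawberry", "vanilla"]
count = count_with_replacement(len(flavours), 3)
selections = list(combinations_with_replacement(flavours, 3))

print(count, len(selections))  # 35 35
```

Enumerating the selections confirms the formula for small inputs; for large n and r only the formula is practical.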