Statistics – Cumulative Plots

A cumulative plot is a way to draw cumulative information graphically. It displays the number, percentage, or proportion of observations that are less than or equal to a particular value.

Example

Problem Statement: Draw the frequency and cumulative frequency plots of 10 student test scores based on the following data.

| Sr. No. | Roll No. | Test Score |
|---|---|---|
| 1 | 100 | 30 |
| 2 | 101 | 40 |
| 3 | 102 | 35 |
| 4 | 103 | 50 |
| 5 | 104 | 60 |
| 6 | 105 | 65 |
| 7 | 105 | 35 |
| 8 | 105 | 55 |
| 9 | 105 | 65 |
| 10 | 105 | 70 |

Solution: For the frequency chart, compute the frequencies as shown below. This table shows the number of students scoring in each range. Each range includes its lower bound and excludes its upper bound, so a score of 40 is counted in 40-50.

| Sr. No. | Score Range | Students |
|---|---|---|
| 1 | 30-40 | 3 |
| 2 | 40-50 | 1 |
| 3 | 50-60 | 2 |
| 4 | 60-70 | 3 |
| 5 | 70-80 | 1 |

Following is the required frequency plot.

[Figure: frequency plot]

For the cumulative frequency chart, compute the cumulative frequencies as shown below. This table shows the number of students scoring up to and including the given marks.

| Sr. No. | Up to Score | Students |
|---|---|---|
| 1 | 30 | 1 |
| 2 | 40 | 4 |
| 3 | 50 | 5 |
| 4 | 60 | 7 |
| 5 | 70 | 10 |

Following is the required cumulative frequency plot.

[Figure: cumulative frequency plot]
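The counting behind both charts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `frequency` and `cumulative` are made up for this sketch, the score ranges are treated as half-open (lower bound included, upper bound excluded), and the cumulative count includes scores equal to the threshold.

```python
from bisect import bisect_right

# Test scores of the 10 students from the problem statement.
scores = [30, 40, 35, 50, 60, 65, 35, 55, 65, 70]

def frequency(data, edges):
    """Count values in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for value in data:
        idx = bisect_right(edges, value) - 1
        if 0 <= idx < len(counts):
            counts[idx] += 1
    return counts

def cumulative(data, thresholds):
    """Count values less than or equal to each threshold."""
    ordered = sorted(data)
    return [bisect_right(ordered, t) for t in thresholds]

freq = frequency(scores, [30, 40, 50, 60, 70, 80])
cum = cumulative(scores, [30, 40, 50, 60, 70])
print("frequency per range:", freq)
print("cumulative counts:  ", cum)
```

Plotting `freq` as bars and `cum` as a rising line gives the two charts described above.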
Statistics – Cumulative Poisson Distribution

$\lambda$ is the shape parameter, which indicates the average number of events in the given time interval.

[Figure: Poisson probability mass function for four values of $\lambda$]

Cumulative Distribution Function

Formula

$$F(x, \lambda) = \sum_{k=0}^{x} \frac{e^{-\lambda} \lambda^{k}}{k!}$$

Where:

- $e$ = the base of the natural logarithm, approximately 2.71828
- $k$ = the number of occurrences of an event, the probability of which is given by the function
- $k!$ = the factorial of $k$
- $\lambda$ = a positive real number, equal to the expected number of occurrences during the given interval

Example

Problem Statement: A complex software system averages 7 errors per 5,000 lines of code. What is the probability of exactly 2 errors in 5,000 randomly selected lines of code?

Solution: The probability of exactly 2 errors in 5,000 randomly selected lines of code is the single term of the sum with $k = 2$ and $\lambda = 7$:

$$p(2, 7) = \frac{e^{-7} \, 7^{2}}{2!} = 0.022$$
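As a rough check of the example, the sketch below evaluates both a single Poisson term and the cumulative sum using only the standard library. The function names `poisson_pmf` and `poisson_cdf` are illustrative choices, not part of the original text.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_cdf(x, lam):
    """Probability of at most x events: the sum of the terms k = 0..x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

p_exactly_2 = poisson_pmf(2, 7)   # exactly 2 errors, as in the example
p_at_most_2 = poisson_cdf(2, 7)   # 0, 1, or 2 errors
print(round(p_exactly_2, 3), round(p_at_most_2, 3))
```

The cumulative value is larger than the single term because it also includes the probabilities of 0 and 1 errors.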
Statistics – Continuous Uniform Distribution

The continuous uniform distribution is the probability distribution of random number selection from the continuous interval between a and b. Its density function is defined by the following.

[Figure: continuous uniform distribution with a = 1, b = 3]

Formula

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & \text{when } a \le x \le b \\ 0, & \text{when } x \lt a \text{ or } x \gt b \end{cases}$$

Example

Problem Statement: Suppose you are running a quiz and pose a question to a group of 20 contestants. The time allowed to answer the question is 30 seconds. How many contestants are likely to respond within 5 seconds? (Typically, the contestants are required to click a button for the correct choice, and the winner is chosen on the basis of the first click.)

Solution:

Step 1: The interval of the probability distribution in seconds is [0, 30].

⇒ The probability density is 1/(30 - 0) = 1/30.

Step 2: The requirement is how many will respond within 5 seconds. That is, the sub-interval of the successful event is [0, 5]. The probability P(x < 5) is the ratio of the widths of these two intervals.

⇒ 5/30 = 1/6.

Since there are 20 contestants, the number of contestants likely to respond within 5 seconds is (1/6)(20) ≈ 3.
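The two steps of the solution can be reproduced with a small sketch. The helper names `uniform_pdf` and `uniform_cdf` are assumptions made for illustration; the cumulative probability is simply the width ratio used in Step 2.

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x): the fraction of the interval [a, b] lying below x."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

density = uniform_pdf(10, 0, 30)        # 1/30 anywhere inside [0, 30]
p_within_5 = uniform_cdf(5, 0, 30)      # 5/30 = 1/6
expected_responders = 20 * p_within_5   # about 3.33 of the 20 contestants
print(density, p_within_5, round(expected_responders))
```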
Individual Series Arithmetic Mean

When data is given on an individual basis, it is called an individual series. Following is an example of an individual series:

| Items | 5 | 10 | 20 | 30 | 40 | 50 | 60 | 70 |
|---|---|---|---|---|---|---|---|---|

For an individual series, the Arithmetic Mean can be calculated using the following formula.

Formula

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} X_{i}$$

Alternatively, we can write the same formula as follows:

$$\bar{x} = \frac{\sum x}{N}$$

Where:

- $X_{1}, X_{2}, X_{3}, \dots, X_{N}$ = individual observations of the variable
- $\sum x$ = sum of all observations of the variable
- $N$ = number of observations

Example

Problem Statement: Calculate the Arithmetic Mean for the following individual data:

| Items | 14 | 36 | 45 | 70 | 105 |
|---|---|---|---|---|---|

Solution: Based on the above formula, the Arithmetic Mean $\bar{x}$ will be:

$$\bar{x} = \frac{14 + 36 + 45 + 70 + 105}{5} = \frac{270}{5} = 54$$

The Arithmetic Mean of the given numbers is 54.
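The same calculation as a minimal Python sketch; `arithmetic_mean` is an illustrative name, and the empty-list check is an added assumption rather than something the text discusses.

```python
def arithmetic_mean(items):
    """Sum of all observations divided by the number of observations."""
    if not items:
        raise ValueError("at least one observation is required")
    return sum(items) / len(items)

mean = arithmetic_mean([14, 36, 45, 70, 105])
print(mean)  # 54.0
```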
Adjusted R-Squared
R-squared measures the proportion of the variation in your dependent variable (Y) explained by your independent variables (X) for a linear regression model. Adjusted R-squared adjusts the statistic based on the number of independent variables in the model. $R^2$ shows how well terms (data points) fit a curve or line. Adjusted $R^2$ also indicates how well terms fit a curve or line, but adjusts for the number of terms in a model. If you add more and more useless variables to a model, adjusted R-squared will decrease. If you add more useful variables, adjusted R-squared will increase.

Adjusted $R_{adj}^2$ will always be less than or equal to $R^2$. You only need $R^2$ when working with samples. In other words, $R^2$ isn't necessary when you have data from an entire population.

Formula

$$R_{adj}^2 = 1 - \left[\frac{(1-R^2)(n-1)}{n-k-1}\right]$$

Where:

- $n$ = the number of points in your data sample
- $k$ = the number of independent regressors, i.e. the number of variables in your model, excluding the constant

Example

Problem Statement: A fund has a sample R-squared value close to 0.5 and appears to offer higher risk-adjusted returns, with a sample size of 50 and 5 predictors. Find the adjusted R-squared value.

Solution:

Sample size = 50, number of predictors = 5, sample R-squared = 0.5. Since 0.5 is already the value of $R^2$, it is substituted directly without squaring:

$$R_{adj}^2 = 1 - \left[\frac{(1-0.5)(50-1)}{50-5-1}\right] = 1 - 0.5 \times \frac{49}{44} = 1 - 0.5568 = 0.4432$$
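A minimal sketch of the formula in Python, assuming the first argument is the R-squared value itself (not R). The function name `adjusted_r_squared` and the guard on the denominator are illustrative additions.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for n data points and k regressors (constant excluded)."""
    if n - k - 1 <= 0:
        raise ValueError("n must be greater than k + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# R-squared = 0.5, sample size 50, 5 predictors.
adj = adjusted_r_squared(0.5, 50, 5)
print(round(adj, 4))
```

With more predictors and the same R-squared, the penalty term grows and the adjusted value drops, which matches the behaviour described in the text.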
Chebyshev's Theorem

The fraction of any set of numbers lying within k standard deviations of the mean of those numbers is at least

$$1 - \frac{1}{k^2}$$

Where:

$$k = \frac{\text{the within number}}{\text{the standard deviation}}$$

and $k$ must be greater than 1.

Example

Problem Statement: Use Chebyshev's theorem to find what percent of the values will fall between 123 and 179 for a data set with a mean of 151 and a standard deviation of 14.

Solution:

We subtract 151 - 123 and get 28, which tells us that 123 is 28 units below the mean.

We subtract 179 - 151 and also get 28, which tells us that 179 is 28 units above the mean.

Those two together tell us that the values between 123 and 179 are all within 28 units of the mean. Therefore the "within number" is 28.

So we find the number of standard deviations, k, which the "within number", 28, amounts to by dividing it by the standard deviation:

$$k = \frac{\text{the within number}}{\text{the standard deviation}} = \frac{28}{14} = 2$$

So now we know that the values between 123 and 179 are all within 28 units of the mean, which is the same as within k = 2 standard deviations of the mean. Since k > 1, we can use Chebyshev's formula to find the minimum fraction of the data that lies within k = 2 standard deviations of the mean. Substituting k = 2 we have:

$$1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4}$$

So at least $\frac{3}{4}$ of the data lie between 123 and 179. Since $\frac{3}{4}$ = 75%, at least 75% of the data values are between 123 and 179.
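The worked example can be checked with a short sketch. The function name `chebyshev_bound` is hypothetical, and taking the smaller of the two distances from the mean is an assumption that only matters when the interval is not symmetric.

```python
def chebyshev_bound(lower, upper, mean, std):
    """Minimum fraction of values guaranteed to lie between lower and upper."""
    within = min(mean - lower, upper - mean)  # the "within number"
    k = within / std
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

bound = chebyshev_bound(123, 179, 151, 14)
print(f"at least {bound:.0%} of the values")
```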
Arithmetic Mode
Arithmetic Mode refers to the most frequently occurring value in the data set. In other words, the modal value has the highest frequency associated with it. It is denoted by the symbol $M_o$ or Mode.

We're going to discuss methods to compute the Arithmetic Mode for three types of series:

- Individual Data Series
- Discrete Data Series
- Continuous Data Series

Individual Data Series

When data is given on an individual basis. Following is an example of an individual series:

| Items | 5 | 10 | 20 | 30 | 40 | 50 | 60 | 70 |
|---|---|---|---|---|---|---|---|---|

Discrete Data Series

When data is given along with their frequencies. Following is an example of a discrete series:

| Items | 5 | 10 | 20 | 30 | 40 | 50 | 60 | 70 |
|---|---|---|---|---|---|---|---|---|
| Frequency | 2 | 5 | 1 | 3 | 12 | 0 | 5 | 7 |

Continuous Data Series

When data is given based on ranges along with their frequencies. Following is an example of a continuous series:

| Items | 0-5 | 5-10 | 10-20 | 20-30 | 30-40 |
|---|---|---|---|---|---|
| Frequency | 2 | 5 | 1 | 3 | 12 |
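To make the definition concrete, here is a small sketch that finds the mode of an individual series and of a discrete series. The function names are illustrative, the individual data `[5, 10, 10, 20]` is hypothetical (chosen so that one value repeats), and returning every tied value is a design choice of this sketch. The continuous case needs an interpolation formula that this page does not give, so it is left out.

```python
from collections import Counter

def mode_individual(items):
    """Most frequent value(s) in a list of individual observations."""
    counts = Counter(items)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

def mode_discrete(items, frequencies):
    """Item(s) with the highest frequency in a discrete series."""
    top = max(frequencies)
    return [item for item, freq in zip(items, frequencies) if freq == top]

# Hypothetical individual series with one repeated value.
individual_mode = mode_individual([5, 10, 10, 20])

# The discrete series from the text: 40 occurs 12 times.
discrete_mode = mode_discrete(
    [5, 10, 20, 30, 40, 50, 60, 70],
    [2, 5, 1, 3, 12, 0, 5, 7],
)
print(individual_mode, discrete_mode)
```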
Chi-squared Distribution
The chi-squared distribution (chi-square or $\chi^2$ distribution) with $k$ degrees of freedom is the distribution of a sum of the squares of $k$ independent standard normal random variables. It is one of the most widely used probability distributions in statistics. It is a special case of the gamma distribution.

The chi-squared distribution is widely used by statisticians for the following:

- Estimating a confidence interval for the population standard deviation of a normal distribution from a sample standard deviation.
- Checking the independence of two criteria of classification of multiple qualitative variables.
- Checking the relationships between categorical variables.
- Studying the sample variance where the underlying distribution is normal.
- Testing deviations between expected and observed frequencies.
- Conducting the chi-square test (a goodness-of-fit test).

Probability density function

The probability density function of the chi-square distribution is given as:

Formula

$$f(x; k) = \begin{cases} \dfrac{x^{\frac{k}{2} - 1} e^{-\frac{x}{2}}}{2^{\frac{k}{2}} \Gamma\left(\frac{k}{2}\right)}, & \text{if } x \gt 0 \\[7pt] 0, & \text{if } x \le 0 \end{cases}$$

Where:

- $\Gamma\left(\frac{k}{2}\right)$ = gamma function, which has closed-form values for integer parameter $k$
- $x$ = random variable
- $k$ = integer parameter

Cumulative distribution function

The cumulative distribution function of the chi-square distribution is given as:

Formula

$$F(x; k) = \frac{\gamma\left(\frac{k}{2}, \frac{x}{2}\right)}{\Gamma\left(\frac{k}{2}\right)} = P\left(\frac{k}{2}, \frac{x}{2}\right)$$

Where:

- $\gamma(s, t)$ = lower incomplete gamma function
- $P(s, t)$ = regularized gamma function
- $x$ = random variable
- $k$ = integer parameter
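The density formula can be evaluated directly with `math.gamma` from the Python standard library. The sketch below also approximates the cumulative distribution by integrating the density with a midpoint rule, which is a stand-in chosen for simplicity and not the incomplete gamma function itself. The function names are illustrative, and the numerical integration converges slowly for $k < 2$, where the density is unbounded near zero.

```python
import math

def chi2_pdf(x, k):
    """Chi-squared density with k degrees of freedom."""
    if x <= 0:
        return 0.0
    half_k = k / 2
    return x ** (half_k - 1) * math.exp(-x / 2) / (2 ** half_k * math.gamma(half_k))

def chi2_cdf_numeric(x, k, steps=100_000):
    """Approximate P(X <= x) by midpoint-rule integration of the density."""
    if x <= 0:
        return 0.0
    width = x / steps
    return sum(chi2_pdf((i + 0.5) * width, k) for i in range(steps)) * width

# For k = 2 the density reduces to exp(-x/2) / 2 and the CDF to 1 - exp(-x/2),
# which gives a closed form to compare against.
density = chi2_pdf(2.0, 2)
prob = chi2_cdf_numeric(5.99, 2)
print(round(density, 4), round(prob, 4))
```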
F Test Table
The F-test is named after the prominent statistician R.A. Fisher. It is used to test whether two independent estimates of population variance differ significantly, or whether the two samples may be regarded as drawn from normal populations having the same variance. To carry out the test, we calculate the F-statistic, defined as:

Formula

$$F = \frac{\text{larger estimate of population variance}}{\text{smaller estimate of population variance}} = \frac{S_1^2}{S_2^2}, \quad \text{where } S_1^2 \gt S_2^2$$

Procedure

The testing procedure is as follows:

1. Set up the null hypothesis that the two population variances are equal, i.e. $H_0: \sigma_1^2 = \sigma_2^2$.

2. Calculate the variances of the random samples using the formula:

   $$S_1^2 = \frac{\sum (X_1 - \bar{X}_1)^2}{n_1 - 1}, \qquad S_2^2 = \frac{\sum (X_2 - \bar{X}_2)^2}{n_2 - 1}$$

3. Compute the variance ratio F as:

   $$F = \frac{S_1^2}{S_2^2}, \quad \text{where } S_1^2 \gt S_2^2$$

4. Compute the degrees of freedom. The degrees of freedom of the larger estimate of the population variance are denoted by $v_1$ and those of the smaller estimate by $v_2$. That is,

   - $v_1$ = degrees of freedom for the sample having the larger variance = $n_1 - 1$
   - $v_2$ = degrees of freedom for the sample having the smaller variance = $n_2 - 1$

5. From the F-table, find the value of $F$ for $v_1$ and $v_2$ at the 5% level of significance. Then compare the calculated value of $F$ with the table value $F_{.05}$ for $v_1$ and $v_2$ degrees of freedom. If the calculated value of $F$ exceeds the table value, we reject the null hypothesis and conclude that the difference between the two variances is significant. On the other hand, if the calculated value of $F$ is less than the table value, the null hypothesis is accepted and we conclude that the difference between the two variances is not significant.

The following example illustrates the application of the F-test.
Example

Problem Statement: In a sample of 8 observations, the sum of squared deviations of items from the mean was 94.5. In another sample of 10 observations, the value was found to be 101.7. Test whether the difference is significant at the 5% level. (You are given that at the 5% level of significance, the critical value of $F$ for $v_1$ = 7 and $v_2$ = 9, $F_{.05}$, is 3.29.)

Solution: Let us take the hypothesis that the difference in the variances of the two samples is not significant, i.e. $H_0: \sigma_1^2 = \sigma_2^2$.

We are given the following:

$$n_1 = 8, \quad \sum (X_1 - \bar{X}_1)^2 = 94.5, \quad n_2 = 10, \quad \sum (X_2 - \bar{X}_2)^2 = 101.7$$

$$S_1^2 = \frac{\sum (X_1 - \bar{X}_1)^2}{n_1 - 1} = \frac{94.5}{8 - 1} = \frac{94.5}{7} = 13.5$$

$$S_2^2 = \frac{\sum (X_2 - \bar{X}_2)^2}{n_2 - 1} = \frac{101.7}{10 - 1} = \frac{101.7}{9} = 11.3$$

Applying the F-test:

$$F = \frac{S_1^2}{S_2^2} = \frac{13.5}{11.3} = 1.195$$

For $v_1$ = 8 - 1 = 7 and $v_2$ = 10 - 1 = 9, $F_{.05}$ = 3.29. The calculated value of $F$ is less than the table value. Hence, we accept the null hypothesis and conclude that the difference in the variances of the two samples is not significant at the 5% level.
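The arithmetic of this example can be retraced with a short sketch. The function name `f_statistic` is an assumption for illustration, and the critical value 3.29 is taken from the problem statement rather than computed.

```python
def f_statistic(ss1, n1, ss2, n2):
    """Variance ratio with the larger variance estimate in the numerator.

    ss1 and ss2 are sums of squared deviations from each sample mean.
    Returns (F, v1, v2), where v1 belongs to the larger variance.
    """
    var1 = ss1 / (n1 - 1)
    var2 = ss2 / (n2 - 1)
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1

f_value, v1, v2 = f_statistic(94.5, 8, 101.7, 10)
critical = 3.29                    # table value given in the problem statement
reject_null = f_value > critical
print(round(f_value, 3), v1, v2, reject_null)
```

Swapping the two samples gives the same result, because the function always places the larger variance estimate on top.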
Chi Squared table
The numbers in the table represent the values of the $\chi^2$ statistic. The areas of the shaded region (A), measured to the left of the tabulated value, are the column indexes. You can also use the Chi-Square Distribution to compute critical and p values exactly.

| df | A=0.005 | 0.010 | 0.025 | 0.05 | 0.10 | 0.25 | 0.50 | 0.75 | 0.90 | 0.95 | 0.975 | 0.99 | 0.995 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.000039 | 0.00016 | 0.00098 | 0.0039 | 0.0158 | 0.102 | 0.455 | 1.32 | 2.71 | 3.84 | 5.02 | 6.63 | 7.88 |
| 2 | 0.0100 | 0.0201 | 0.0506 | 0.103 | 0.211 | 0.575 | 1.39 | 2.77 | 4.61 | 5.99 | 7.38 | 9.21 | 10.6 |
| 3 | 0.0717 | 0.115 | 0.216 | 0.352 | 0.584 | 1.21 | 2.37 | 4.11 | 6.25 | 7.81 | 9.35 | 11.3 | 12.8 |
| 4 | 0.207 | 0.297 | 0.484 | 0.711 | 1.06 | 1.92 | 3.36 | 5.39 | 7.78 | 9.49 | 11.1 | 13.3 | 14.9 |
| 5 | 0.412 | 0.554 | 0.831 | 1.15 | 1.61 | 2.67 | 4.35 | 6.63 | 9.24 | 11.1 | 12.8 | 15.1 | 16.7 |
| 6 | 0.676 | 0.872 | 1.24 | 1.64 | 2.20 | 3.45 | 5.35 | 7.84 | 10.6 | 12.6 | 14.4 | 16.8 | 18.5 |
| 7 | 0.989 | 1.24 | 1.69 | 2.17 | 2.83 | 4.25 | 6.35 | 9.04 | 12.0 | 14.1 | 16.0 | 18.5 | 20.3 |
| 8 | 1.34 | 1.65 | 2.18 | 2.73 | 3.49 | 5.07 | 7.34 | 10.2 | 13.4 | 15.5 | 17.5 | 20.1 | 22.0 |
| 9 | 1.73 | 2.09 | 2.70 | 3.33 | 4.17 | 5.9 | 8.34 | 11.4 | 14.7 | 16.9 | 19.0 | 21.7 | 23.6 |
| 10 | 2.16 | 2.56 | 3.25 | 3.94 | 4.87 | 6.74 | 9.34 | 12.5 | 16.0 | 18.3 | 20.5 | 23.2 | 25.2 |
| 11 | 2.60 | 3.05 | 3.82 | 4.57 | 5.58 | 7.58 | 10.3 | 13.7 | 17.3 | 19.7 | 21.9 | 24.7 | 26.8 |
| 12 | 3.07 | 3.57 | 4.40 | 5.23 | 6.30 | 8.44 | 11.3 | 14.8 | 18.5 | 21.0 | 23.3 | 26.2 | 28.3 |
| 13 | 3.57 | 4.11 | 5.01 | 5.89 | 7.04 | 9.3 | 12.3 | 16.0 | 19.8 | 22.4 | 24.7 | 27.7 | 29.8 |
| 14 | 4.07 | 4.66 | 5.63 | 6.57 | 7.79 | 10.2 | 13.3 | 17.1 | 21.1 | 23.7 | 26.1 | 29.1 | 31.3 |
| 15 | 4.60 | 5.23 | 6.26 | 7.26 | 8.55 | 11.0 | 14.3 | 18.2 | 22.3 | 25.0 | 27.5 | 30.6 | 32.8 |
| 16 | 5.14 | 5.81 | 6.91 | 7.96 | 9.31 | 11.9 | 15.3 | 19.4 | 23.5 | 26.3 | 28.8 | 32.0 | 34.3 |
| 17 | 5.70 | 6.41 | 7.56 | 8.67 | 10.1 | 12.8 | 16.3 | 20.5 | 24.8 | 27.6 | 30.2 | 33.4 | 35.7 |
| 18 | 6.26 | 7.01 | 8.23 | 9.39 | 10.9 | 13.7 | 17.3 | 21.6 | 26.0 | 28.9 | 31.5 | 34.8 | 37.2 |
| 19 | 6.84 | 7.63 | 8.91 | 10.1 | 11.7 | 14.6 | 18.3 | 22.7 | 27.2 | 30.1 | 32.9 | 36.2 | 38.6 |
| 20 | 7.43 | 8.26 | 9.59 | 10.9 | 12.4 | 15.5 | 19.3 | 23.8 | 28.4 | 31.4 | 34.2 | 37.6 | 40.0 |
| 21 | 8.03 | 8.90 | 10.3 | 11.6 | 13.2 | 16.3 | 20.3 | 24.9 | 29.6 | 32.7 | 35.5 | 38.9 | 41.4 |
| 22 | 8.64 | 9.54 | 11.0 | 12.3 | 14.0 | 17.2 | 21.3 | 26.0 | 30.8 | 33.9 | 36.8 | 40.3 | 42.8 |
| 23 | 9.26 | 10.2 | 11.7 | 13.1 | 14.8 | 18.1 | 22.3 | 27.1 | 32.0 | 35.2 | 38.1 | 41.6 | 44.2 |
| 24 | 9.89 | 10.9 | 12.4 | 13.8 | 15.7 | 19.0 | 23.3 | 28.2 | 33.2 | 36.4 | 39.4 | 43.0 | 45.6 |
| 25 | 10.5 | 11.5 | 13.1 | 14.6 | 16.5 | 19.9 | 24.3 | 29.3 | 34.4 | 37.7 | 40.6 | 44.3 | 46.9 |
| 26 | 11.2 | 12.2 | 13.8 | 15.4 | 17.3 | 20.8 | 25.3 | 30.4 | 35.6 | 38.9 | 41.9 | 45.6 | 48.3 |
| 27 | 11.8 | 12.9 | 14.6 | 16.2 | 18.1 | 21.7 | 26.3 | 31.5 | 36.7 | 40.1 | 43.2 | 47.0 | 49.6 |
| 28 | 12.5 | 13.6 | 15.3 | 16.9 | 18.9 | 22.7 | 27.3 | 32.6 | 37.9 | 41.3 | 44.5 | 48.3 | 51.0 |
| 29 | 13.1 | 14.3 | 16.0 | 17.7 | 19.8 | 23.6 | 28.3 | 33.7 | 39.1 | 42.6 | 45.7 | 49.6 | 52.3 |
| 30 | 13.8 | 15.0 | 16.8 | 18.5 | 20.6 | 24.5 | 29.3 | 34.8 | 40.3 | 43.8 | 47.0 | 50.9 | 53.7 |
| 31 | 14.5 | 15.7 | 17.5 | 19.3 | 21.4 | 25.4 | 30.3 | 35.9 | 41.4 | 45.0 | 48.2 | 52.2 | 55.0 |
| 32 | 15.1 | 16.4 | 18.3 | 20.1 | 22.3 | 26.3 | 31.3 | 37.0 | 42.6 | 46.2 | 49.5 | 53.5 | 56.3 |
| 33 | 15.8 | 17.1 | 19.0 | 20.9 | 23.1 | 27.2 | 32.3 | 38.1 | 43.7 | 47.4 | 50.7 | 54.8 | 57.6 |
| 34 | 16.5 | 17.8 | 19.8 | 21.7 | 24.0 | 28.1 | 33.3 | 39.1 | 44.9 | 48.6 | 52.0 | 56.1 | 59.0 |
| 35 | 17.2 | 18.5 | 20.6 | 22.5 | 24.8 | 29.1 | 34.3 | 40.2 | 46.1 | 49.8 | 53.2 | 57.3 | 60.3 |
| 36 | 17.9 | 19.2 | 21.3 | 23.3 | 25.6 | 30.0 | 35.3 | 41.3 | 47.2 | 51.0 | 54.4 | 58.6 | 61.6 |
| 37 | 18.6 | 20.0 | 22.1 | 24.1 | 26.5 | 30.9 | 36.3 | 42.4 | 48.4 | 52.2 | 55.7 | 59.9 | 62.9 |
| 38 | 19.3 | 20.7 | 22.9 | 24.9 | 27.3 | 31.8 | 37.3 | 43.5 | 49.5 | 53.4 | 56.9 | 61.2 | 64.2 |
| 39 | 20.0 | 21.4 | 23.7 | 25.7 | 28.2 | 32.7 | 38.3 | 44.5 | 50.7 | 54.6 | 58.1 | 62.4 | 65.5 |
| 40 | 20.7 | 22.2 | 24.4 | 26.5 | 29.1 | 33.7 | 39.3 | 45.6 | 51.8 | 55.8 | 59.3 | 63.7 | 66.8 |
| 41 | 21.4 | 22.9 | 25.2 | 27.3 | 29.9 | 34.6 | 40.3 | 46.7 | 52.9 | 56.9 | 60.6 | 65.0 | 68.1 |
| 42 | 22.1 | 23.7 | 26.0 | 28.1 | 30.8 | 35.5 | 41.3 | 47.8 | 54.1 | 58.1 | 61.8 | 66.2 | 69.3 |
| 43 | 22.9 | 24.4 | 26.8 | 29.0 | 31.6 | 36.4 | 42.3 | 48.8 | 55.2 | 59.3 | 63.0 | 67.5 | 70.6 |
| 44 | 23.6 | 25.1 | 27.6 | 29.8 | 32.5 | 37.4 | 43.3 | 49.9 | 56.4 | 60.5 | 64.2 | 68.7 | 71.9 |
| 45 | 24.3 | 25.9 | 28.4 | 30.6 | 33.4 | 38.3 | 44.3 | 51.0 | 57.5 | 61.7 | 65.4 | 70.0 | 73.2 |
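Individual table entries can be reproduced approximately without a lookup. The sketch below assumes that A is the area to the left of the tabulated value, evaluates the cumulative distribution with the standard power series for the regularized lower incomplete gamma function, and inverts it by bisection. The names `chi2_cdf` and `chi2_critical`, the iteration cap, and the tolerance are illustrative choices; the series is only intended for the moderate values of df and x that appear in the table.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-squared variable, via the series
    P(s, t) = t^s e^(-t) * sum_{n>=0} t^n / Gamma(s + n + 1), s = df/2, t = x/2."""
    if x <= 0:
        return 0.0
    s, t = df / 2, x / 2
    term = math.exp(s * math.log(t) - t - math.lgamma(s + 1))
    total = 0.0
    for n in range(1, 5000):
        total += term
        if term < 1e-16 * total:
            break
        term *= t / (s + n)
    return min(total, 1.0)

def chi2_critical(area, df, tol=1e-9):
    """Value x with P(X <= x) = area, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < area:   # expand until the target is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chi2_critical(0.95, 1), 2))    # compare with row df = 1, column 0.95
print(round(chi2_critical(0.95, 10), 1))   # compare with row df = 10, column 0.95
```

A p value for an observed statistic then follows as `1 - chi2_cdf(observed, df)` for an upper-tail test.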