Power Calculator

Whenever a hypothesis test is conducted, we need to ascertain that the test is of high quality. One way to check this is to compute the power (or sensitivity) of the test: the probability that the test correctly rejects the null hypothesis when the alternative hypothesis is true. In other words, the power of a test is the probability of accepting the alternative hypothesis when it is true, where the alternative hypothesis describes the effect the test is meant to detect.

$ {Power = P(\text{reject } H_0 \mid H_1 \text{ is true}) } $

Power is closely tied to the two kinds of error a test can make. A Type I error, with probability $ {\alpha} $, is the incorrect rejection of a valid null hypothesis, whereas a Type II error, with probability $ {\beta} $, is the incorrect retention of an invalid null hypothesis. Power equals $ {1 - \beta} $, so at a given significance level $ {\alpha} $, the smaller the chance of a Type II error, the greater the power of the test.

Example

A survey has been conducted on students to check their IQ level. Suppose a random sample of 16 students is tested. The surveyor tests the null hypothesis that the mean IQ of students is 100 against the alternative hypothesis that the mean IQ is greater than 100, using a 0.05 level of significance and a standard deviation of 16. What is the power of the hypothesis test if the true population mean were 116?

Solution:

The solution treats the standard deviation of 16 as known and approximates the distribution of the test statistic under the null hypothesis by a standard normal distribution. As the probability of committing a Type I error ($ {\alpha} $) is 0.05 and the test is one-sided, we reject the null hypothesis ${H_0}$ when the test statistic $ {T \ge 1.645} $. Let's compute the corresponding value of the sample mean from the test statistic using the following formula.

$ {T = \frac{\bar X - \mu}{\frac{\sigma}{\sqrt n}} \\[7pt] \implies \bar X = \mu + T \left( \frac{\sigma}{\sqrt n} \right) \\[7pt] \, = 100 + 1.645 \left( \frac{16}{\sqrt{16}} \right) \\[7pt] \, = 106.58 } $

Let's compute the power of the statistical test using the following formula.

$ {Power = P(\bar X \ge 106.58 \mid \mu = 116) \\[7pt] \, = P \left( T \ge \frac{106.58 - 116}{4} \right) \\[7pt] \, = P(T \ge -2.36) \\[7pt] \, = 1 - P(T \lt -2.36) \\[7pt] \, = 1 - 0.0091 \\[7pt] \, = 0.9909 } $

So we have a 99.09% chance of rejecting the null hypothesis ${H_0: \mu = 100}$ in favor of the alternative hypothesis ${H_1: \mu \gt 100}$ when the unknown population mean is ${\mu = 116}$.
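The power calculation above can be reproduced numerically. The following is a minimal Python sketch, assuming an upper-tailed z-test with a known standard deviation; the function name `one_sided_z_power` and its parameter names are illustrative and do not come from the tutorial.

```python
from statistics import NormalDist


def one_sided_z_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test of H0: mu = mu0 when the true mean is mu1."""
    std_error = sigma / n ** 0.5
    # Critical z-value that leaves probability alpha in the upper tail.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Smallest sample mean that leads to rejecting H0.
    cutoff = mu0 + z_crit * std_error
    # Probability that the sample mean exceeds the cutoff when the true mean is mu1.
    return 1 - NormalDist(mu1, std_error).cdf(cutoff)


power = one_sided_z_power(mu0=100, mu1=116, sigma=16, n=16)
print(f"power = {power:.4f}")
```

With the example's values the result is close to 0.99. Small differences from the hand calculation in the last digits come from rounding the critical value and the z-score.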
Kolmogorov Smirnov Test
This test is used in situations where a comparison has to be made between an observed sample distribution and a theoretical distribution.

K-S One Sample Test

This test is used as a test of goodness of fit and is ideal when the size of the sample is small. It compares the cumulative distribution function for a variable with a specified distribution. The null hypothesis assumes no difference between the observed and theoretical distributions, and the value of the test statistic 'D' is calculated as:

Formula

$D = \text{Maximum } |F_o(X) - F_t(X)|$

Where:

${F_o(X)}$ = Observed cumulative frequency distribution of a random sample of n observations, with ${F_o(X) = \frac{k}{n}}$ = (No. of observations ≤ X)/(Total no. of observations).
${F_t(X)}$ = Theoretical cumulative frequency distribution.

The critical value of ${D}$ is found from the K-S table values for the one sample test.

Acceptance Criteria: If the calculated value is less than the critical value, accept the null hypothesis.
Rejection Criteria: If the calculated value is greater than the table value, reject the null hypothesis.

Example

Problem Statement:

In a study of the various streams of a college, 60 students, with an equal number of students drawn from each stream, were interviewed and their intention to join the Drama Club of the college was noted.

| | B.Sc. | B.A. | B.Com. | M.A. | M.Com. |
|---|---|---|---|---|---|
| No. in each class | 5 | 9 | 11 | 16 | 19 |

It was expected that 12 students from each class would join the Drama Club. Use the K-S test to find whether there is any difference among student classes with regard to their intention of joining the Drama Club.

Solution:

${H_o}$: There is no difference among students of different streams with respect to their intention of joining the Drama Club.

We develop the cumulative frequencies for the observed and theoretical distributions.

| Streams | Observed (O) | Theoretical (T) | ${F_o(X)}$ | ${F_t(X)}$ | ${\lvert F_o(X)-F_t(X) \rvert}$ |
|---|---|---|---|---|---|
| B.Sc. | 5 | 12 | 5/60 | 12/60 | 7/60 |
| B.A. | 9 | 12 | 14/60 | 24/60 | 10/60 |
| B.Com. | 11 | 12 | 25/60 | 36/60 | 11/60 |
| M.A. | 16 | 12 | 41/60 | 48/60 | 7/60 |
| M.Com. | 19 | 12 | 60/60 | 60/60 | 0/60 |
| Total | n = 60 | | | | |

The test statistic ${|D|}$ is calculated as:

$D = \text{Maximum } |F_o(X) - F_t(X)| \\[7pt] \, = \frac{11}{60} \\[7pt] \, = 0.183$

The table value of D at the 5% significance level is given by

${D_{0.05} = \frac{1.36}{\sqrt{n}} \\[7pt] \, = \frac{1.36}{\sqrt{60}} \\[7pt] \, = 0.176}$

Since the calculated value is greater than the critical value, we reject the null hypothesis and conclude that there is a difference among students of different streams in their intention of joining the Club.

K-S Two Sample Test

When there are two independent samples instead of one, the K-S two sample test can be used to test the agreement between the two cumulative distributions. The null hypothesis states that there is no difference between the two distributions. The D-statistic is calculated in the same manner as in the K-S One Sample Test.

Formula

${D = \text{Maximum } |F_{n_1}(X) - F_{n_2}(X)|}$

Where:

${n_1}$ = Number of observations in the first sample.
${n_2}$ = Number of observations in the second sample.

A large maximum deviation ${|D|}$ between the cumulative distributions points towards a difference between the two sample distributions.

The critical value of D depends on the sample sizes. When ${n_1 = n_2}$ and both are ≤ 40, the K-S table for the two sample case is used. When ${n_1}$ and/or ${n_2}$ > 40, the K-S table for large samples of the two sample test should be used. The null hypothesis is accepted if the calculated value is less than the table value, and rejected otherwise.

Thus the use of any of these nonparametric tests helps a researcher to test the significance of their results when the characteristics of the target population are unknown or no assumptions have been made about them.
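The one sample calculation in the Drama Club example can be checked with a short Python sketch. It assumes the data arrive as per-class counts, as in the example; the function name `ks_one_sample_d` is hypothetical, and the critical value uses the same large-sample approximation 1.36/√n quoted in the text.

```python
from itertools import accumulate
from math import sqrt


def ks_one_sample_d(observed, expected):
    """Largest gap between observed and expected cumulative proportions."""
    n = sum(observed)
    if sum(expected) != n:
        raise ValueError("observed and expected counts must have the same total")
    # Compare running totals class by class, then convert the largest gap to a proportion.
    gaps = (abs(o - e) for o, e in zip(accumulate(observed), accumulate(expected)))
    return max(gaps) / n


observed = [5, 9, 11, 16, 19]   # B.Sc., B.A., B.Com., M.A., M.Com.
expected = [12] * 5             # 12 students expected from each class
d_stat = ks_one_sample_d(observed, expected)
d_crit = 1.36 / sqrt(sum(observed))  # 5% critical value, large-sample approximation
print(f"D = {d_stat:.3f}, critical = {d_crit:.3f}, reject H0: {d_stat > d_crit}")
```

Working with running totals and dividing once at the end keeps the comparison in whole numbers, which mirrors the x/60 fractions in the table.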
Pie Chart
A pie chart (or a pie graph) is a circular statistical chart, which is divided into slices in order to illustrate numerical proportions. In a pie chart, the central angle, area and arc length of each slice are proportional to the quantity or percentage it represents. The percentages should total 100 and the arc measures should total 360°.

The following illustration of a pie graph depicts the cost of construction of a house.

[Figure: pie chart of the cost of construction of a house]

From this graph, one can compare the sums spent on cement, steel and so on. One can also compute the actual sum spent on each individual expense. Consider an example where we want to know how much more the labour cost is than the cost of steel. The calculation takes the labour slice as 90°, the steel slice as 54° and the total cost as 600000.

$ { \text{Amount spent on labour} = \frac{90}{360} \times 600000 = \$150000 \\[7pt] \text{Sum spent on steel} = \frac{54}{360} \times 600000 = \$90000 \\[7pt] \text{Excess} = 150000 - 90000 = \$60000 \\[7pt] \text{Let } 60000 = x\% \text{ of } 600000. \\[7pt] \implies \frac{x}{100} \times 600000 = 60000 \\[7pt] \implies x = 10\% \text{ of total expense.} } $
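The conversion from slice angle to amount is the same for every slice, so it can be written once. This is a small Python sketch; the helper name `slice_amount` is illustrative, and the angles and total are the ones used in the labour and steel calculation.

```python
def slice_amount(angle_degrees, total):
    """Amount represented by a pie slice with the given central angle."""
    return angle_degrees / 360 * total


total_cost = 600_000
labour = slice_amount(90, total_cost)
steel = slice_amount(54, total_cost)
excess = labour - steel
excess_percent = excess / total_cost * 100
print(f"labour = {labour:.0f}, steel = {steel:.0f}, "
      f"excess = {excess:.0f} ({excess_percent:.0f}% of total)")
```

The same percentage also follows directly from the angles, since (90 - 54)/360 = 10%.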
Kurtosis
The degree of tailedness of a distribution is measured by kurtosis. It tells us the extent to which the distribution is more or less outlier-prone (heavier or lighter-tailed) than the normal distribution.

[Figure: three types of kurtosis curves, courtesy of Investopedia; density plots in the left panel, normal quantile-quantile plots in the right panel]

It is difficult to discern different types of kurtosis from the density plots (left panel) because the tails are close to zero for all distributions. But differences in the tails are easy to see in the normal quantile-quantile plots (right panel).

The normal curve is called a mesokurtic curve. If the curve of a distribution is more outlier-prone (or heavier-tailed) than a normal or mesokurtic curve, it is referred to as a leptokurtic curve. If a curve is less outlier-prone (or lighter-tailed) than a normal curve, it is called a platykurtic curve. Kurtosis is measured by moments and is given by the following formula:

Formula

${\beta_2 = \frac{\mu_4}{(\mu_2)^2}}$

Where:

${\mu_4 = \frac{\sum(x - \bar x)^4}{N}}$
${\mu_2 = \frac{\sum(x - \bar x)^2}{N}}$

The greater the value of ${\beta_2}$, the more leptokurtic (heavier-tailed) the curve. A normal curve has a value of 3, a leptokurtic curve has ${\beta_2}$ greater than 3 and a platykurtic curve has ${\beta_2}$ less than 3.

Example

Problem Statement:

The data on daily wages of 45 workers of a factory are given. Compute ${\beta_1}$ and ${\beta_2}$ using moments about the mean. Comment on the results.

| Wages (Rs.) | Number of Workers |
|---|---|
| 100-120 | 1 |
| 120-140 | 2 |
| 140-160 | 6 |
| 160-180 | 20 |
| 180-200 | 11 |
| 200-220 | 3 |
| 220-240 | 2 |

Solution:

| Wages (Rs.) | Number of Workers (f) | Mid-point (m) | ${d = \frac{m - 170}{20}}$ | ${fd}$ | ${fd^2}$ | ${fd^3}$ | ${fd^4}$ |
|---|---|---|---|---|---|---|---|
| 100-120 | 1 | 110 | -3 | -3 | 9 | -27 | 81 |
| 120-140 | 2 | 130 | -2 | -4 | 8 | -16 | 32 |
| 140-160 | 6 | 150 | -1 | -6 | 6 | -6 | 6 |
| 160-180 | 20 | 170 | 0 | 0 | 0 | 0 | 0 |
| 180-200 | 11 | 190 | 1 | 11 | 11 | 11 | 11 |
| 200-220 | 3 | 210 | 2 | 6 | 12 | 24 | 48 |
| 220-240 | 2 | 230 | 3 | 6 | 18 | 54 | 162 |
| | ${N = 45}$ | | | ${\sum fd = 10}$ | ${\sum fd^2 = 64}$ | ${\sum fd^3 = 40}$ | ${\sum fd^4 = 340}$ |

Since the deviations have been taken from an assumed mean, we first calculate the moments about the arbitrary origin and then the moments about the mean. Here ${i = 20}$ is the class width.

Moments about arbitrary origin '170'

${\mu'_1 = \frac{\sum fd}{N} \times i = \frac{10}{45} \times 20 = 4.44 \\[7pt] \mu'_2 = \frac{\sum fd^2}{N} \times i^2 = \frac{64}{45} \times 20^2 = 568.88 \\[7pt] \mu'_3 = \frac{\sum fd^3}{N} \times i^3 = \frac{40}{45} \times 20^3 = 7111.11 \\[7pt] \mu'_4 = \frac{\sum fd^4}{N} \times i^4 = \frac{340}{45} \times 20^4 = 1208888.89 }$

Moments about mean

${\mu_2 = \mu'_2 - (\mu'_1)^2 = 568.88 - (4.44)^2 = 549.16 \\[7pt] \mu_3 = \mu'_3 - 3(\mu'_1)(\mu'_2) + 2(\mu'_1)^3 \\[7pt] \, = 7111.11 - 3(4.44)(568.88) + 2(4.44)^3 \\[7pt] \, = 7111.11 - 7577.48 + 175.05 = -291.32 \\[7pt] \mu_4 = \mu'_4 - 4(\mu'_1)(\mu'_3) + 6(\mu'_1)^2(\mu'_2) - 3(\mu'_1)^4 \\[7pt] \, = 1208888.89 - 4(4.44)(7111.11) + 6(4.44)^2(568.88) - 3(4.44)^4 \\[7pt] \, = 1208888.89 - 126293.31 + 67288.03 - 1165.87 \\[7pt] \, = 1148717.74 }$

From the values of the moments about the mean, we can now calculate ${\beta_1}$ and ${\beta_2}$:

${\beta_1 = \frac{(\mu_3)^2}{(\mu_2)^3} = \frac{(-291.32)^2}{(549.16)^3} = 0.00051 \\[7pt] \beta_2 = \frac{\mu_4}{(\mu_2)^2} = \frac{1148717.74}{(549.16)^2} = 3.81 }$

From the above calculations, it can be concluded that ${\beta_1}$, which measures skewness, is almost zero, thereby indicating that the distribution is almost symmetrical. ${\beta_2}$, which measures kurtosis, has a value greater than 3, thus implying that the distribution is leptokurtic.
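The moment calculations can be cross-checked by computing the central moments directly from the frequency table. The sketch below is a minimal Python version, assuming each class is represented by its mid-point as in the solution table; the helper name `central_moment` is illustrative. It does not round intermediate values, so the last digits can differ from a hand calculation that works with rounded moments.

```python
def central_moment(values, freqs, k):
    """k-th moment about the mean for grouped data (values are class mid-points)."""
    n = sum(freqs)
    mean = sum(f * v for v, f in zip(values, freqs)) / n
    return sum(f * (v - mean) ** k for v, f in zip(values, freqs)) / n


midpoints = [110, 130, 150, 170, 190, 210, 230]
workers = [1, 2, 6, 20, 11, 3, 2]

mu2 = central_moment(midpoints, workers, 2)
mu3 = central_moment(midpoints, workers, 3)
mu4 = central_moment(midpoints, workers, 4)

beta1 = mu3 ** 2 / mu2 ** 3   # skewness measure
beta2 = mu4 / mu2 ** 2        # kurtosis measure
print(f"beta1 = {beta1:.5f}, beta2 = {beta2:.2f}")
```

Computing the moments about the mean directly avoids the conversion from moments about an assumed origin, at the cost of working with non-integer deviations. Here ${\beta_1}$ comes out close to zero and ${\beta_2}$ above 3, matching the conclusion of the example.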
Pooled Variance (r)
Pooled variance is a weighted average used to estimate the variance of two independent variables when the mean can differ between samples but the true variance is the same in each.

Example

Problem Statement:

Compute the Pooled Variance of the numbers 1, 2, 3, 4 and 5.

Solution:

Step 1

Determine the average (mean) of the given data set by adding all the numbers and dividing the sum by the total count of numbers in the data set.

${Mean = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3 }$

Step 2

Then subtract the mean value from each of the given numbers in the data set.

${\Rightarrow (1 - 3), (2 - 3), (3 - 3), (4 - 3), (5 - 3) \Rightarrow -2, -1, 0, 1, 2 }$

Step 3

Square each deviation to remove the negative signs.

${\Rightarrow (-2)^2, (-1)^2, (0)^2, (1)^2, (2)^2 \Rightarrow 4, 1, 0, 1, 4 }$

Step 4

Now find the standard deviation using the equation below.

${S = \sqrt{\frac{\sum (X - M)^2}{n - 1}}}$

Standard Deviation = ${\frac{\sqrt{10}}{\sqrt 4} = 1.58113 }$

The variance is the square of the standard deviation, ${Var = \frac{10}{4} = 2.5}$.

Step 5

${Pooled\ Variance\ (r) = \frac{(\text{total count of numbers} - 1) \times Var}{(\text{total count of numbers} - 1)} \\[7pt] (r) = (5 - 1) \times \frac{2.5}{(5 - 1)} \\[7pt] = \frac{(4 \times 2.5)}{4} = 2.5}$

Hence, Pooled Variance (r) = 2.5. With a single data set, the pooled variance is simply the sample variance.
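The worked example uses only one data set, so the weighting in Step 5 cancels out. The Python sketch below extends the same step to several samples by weighting each sample variance by its count minus one. That generalisation is the standard pooled-variance formula rather than something stated in the example, and the function name `pooled_variance` and the second data set are illustrative.

```python
from statistics import variance


def pooled_variance(*samples):
    """Weighted average of sample variances, each weighted by (n_i - 1)."""
    numerator = sum((len(s) - 1) * variance(s) for s in samples)
    denominator = sum(len(s) - 1 for s in samples)
    return numerator / denominator


single = pooled_variance([1, 2, 3, 4, 5])
# Hypothetical second sample, added only to show the weighting at work.
combined = pooled_variance([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(single, combined)
```

For the single data set the function returns the sample variance, in line with Step 5. With the two samples shown, the variances 2.5 and 10 carry equal weight because both samples contain five values.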
Laplace Distribution
The Laplace distribution represents the distribution of differences between two independent variables having identical exponential distributions. It is also called the double exponential distribution.

Probability density function

The probability density function of the Laplace distribution is given as:

Formula

${ L(x \mid \mu, b) = \frac{1}{2b} e^{- \frac{| x - \mu |}{b}} }$

$ { = \frac{1}{2b} } \begin{cases} e^{- \frac{\mu - x}{b}}, & \text{if } x \lt \mu \\[7pt] e^{- \frac{x - \mu}{b}}, & \text{if } x \ge \mu \end{cases} $

Where:

${\mu}$ = location parameter.
${b}$ = scale parameter and is > 0.
${x}$ = random variable.

Cumulative distribution function

The cumulative distribution function of the Laplace distribution is given as:

Formula

${ D(x) = \int_{- \infty}^x L(u \mid \mu, b) \, du }$

$ = \begin{cases} \frac{1}{2} e^{\frac{x - \mu}{b}}, & \text{if } x \lt \mu \\[7pt] 1 - \frac{1}{2} e^{- \frac{x - \mu}{b}}, & \text{if } x \ge \mu \end{cases} $

$ { = \frac{1}{2} + \frac{1}{2} \operatorname{sgn}(x - \mu) \left( 1 - e^{- \frac{| x - \mu |}{b}} \right) } $

where ${\mu}$, ${b}$ and ${x}$ are as defined for the probability density function.
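Both functions translate directly into code. The following Python sketch assumes scalar inputs and a positive scale parameter; the function names `laplace_pdf` and `laplace_cdf` are illustrative.

```python
from math import exp


def laplace_pdf(x, mu, b):
    """Density of the Laplace distribution with location mu and scale b > 0."""
    return exp(-abs(x - mu) / b) / (2 * b)


def laplace_cdf(x, mu, b):
    """Cumulative distribution function, split at the location parameter."""
    if x < mu:
        return 0.5 * exp((x - mu) / b)
    return 1 - 0.5 * exp(-(x - mu) / b)


# The distribution is symmetric about mu, so half of the probability lies below it.
print(laplace_pdf(0.0, mu=0.0, b=1.0), laplace_cdf(0.0, mu=0.0, b=1.0))
```

A quick sanity check is symmetry: the values of the cumulative distribution function at `mu - t` and `mu + t` should add up to 1 for any `t`.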
One Proportion Z Test
The test statistic is a z-score (z) defined by the following equation.

${z = \frac{(p - P)}{\sigma}}$

where P is the hypothesized value of the population proportion in the null hypothesis, p is the sample proportion, and ${\sigma}$ is the standard deviation of the sampling distribution.

Written out in full, the test statistic is given by the following formula:

Formula

${ z = \frac{\hat p - p_o}{\sqrt{\frac{p_o(1 - p_o)}{n}}} }$

Where:

${z}$ = Test statistic
${n}$ = Sample size
${p_o}$ = Null hypothesized value
${\hat p}$ = Observed proportion

Example

Problem Statement:

A survey claims that 9 out of 10 doctors recommend aspirin for their patients with headaches. To test this claim, a random sample of 100 doctors is obtained. Of these 100 doctors, 82 indicate that they recommend aspirin. Is this claim accurate? Use alpha = 0.05.

Solution:

Define the null and alternative hypotheses:

${ H_0: p = .90 \\[7pt] H_1: p \ne .90 }$

Here alpha = 0.05. Using an alpha of 0.05 with a two-tailed test, we have 0.025 in each tail of the distribution.

[Figure: standard normal curve with 0.025 shaded in each tail]

Looking up 1 - 0.025 in our z-table, we find a critical value of 1.96. Thus, our decision rule for this two-tailed test is: if z is less than -1.96 or greater than 1.96, reject the null hypothesis.

Calculate the test statistic:

${ z = \frac{\hat p - p_o}{\sqrt{\frac{p_o(1 - p_o)}{n}}} \\[7pt] \hat p = .82 \\[7pt] p_o = .90 \\[7pt] n = 100 \\[7pt] z = \frac{.82 - .90}{\sqrt{\frac{.90 (1 - .90)}{100}}} \\[7pt] = \frac{-.08}{0.03} \\[7pt] = -2.667 }$

Since z = -2.667 is less than -1.96, we reject the null hypothesis and conclude that the claim that 9 out of 10 doctors recommend aspirin for their patients is not accurate, z = -2.667, p < 0.05.
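The test statistic and decision can be reproduced with Python's standard library. This is a minimal sketch, assuming a two-tailed test and the normal approximation used in the example; the function name `one_proportion_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Return the z statistic and two-tailed p-value for H0: p = p0."""
    p_hat = successes / n
    std_error = sqrt(p0 * (1 - p0) / n)  # standard deviation under H0
    z = (p_hat - p0) / std_error
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


z, p_value = one_proportion_z_test(successes=82, n=100, p0=0.90)
alpha = 0.05
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

Comparing the p-value with alpha leads to the same decision as comparing z with the critical values of ±1.96.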
Multinomial Distribution
A multinomial experiment is a statistical experiment that consists of n repeated trials. Each trial has a discrete number of possible outcomes. On any given trial, the probability that a particular outcome will occur is constant.

Formula

${P_r = \frac{n!}{(n_1!)(n_2!) \dots (n_x!)} {P_1}^{n_1} {P_2}^{n_2} \dots {P_x}^{n_x}}$

Where:

${n}$ = number of trials
${n_1}$ = number of times outcome 1 occurs
${n_2}$ = number of times outcome 2 occurs
${n_x}$ = number of times outcome x occurs
${P_1}$ = probability that outcome 1 happens
${P_2}$ = probability that outcome 2 happens
${P_x}$ = probability that outcome x happens

Example

Problem Statement:

Three card players play a series of matches. The probability that player A will win any game is 20%, the probability that player B will win is 30%, and the probability that player C will win is 50%. If they play 6 games, what is the probability that player A will win 1 game, player B will win 2 games, and player C will win 3?

Solution:

Given:

${n}$ = 6 (6 games total)
${n_1}$ = 1 (Player A wins)
${n_2}$ = 2 (Player B wins)
${n_3}$ = 3 (Player C wins)
${P_1}$ = 0.20 (probability that Player A wins)
${P_2}$ = 0.30 (probability that Player B wins)
${P_3}$ = 0.50 (probability that Player C wins)

Putting the values into the formula, we get:

${ P_r = \frac{n!}{(n_1!)(n_2!) \dots (n_x!)} {P_1}^{n_1} {P_2}^{n_2} \dots {P_x}^{n_x} \\[7pt] P_r(A=1, B=2, C=3) = \frac{6!}{1! \, 2! \, 3!} (0.2^1)(0.3^2)(0.5^3) \\[7pt] = 60 \times 0.00225 \\[7pt] = 0.135 }$
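The formula maps onto a few lines of Python. The sketch below assumes the counts and probabilities are given in the same order; the function name `multinomial_probability` is illustrative.

```python
from math import factorial, prod


def multinomial_probability(counts, probs):
    """Probability of observing the given outcome counts in sum(counts) trials."""
    # Multinomial coefficient n! / (n_1! n_2! ... n_x!), kept as an exact integer.
    coefficient = factorial(sum(counts))
    for count in counts:
        coefficient //= factorial(count)
    return coefficient * prod(p ** c for p, c in zip(probs, counts))


# Player A wins 1 game, B wins 2, C wins 3, out of 6 games.
probability = multinomial_probability([1, 2, 3], [0.2, 0.3, 0.5])
print(f"{probability:.3f}")
```

The coefficient is computed with integer division so that it stays exact; only the final product involves floating-point numbers.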
Relative Standard Deviation
In probability theory and statistics, the coefficient of variation (CV), also known as relative standard deviation (RSD), is a standardized measure of dispersion of a probability distribution or frequency distribution.

The Relative Standard Deviation (RSD) is given by the following formula:

Formula

${100 \times \frac{s}{\bar x}}$

Where:

${s}$ = the sample standard deviation
${\bar x}$ = the sample mean

Example

Problem Statement:

Find the RSD for the following set of numbers: 49, 51.3, 52.7, 55.8, whose standard deviation is 2.8437065.

Solution:

Step 1 - Standard deviation of the sample: 2.8437065 (or 2.84 rounded to 2 decimal places).

Step 2 - Multiply Step 1 by 100. Set this number aside for a moment.

${2.84 \times 100 = 284}$

Step 3 - Find the sample mean, ${\bar x}$. The sample mean is:

${\frac{(49 + 51.3 + 52.7 + 55.8)}{4} = \frac{208.8}{4} = 52.2}$

Step 4 - Divide Step 2 by the absolute value of Step 3.

${\frac{284}{|52.2|} = 5.44}$

The RSD is: ${52.2 \pm 5.4}$%

Note that the RSD is expressed as a percentage.
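The four steps collapse into one expression when the standard deviation and mean are computed from the data. This is a minimal Python sketch using the standard library's sample standard deviation; the function name `relative_standard_deviation` is illustrative.

```python
from statistics import mean, stdev


def relative_standard_deviation(data):
    """Sample standard deviation as a percentage of the absolute mean."""
    return 100 * stdev(data) / abs(mean(data))


values = [49, 51.3, 52.7, 55.8]
rsd = relative_standard_deviation(values)
print(f"RSD = {rsd:.2f}%")
```

Because the sketch does not round the standard deviation to 2.84 first, its result is slightly higher than the 5.44 obtained in Step 4, at about 5.45.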
Scatterplots
A scatterplot is a graphical way to display the relationship between two quantitative sample variables. It consists of an X axis, a Y axis and a series of dots, where each dot represents one observation from a data set. The position of the dot refers to its X and Y values.

Patterns of Data in Scatterplots

Scatterplots are used to analyze patterns, which generally vary in linearity, slope and strength.

Linearity - The data pattern is either linear (straight) or nonlinear (curved).
Slope - The direction of change in variable Y as variable X increases. If Y increases as X increases, the slope is positive; otherwise the slope is negative.
Strength - The degree of spread of the scatter in the plot. If the dots are widely dispersed, the relationship is considered weak. If the dots are densely concentrated around a line, the relationship is said to be strong.