Poisson Distribution

The Poisson distribution is a discrete probability distribution that is widely used in statistical work. It was introduced by the French mathematician Siméon Denis Poisson in 1837 and is named after him. The Poisson distribution is used where the probability of an event is small, i.e., the event occurs only rarely even though there are many opportunities for it to occur. For example, the probability of a defective item in a manufacturing company is small, the probability of an earthquake in a given year is small, and the probability of an accident on a particular road is small. In all of these cases the probability of occurrence is small.

The Poisson distribution is defined by the following probability function:

Formula

$P(X = x) = \frac{e^{-m}\, m^x}{x!}, \quad x = 0, 1, 2, \ldots$

Where −

$m$ = mean (expected) number of successes; when the distribution approximates a binomial with $n$ trials and success probability $p$, $m = np$.

$P(X = x)$ = probability of exactly $x$ successes.

Example

Problem Statement:

A manufacturer of pins knows that, on average, 5% of his product is defective. He sells pins in packets of 100 and guarantees that no more than 4 pins in a packet will be defective. What is the probability that a packet will meet the guaranteed quality? [Given: $e^{-5} = 0.0067$]

Solution:

Let $p$ = probability of a defective pin = 5% = $\frac{5}{100}$. We are given:

$n = 100, \quad p = \frac{5}{100} \;\Rightarrow\; m = np = 100 \times \frac{5}{100} = 5$

The Poisson distribution is given as:

$P(X = x) = \frac{e^{-m}\, m^x}{x!}$

Required probability = P[packet will meet the guarantee] = P[packet contains at most 4 defectives] = P(0) + P(1) + P(2) + P(3) + P(4)

$= e^{-5}\frac{5^0}{0!} + e^{-5}\frac{5^1}{1!} + e^{-5}\frac{5^2}{2!} + e^{-5}\frac{5^3}{3!} + e^{-5}\frac{5^4}{4!}$

$= e^{-5}\left[1 + \frac{5}{1} + \frac{25}{2} + \frac{125}{6} + \frac{625}{24}\right]$

$= 0.0067 \times 65.375 \approx 0.438$
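The packet calculation can be checked with a short sketch that sums the Poisson probabilities directly. The helper names `poisson_pmf` and `poisson_cdf` are illustrative, not from any library; only the Python standard library is assumed.

```python
import math

def poisson_pmf(x: int, m: float) -> float:
    """P(X = x) for a Poisson distribution with mean m."""
    return math.exp(-m) * m**x / math.factorial(x)

def poisson_cdf(k: int, m: float) -> float:
    """P(X <= k): sum of the pmf for x = 0, 1, ..., k."""
    return sum(poisson_pmf(x, m) for x in range(k + 1))

n, p = 100, 0.05
m = n * p                      # Poisson mean approximating the binomial: m = np = 5
prob_ok = poisson_cdf(4, m)    # P(at most 4 defectives in a packet)
print(f"P(at most 4 defectives) = {prob_ok:.4f}")
```

Using the unrounded value $e^{-5} \approx 0.006738$ the probability comes out at about 0.4405; the slightly lower 0.438 in the worked solution comes from rounding $e^{-5}$ to 0.0067.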

Power Calculator

Whenever a hypothesis test is conducted, we need to make sure that the test is of high quality. One way to check the power, or sensitivity, of a test is to compute the probability that the test correctly rejects the null hypothesis when the alternative hypothesis is true. In other words, the power of a test is the probability of accepting the alternative hypothesis when it is true, i.e., of detecting an effect that is really present.

$\text{Power} = P(\text{reject } H_0 \mid H_1 \text{ is true})$

Power is closely tied to the two types of error. A Type I error (probability $\alpha$) is the incorrect rejection of a true null hypothesis, whereas a Type II error (probability $\beta$) is the incorrect retention of a false null hypothesis. Since power is the probability of avoiding a Type II error, $\text{Power} = 1 - \beta$: the smaller the chance of a Type II error, the more powerful the test. For a fixed sample size, lowering $\alpha$ generally increases $\beta$ and therefore reduces power.

Example

A survey has been conducted on students to check their IQ level. Suppose a random sample of 16 students is tested. The surveyor tests the null hypothesis that the mean IQ of students is 100 against the alternative hypothesis that it is greater than 100, using a 0.05 level of significance and a population standard deviation of 16. What is the power of the hypothesis test if the true population mean were 116?

Solution:

Because the population standard deviation is taken as known, the test statistic follows the standard normal distribution under the null hypothesis. (If the standard deviation were estimated from the sample, a Student t-distribution with $n - 1$ degrees of freedom would apply instead.) Since the probability of committing a Type I error ($\alpha$) is 0.05 and the test is one-tailed, we reject the null hypothesis $H_0$ when the test statistic $T \ge 1.645$.

Let's compute the corresponding value of the sample mean from the test statistic using the following formula:

$T = \frac{\bar X - \mu}{\sigma / \sqrt{n}}$

$\implies \bar X = \mu + T\left(\frac{\sigma}{\sqrt{n}}\right) = 100 + 1.645\left(\frac{16}{\sqrt{16}}\right) = 106.58$

Let's compute the power of the test using the following formula:

$\text{Power} = P(\bar X \ge 106.58 \mid \mu = 116) = P\left(T \ge \frac{106.58 - 116}{16/\sqrt{16}}\right) = P(T \ge -2.36)$

$= 1 - P(T < -2.36) = 1 - 0.0091 = 0.9909$

So we have a 99.09% chance of rejecting the null hypothesis $H_0: \mu = 100$ in favor of the alternative hypothesis $H_1: \mu > 100$ when the unknown population mean is $\mu = 116$.
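The power calculation for a right-tailed z-test can be sketched as below, assuming (as in the example) that the population standard deviation is known. `power_one_sided_z` is a hypothetical helper name; `statistics.NormalDist` is the standard-library normal distribution (Python 3.8+).

```python
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a right-tailed one-sample z-test of H0: mu = mu0 vs H1: mu > mu0.

    Returns (power, critical sample mean).
    """
    z = NormalDist()                   # standard normal distribution
    se = sigma / n ** 0.5              # standard error of the sample mean
    z_crit = z.inv_cdf(1 - alpha)      # about 1.645 for alpha = 0.05
    xbar_crit = mu0 + z_crit * se      # reject H0 when the sample mean is at least this
    # Probability that the sample mean lands in the rejection region when the true mean is mu1
    power = 1 - z.cdf((xbar_crit - mu1) / se)
    return power, xbar_crit

power, cutoff = power_one_sided_z(mu0=100, mu1=116, sigma=16, n=16)
print(f"cutoff = {cutoff:.2f}, power = {power:.4f}")
```

Without rounding the standardized value to −2.36, the power is about 0.9907. Setting `mu1` equal to `mu0` returns a power equal to $\alpha$, which is a handy sanity check.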

Kolmogorov Smirnov Test

This test is used in situations where a comparison has to be made between an observed sample distribution and a theoretical distribution.

K-S One Sample Test

This test is used as a test of goodness of fit and is well suited to small samples. It compares the cumulative distribution function of a variable with a specified distribution. The null hypothesis assumes no difference between the observed and theoretical distributions, and the value of the test statistic $D$ is calculated as:

Formula

$D = \max \left| F_O(X) - F_T(X) \right|$

Where −

$F_O(X) = \frac{k}{n}$ = observed cumulative frequency distribution of a random sample of $n$ observations, where $k$ is the number of observations less than or equal to $X$.

$F_T(X)$ = the theoretical cumulative frequency distribution.

The critical value of $D$ is found from the K-S table values for the one sample test.

Acceptance Criteria: If the calculated value is less than the critical value, accept the null hypothesis.

Rejection Criteria: If the calculated value is greater than the critical value, reject the null hypothesis.

Example

Problem Statement:

In a study of students from various streams of a college, 60 students, with an equal number drawn from each stream, were interviewed and their intention to join the college Drama Club was noted.

| | B.Sc. | B.A. | B.Com. | M.A. | M.Com. |
|---|---|---|---|---|---|
| No. in each class | 5 | 9 | 11 | 16 | 19 |

It was expected that 12 students from each class would join the Drama Club. Use the K-S test to find whether there is any difference among student classes with regard to their intention of joining the Drama Club.

Solution:

$H_0$: There is no difference among students of different streams with respect to their intention of joining the Drama Club.

We develop the cumulative frequencies for the observed and theoretical distributions.

| Stream | Observed (O) | Theoretical (T) | $F_O(X)$ | $F_T(X)$ | $\lvert F_O(X) - F_T(X) \rvert$ |
|---|---|---|---|---|---|
| B.Sc. | 5 | 12 | 5/60 | 12/60 | 7/60 |
| B.A. | 9 | 12 | 14/60 | 24/60 | 10/60 |
| B.Com. | 11 | 12 | 25/60 | 36/60 | 11/60 |
| M.A. | 16 | 12 | 41/60 | 48/60 | 7/60 |
| M.Com. | 19 | 12 | 60/60 | 60/60 | 0 |
| Total | n = 60 | | | | |

The test statistic $D$ is calculated as:

$D = \max \left| F_O(X) - F_T(X) \right| = \frac{11}{60} = 0.183$

The table value of $D$ at the 5% significance level is given by:

$D_{0.05} = \frac{1.36}{\sqrt{n}} = \frac{1.36}{\sqrt{60}} = 0.176$

Since the calculated value (0.183) is greater than the critical value (0.176), we reject the null hypothesis and conclude that there is a difference among students of different streams in their intention of joining the Club.

K-S Two Sample Test

When there are two independent samples instead of one, the K-S two sample test can be used to test the agreement between two cumulative distributions. The null hypothesis states that there is no difference between the two distributions. The D-statistic is calculated in the same manner as in the K-S one sample test.

Formula

$D = \max \left| F_{n_1}(X) - F_{n_2}(X) \right|$

Where −

$F_{n_1}(X)$ = cumulative distribution of the first sample, which has $n_1$ observations.

$F_{n_2}(X)$ = cumulative distribution of the second sample, which has $n_2$ observations.

A large maximum deviation $D$ between the two cumulative distributions indicates a difference between the two sample distributions. When $n_1 = n_2 \le 40$, the critical value of $D$ is taken from the K-S table for the two sample case. When $n_1$ and/or $n_2$ exceed 40, the K-S table for large samples in the two sample test should be used. The null hypothesis is accepted if the calculated value is less than the table value, and rejected otherwise.

Thus, these nonparametric tests help a researcher test the significance of results when the characteristics of the target population are unknown or no assumptions have been made about them.
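Both K-S statistics reduce to "largest gap between two cumulative distributions", which a short sketch makes concrete. The helpers `ks_d_from_counts` (ordered categories given as counts, as in the Drama Club example) and `ks_two_sample_d` (raw samples) are hypothetical names; the critical value uses the large-sample 5% approximation $1.36/\sqrt{n}$ from the example. For real analyses, SciPy's `scipy.stats.kstest` and `scipy.stats.ks_2samp` provide these tests together with p-values.

```python
from bisect import bisect_right
from itertools import accumulate

def ks_d_from_counts(observed, expected):
    """One-sample K-S statistic for ordered categories given as counts."""
    n_obs, n_exp = sum(observed), sum(expected)
    cum_obs = [c / n_obs for c in accumulate(observed)]   # F_O(X)
    cum_exp = [c / n_exp for c in accumulate(expected)]   # F_T(X)
    return max(abs(o - t) for o, t in zip(cum_obs, cum_exp))

def ks_two_sample_d(sample1, sample2):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    s1, s2 = sorted(sample1), sorted(sample2)

    def ecdf(s, x):
        return bisect_right(s, x) / len(s)   # fraction of observations <= x

    # Both ECDFs are step functions, so the largest gap occurs at a data point
    return max(abs(ecdf(s1, x) - ecdf(s2, x)) for x in s1 + s2)

observed = [5, 9, 11, 16, 19]   # B.Sc., B.A., B.Com., M.A., M.Com.
expected = [12] * 5
d = ks_d_from_counts(observed, expected)
d_crit = 1.36 / sum(observed) ** 0.5   # large-sample 5% critical value
print(f"D = {d:.3f}, critical D = {d_crit:.3f}, reject H0: {d > d_crit}")
```

The two-sample function is shown for completeness only; the samples it would be applied to are not part of the example above.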

Pie Chart

A pie chart (or pie graph) is a circular statistical chart that is divided into slices to illustrate numerical proportions. In a pie chart, the central angle, area and arc length of each slice are proportional to the quantity or percentage it represents. The percentages should total 100, and the central angles of the slices should total 360°.

The following pie graph depicts the cost of construction of a house. From this graph, one can compare the amounts spent on cement, steel and so on. One can also compute the actual amount spent on each individual expense.

Consider an example where we want to know how much more the labour cost is compared with the cost of steel, when the total expense is 600000 dollars.

$\text{Amount spent on labour} = \frac{90}{360} \times 600000 = 150000$

$\text{Amount spent on steel} = \frac{54}{360} \times 600000 = 90000$

$\text{Excess} = 150000 - 90000 = 60000$

Let 60000 = x% of 600000.

$\implies \frac{x}{100} \times 600000 = 60000 \implies x = 10$

So the labour cost exceeds the cost of steel by 10% of the total expense.
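The conversion between a slice's central angle and the amount it represents is a simple proportion, sketched below. `slice_amount` and `slice_angle` are illustrative helper names; multiplying before dividing keeps these particular numbers exact in floating point.

```python
def slice_amount(angle_deg: float, total: float) -> float:
    """Amount represented by a slice: (central angle / 360) * total."""
    return angle_deg * total / 360

def slice_angle(amount: float, total: float) -> float:
    """Central angle needed to draw a slice for the given amount."""
    return amount * 360 / total

total = 600_000
labour = slice_amount(90, total)     # 150000.0
steel = slice_amount(54, total)      # 90000.0
excess = labour - steel              # 60000.0
print(labour, steel, excess, excess * 100 / total)   # last value: excess as % of total
```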

Kurtosis

The degree of tailedness of a distribution is measured by kurtosis. It tells us the extent to which a distribution is more or less outlier-prone (heavier- or lighter-tailed) than the normal distribution. Three different types of curves, courtesy of Investopedia, are shown as follows −

It is difficult to discern different types of kurtosis from the density plots (left panel) because the tails are close to zero for all distributions. But differences in the tails are easy to see in the normal quantile-quantile plots (right panel).

The normal curve is called a mesokurtic curve. If the curve of a distribution is more outlier-prone (heavier-tailed) than a normal or mesokurtic curve, it is referred to as a leptokurtic curve. If a curve is less outlier-prone (lighter-tailed) than a normal curve, it is called a platykurtic curve. Kurtosis is measured by moments and is given by the following formula −

Formula

$\beta_2 = \frac{\mu_4}{\mu_2^2}$

Where −

$\mu_4 = \frac{\sum (x - \bar x)^4}{N}$ and $\mu_2 = \frac{\sum (x - \bar x)^2}{N}$

The greater the value of $\beta_2$, the more leptokurtic (heavier-tailed) the curve. A normal curve has $\beta_2 = 3$, a leptokurtic curve has $\beta_2$ greater than 3, and a platykurtic curve has $\beta_2$ less than 3.

Example

Problem Statement:

The data on daily wages of 45 workers of a factory are given. Compute $\beta_1$ and $\beta_2$ using moments about the mean. Comment on the results.

| Wages (Rs.) | Number of Workers |
|---|---|
| 100-120 | 1 |
| 120-140 | 2 |
| 140-160 | 6 |
| 160-180 | 20 |
| 180-200 | 11 |
| 200-220 | 3 |
| 220-240 | 2 |

Solution:

Take the assumed mean 170 and class width $i = 20$, and let $d = \frac{m - 170}{20}$, where $m$ is the class mid-point.

| Wages (Rs.) | Number of Workers (f) | Mid-point (m) | d | fd | fd² | fd³ | fd⁴ |
|---|---|---|---|---|---|---|---|
| 100-120 | 1 | 110 | -3 | -3 | 9 | -27 | 81 |
| 120-140 | 2 | 130 | -2 | -4 | 8 | -16 | 32 |
| 140-160 | 6 | 150 | -1 | -6 | 6 | -6 | 6 |
| 160-180 | 20 | 170 | 0 | 0 | 0 | 0 | 0 |
| 180-200 | 11 | 190 | 1 | 11 | 11 | 11 | 11 |
| 200-220 | 3 | 210 | 2 | 6 | 12 | 24 | 48 |
| 220-240 | 2 | 230 | 3 | 6 | 18 | 54 | 162 |
| Total | N = 45 | | | Σfd = 10 | Σfd² = 64 | Σfd³ = 40 | Σfd⁴ = 340 |

Since the deviations have been taken from an assumed mean, we first calculate moments about the arbitrary origin and then moments about the mean. (Intermediate values below are computed from the unrounded moments and then rounded.)

Moments about arbitrary origin 170

$\mu_1' = \frac{\sum fd}{N} \times i = \frac{10}{45} \times 20 = 4.444$

$\mu_2' = \frac{\sum fd^2}{N} \times i^2 = \frac{64}{45} \times 20^2 = 568.889$

$\mu_3' = \frac{\sum fd^3}{N} \times i^3 = \frac{40}{45} \times 20^3 = 7111.111$

$\mu_4' = \frac{\sum fd^4}{N} \times i^4 = \frac{340}{45} \times 20^4 = 1208888.889$

Moments about mean

$\mu_2 = \mu_2' - (\mu_1')^2 = 568.889 - 19.753 = 549.136$

$\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3 = 7111.111 - 7585.185 + 175.583 = -298.491$

$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4 = 1208888.889 - 126419.753 + 67423.868 - 1170.553 = 1148722.451$

From the values of the moments about the mean, we can now calculate $\beta_1$ and $\beta_2$:

$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-298.491)^2}{(549.136)^3} = 0.00054$

$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{1148722.451}{(549.136)^2} = 3.81$

From the above calculations, it can be concluded that $\beta_1$, which measures skewness, is almost zero, indicating that the distribution is almost symmetrical. $\beta_2$, which measures kurtosis, is greater than 3, implying that the distribution is leptokurtic.
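As a cross-check, the central moments can be computed directly from the class mid-points and frequencies, without the assumed-mean shortcut; both routes give the same moments about the mean. `grouped_moments` is an illustrative helper, and the divisor $N$ matches the formulas above.

```python
def grouped_moments(midpoints, freqs):
    """Mean and central moments mu_2, mu_3, mu_4 of grouped data (divisor N)."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n

    def central(k):
        return sum(f * (m - mean) ** k for m, f in zip(midpoints, freqs)) / n

    return mean, central(2), central(3), central(4)

midpoints = [110, 130, 150, 170, 190, 210, 230]   # class mid-points (Rs.)
freqs = [1, 2, 6, 20, 11, 3, 2]                   # number of workers

mean, mu2, mu3, mu4 = grouped_moments(midpoints, freqs)
beta1 = mu3 ** 2 / mu2 ** 3   # skewness measure (0 for a symmetric distribution)
beta2 = mu4 / mu2 ** 2        # kurtosis measure (3 for a normal curve)
print(f"mean={mean:.3f} mu2={mu2:.3f} mu3={mu3:.3f} mu4={mu4:.3f}")
print(f"beta1={beta1:.5f} beta2={beta2:.3f}")
```

Working from the table's frequencies (where Σfd⁴ = 340), this gives $\beta_1 \approx 0.00054$ and $\beta_2 \approx 3.81$.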

Pooled Variance (r)

Pooled variance is a weighted average of the variances of two or more independent samples. It is used to estimate a common variance when the means may differ between samples but the true (population) variance is assumed to be the same. Each sample variance $s_i^2$ is weighted by its degrees of freedom $n_i - 1$:

$s_p^2 = \frac{\sum_{i=1}^{k} (n_i - 1)\, s_i^2}{\sum_{i=1}^{k} (n_i - 1)}$

Example

Problem Statement:

Compute the pooled variance of the numbers 1, 2, 3, 4 and 5.

Solution:

Step 1

Determine the mean of the given data set by adding all the numbers and then dividing by the total count of numbers in the data set.

$\text{Mean} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$

Step 2

Subtract the mean from each number in the data set.

$(1 - 3), (2 - 3), (3 - 3), (4 - 3), (5 - 3) \Rightarrow -2, -1, 0, 1, 2$

Step 3

Square each deviation to eliminate the negative signs.

$(-2)^2, (-1)^2, (0)^2, (1)^2, (2)^2 \Rightarrow 4, 1, 0, 1, 4$

Step 4

Now find the sample standard deviation using the equation below, where $M$ is the mean and $\sum (X - M)^2 = 4 + 1 + 0 + 1 + 4 = 10$:

$S = \sqrt{\frac{\sum (X - M)^2}{n - 1}} = \frac{\sqrt{10}}{\sqrt{4}} = 1.58113$

so the sample variance is $\text{Var} = S^2 = \frac{10}{4} = 2.5$.

Step 5

$\text{Pooled Variance } (r) = \frac{(\text{total count of numbers} - 1) \times \text{Var}}{\text{total count of numbers} - 1} = \frac{(5 - 1) \times 2.5}{5 - 1} = \frac{4 \times 2.5}{4} = 2.5$

Hence, Pooled Variance (r) = 2.5. With only one sample, the pooled variance is simply that sample's variance.
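A minimal sketch of the weighted-average definition, assuming the samples are plain lists of numbers. `pooled_variance` is a hypothetical helper; `statistics.variance` computes the sample variance with the $n - 1$ divisor. The second sample `[2, 4, 6]` is made up purely to show the two-sample case.

```python
from statistics import variance

def pooled_variance(*samples):
    """Sample variances weighted by their degrees of freedom (n_i - 1)."""
    numerator = sum((len(s) - 1) * variance(s) for s in samples)   # sum of (n_i - 1) * s_i^2
    denominator = sum(len(s) - 1 for s in samples)                 # total degrees of freedom
    return numerator / denominator

print(pooled_variance([1, 2, 3, 4, 5]))              # single sample: 2.5
print(pooled_variance([1, 2, 3, 4, 5], [2, 4, 6]))   # two samples (second one hypothetical)
```

For the two-sample call the weights are 4 and 2, so the result is $(4 \times 2.5 + 2 \times 4)/6 = 3$.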

Laplace Distribution

The Laplace distribution represents the distribution of the difference between two independent variables having identical exponential distributions. It is also called the double exponential distribution.

Probability density function

The probability density function of the Laplace distribution is given as:

Formula

$L(x \mid \mu, b) = \frac{1}{2b} e^{-\frac{|x - \mu|}{b}} = \frac{1}{2b} \begin{cases} e^{-\frac{\mu - x}{b}}, & \text{if } x < \mu \\ e^{-\frac{x - \mu}{b}}, & \text{if } x \ge \mu \end{cases}$

Where −

$\mu$ = location parameter.

$b$ = scale parameter, $b > 0$.

$x$ = random variable.

Cumulative distribution function

The cumulative distribution function of the Laplace distribution is given as:

Formula

$D(x) = \int_{-\infty}^{x} L(u \mid \mu, b)\, du = \begin{cases} \frac{1}{2} e^{\frac{x - \mu}{b}}, & \text{if } x < \mu \\ 1 - \frac{1}{2} e^{-\frac{x - \mu}{b}}, & \text{if } x \ge \mu \end{cases}$

$= \frac{1}{2} + \frac{1}{2}\operatorname{sgn}(x - \mu)\left(1 - e^{-\frac{|x - \mu|}{b}}\right)$

Where −

$\mu$ = location parameter.

$b$ = scale parameter, $b > 0$.

$x$ = random variable.
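The density and both forms of the distribution function translate directly into code. The sketch below uses illustrative function names and only the standard `math` module; it also lets you confirm that the piecewise CDF and the sign-function form agree.

```python
import math

def laplace_pdf(x: float, mu: float = 0.0, b: float = 1.0) -> float:
    """Density (1 / 2b) * exp(-|x - mu| / b); requires b > 0."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

def laplace_cdf(x: float, mu: float = 0.0, b: float = 1.0) -> float:
    """Piecewise CDF: left exponential tail below mu, one minus the right tail above."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1 - 0.5 * math.exp(-(x - mu) / b)

def laplace_cdf_sgn(x: float, mu: float = 0.0, b: float = 1.0) -> float:
    """Equivalent single-expression form using the sign function."""
    sgn = (x > mu) - (x < mu)   # -1, 0 or +1
    return 0.5 + 0.5 * sgn * (1 - math.exp(-abs(x - mu) / b))

for x in (-3.0, 0.0, 1.0, 2.5):
    print(x, laplace_pdf(x, mu=1, b=2), laplace_cdf(x, mu=1, b=2), laplace_cdf_sgn(x, mu=1, b=2))
```

At $x = \mu$ the density reaches its maximum $1/(2b)$ and the CDF equals exactly 0.5, reflecting the symmetry about the location parameter.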

One Proportion Z Test

The test statistic is a z-score ($z$) defined by the following equation:

$z = \frac{p - P}{\sigma}$

where $P$ is the hypothesized value of the population proportion in the null hypothesis, $p$ is the sample proportion, and $\sigma$ is the standard deviation of the sampling distribution. Under the null hypothesis, $\sigma = \sqrt{P(1 - P)/n}$, so the test statistic is defined and given by the following function:

Formula

$z = \frac{\hat p - p_o}{\sqrt{\frac{p_o (1 - p_o)}{n}}}$

Where −

$z$ = test statistic

$n$ = sample size

$p_o$ = null hypothesized value

$\hat p$ = observed proportion

Example

Problem Statement:

A survey claims that 9 out of 10 doctors recommend aspirin for their patients with headaches. To test this claim, a random sample of 100 doctors is obtained. Of these 100 doctors, 82 indicate that they recommend aspirin. Is this claim accurate? Use alpha = 0.05.

Solution:

Define the null and alternative hypotheses:

$H_0: p = 0.90$

$H_1: p \ne 0.90$

Here alpha = 0.05. Using an alpha of 0.05 with a two-tailed test, the rejection region is split equally between the two tails of the standard normal distribution, so we have 0.025 in each tail. Looking up 1 − 0.025 = 0.975 in our z-table, we find a critical value of 1.96. Thus, our decision rule for this two-tailed test is: if $z$ is less than −1.96 or greater than 1.96, reject the null hypothesis.

Calculate the test statistic:

$\hat p = 0.82, \quad p_o = 0.90, \quad n = 100$

$z = \frac{0.82 - 0.90}{\sqrt{\frac{0.90 (1 - 0.90)}{100}}} = \frac{-0.08}{0.03} = -2.667$

Since $z = -2.667$ is less than −1.96, we reject the null hypothesis and conclude that the claim that 9 out of 10 doctors recommend aspirin for their patients is not accurate, z = −2.667, p < 0.05.
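The same test can be sketched in a few lines with the standard library's `statistics.NormalDist`; it also reports the two-tailed p-value instead of only comparing against the critical value. `one_proportion_z` is an illustrative name, not a library function.

```python
from statistics import NormalDist

def one_proportion_z(successes: int, n: int, p0: float):
    """Two-tailed one-proportion z-test; returns (z, p_value)."""
    p_hat = successes / n
    se = (p0 * (1 - p0) / n) ** 0.5            # standard error under H0
    z = (p_hat - p0) / se
    p_value = 2 * NormalDist().cdf(-abs(z))    # area in both tails beyond |z|
    return z, p_value

z, p = one_proportion_z(successes=82, n=100, p0=0.90)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)    # about 1.96
print(f"z = {z:.3f}, p = {p:.4f}, reject H0: {abs(z) > z_crit}")
```

The two-tailed p-value here is well below 0.05 (roughly 0.008), consistent with rejecting $H_0$.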

Inverse Gamma Distribution

The inverse gamma distribution is the distribution of the reciprocal of a gamma-distributed random variable. It has a positive shape parameter $\alpha$ and a positive scale parameter $\beta$; a shifted version also adds a location parameter $\mu$, which is zero in the formula below. $\alpha$ controls the shape: for a fixed $\beta$, a higher $\alpha$ gives a taller, narrower peak in the probability density function (PDF). $\beta$ controls the horizontal scale: in this form, a larger $\beta$ compresses the density toward zero. It is defined by the following formula:

Formula

$f(x) = \frac{x^{-(\alpha + 1)}\, e^{-\frac{1}{\beta x}}}{\Gamma(\alpha)\, \beta^{\alpha}}, \quad x > 0$

Where −

$\alpha$ = positive shape parameter.

$\beta$ = positive scale parameter (in this form, the scale of the gamma variable whose reciprocal is taken).

$x$ = random variable.

The following diagram shows the probability density function with different parameter combinations.
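The formula can be checked against its definition as a reciprocal: if $Y$ has a gamma distribution with shape $\alpha$ and scale $\beta$ and $X = 1/Y$, the change of variables gives $f_X(x) = f_Y(1/x)/x^2$. The sketch below (illustrative function names, standard `math` only) evaluates both sides. If you use SciPy instead, `scipy.stats.invgamma(a=alpha, scale=1/beta)` should correspond to this parameterization, but verify that against the SciPy documentation before relying on it.

```python
import math

def inv_gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """x^-(alpha+1) * exp(-1/(beta*x)) / (Gamma(alpha) * beta^alpha) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    return x ** -(alpha + 1) * math.exp(-1 / (beta * x)) / (math.gamma(alpha) * beta ** alpha)

def gamma_pdf(y: float, alpha: float, beta: float) -> float:
    """Gamma density with shape alpha and scale beta, for y > 0."""
    return y ** (alpha - 1) * math.exp(-y / beta) / (math.gamma(alpha) * beta ** alpha)

alpha, beta = 3.0, 2.0
for x in (0.2, 0.5, 1.0, 3.0):
    # Change of variables X = 1/Y: both columns should agree
    print(x, inv_gamma_pdf(x, alpha, beta), gamma_pdf(1 / x, alpha, beta) / x ** 2)

mode = 1 / (beta * (alpha + 1))   # the density peaks at x = 1 / (beta * (alpha + 1))
print("mode:", mode)
```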

Relative Standard Deviation

In probability theory and statistics, the coefficient of variation (CV), also known as the relative standard deviation (RSD), is a standardized measure of the dispersion of a probability distribution or frequency distribution. The relative standard deviation is defined and given by the following formula:

Formula

$\text{RSD} = 100 \times \frac{s}{|\bar x|}\ \%$

Where −

$s$ = the sample standard deviation

$\bar x$ = sample mean

Example

Problem Statement:

Find the RSD for the following set of numbers: 49, 51.3, 52.7, 55.8, whose sample standard deviation is 2.8437065.

Solution:

Step 1 – Standard deviation of the sample: 2.8437065 (or 2.84 rounded to 2 decimal places).

Step 2 – Multiply Step 1 by 100. Set this number aside for a moment.

$2.84 \times 100 = 284$

Step 3 – Find the sample mean, $\bar x$:

$\bar x = \frac{49 + 51.3 + 52.7 + 55.8}{4} = \frac{208.8}{4} = 52.2$

Step 4 – Divide Step 2 by the absolute value of Step 3:

$\frac{284}{|52.2|} = 5.44$

The RSD is 5.44%; together with the mean, the result can be reported as $52.2 \pm 5.44\%$. Note that the RSD is expressed as a percentage.
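The four steps collapse into one expression, sketched below with the standard `statistics` module (`stdev` uses the sample, $n - 1$, divisor). `relative_std_dev` is an illustrative helper name.

```python
from statistics import mean, stdev

def relative_std_dev(data):
    """RSD (coefficient of variation) in percent: 100 * s / |mean|, s = sample SD."""
    return 100 * stdev(data) / abs(mean(data))

values = [49, 51.3, 52.7, 55.8]
print(f"s = {stdev(values):.7f}, mean = {mean(values):.1f}, RSD = {relative_std_dev(values):.2f}%")
```

With the unrounded standard deviation the RSD is about 5.45%; the worked solution's 5.44% comes from rounding $s$ to 2.84 first.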