Statistics – Gamma Distribution

The gamma distribution is a two-parameter family of continuous probability distributions. It is commonly described with one of three parameter combinations:

A shape parameter $k$ and a scale parameter $\theta$.
A shape parameter $\alpha = k$ and an inverse scale parameter $\beta = \frac{1}{\theta}$, called the rate parameter.
A shape parameter $k$ and a mean parameter $\mu = \frac{k}{\beta}$.

Each parameter is a positive real number.

The gamma distribution is the maximum entropy probability distribution for a random variable $X$ that satisfies the following criteria.

Formula
$E[X] = k\theta = \frac{\alpha}{\beta} > 0$ and is fixed.
$E[\ln(X)] = \psi(k) + \ln(\theta) = \psi(\alpha) - \ln(\beta)$ and is fixed.

Where −
$X$ = random variable.
$\psi$ = digamma function.

Characterization using shape $\alpha$ and rate $\beta$

Probability density function
The probability density function of the gamma distribution is given as:

Formula
$f(x; \alpha, \beta) = \frac{\beta^{\alpha} x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}$ where $x > 0$ and $\alpha, \beta > 0$

Where −
$\alpha$ = shape parameter.
$\beta$ = rate parameter.
$x$ = random variable.
$\Gamma(\alpha)$ = gamma function evaluated at $\alpha$.

Cumulative distribution function
The cumulative distribution function of the gamma distribution is given as:

Formula
$F(x; \alpha, \beta) = \int_0^x f(u; \alpha, \beta) \, du = \frac{\gamma(\alpha, \beta x)}{\Gamma(\alpha)}$

Where −
$\alpha$ = shape parameter.
$\beta$ = rate parameter.
$x$ = random variable.
$\gamma(\alpha, \beta x)$ = lower incomplete gamma function.

Characterization using shape $k$ and scale $\theta$

Probability density function
The probability density function of the gamma distribution is given as:

Formula
$f(x; k, \theta) = \frac{x^{k - 1} e^{-\frac{x}{\theta}}}{\theta^{k} \Gamma(k)}$ where $x > 0$ and $k, \theta > 0$

Where −
$k$ = shape parameter.
$\theta$ = scale parameter.
$x$ = random variable.
$\Gamma(k)$ = gamma function evaluated at $k$.
Cumulative distribution function
The cumulative distribution function of the gamma distribution is given as:

Formula
$F(x; k, \theta) = \int_0^x f(u; k, \theta) \, du = \frac{\gamma(k, \frac{x}{\theta})}{\Gamma(k)}$

Where −
$k$ = shape parameter.
$\theta$ = scale parameter.
$x$ = random variable.
$\gamma(k, \frac{x}{\theta})$ = lower incomplete gamma function.
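The density and distribution functions above can be evaluated with the Python standard library alone. The following is a minimal sketch, not a reference implementation: the function names are illustrative, and the lower incomplete gamma function is approximated by a truncated power series (an assumption that works for moderate values of $\beta x$).

```python
import math


def gamma_pdf_rate(x, alpha, beta):
    """Density f(x; alpha, beta) with shape alpha and rate beta, for x > 0."""
    return beta**alpha * x**(alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)


def gamma_pdf_scale(x, k, theta):
    """Density f(x; k, theta) with shape k and scale theta, for x > 0."""
    return x**(k - 1) * math.exp(-x / theta) / (theta**k * math.gamma(k))


def gamma_cdf_rate(x, alpha, beta, terms=200):
    """F(x; alpha, beta) = gamma(alpha, beta * x) / Gamma(alpha).

    The lower incomplete gamma function is summed from its power series
    z^alpha * e^(-z) * sum_n z^n / (alpha * (alpha + 1) * ... * (alpha + n)).
    """
    z = beta * x
    term = 1.0 / alpha              # n = 0 term of the series
    total = term
    for n in range(1, terms):
        term *= z / (alpha + n)     # multiply in the next factor of the series
        total += term
    return z**alpha * math.exp(-z) * total / math.gamma(alpha)


# The two parameterizations agree when beta = 1 / theta.
print(gamma_pdf_rate(2.0, 3.0, 0.5), gamma_pdf_scale(2.0, 3.0, 2.0))
```

With $\alpha = 1$ the gamma distribution reduces to the exponential distribution, which gives a simple sanity check for the series: `gamma_cdf_rate(x, 1.0, beta)` should match $1 - e^{-\beta x}$.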
Statistics – Discrete Series Arithmetic Median

A discrete series is data given as items along with their frequencies. The following is an example of a discrete series −

Items       5   10   20   30   40   50   60   70
Frequency   2    5    1    3   12    0    5    7

In case of a group having an even number of observations, the Arithmetic Median is the Arithmetic Mean of the two middle values after arranging the numbers in ascending order.

Formula
Median = Value of $\left(\frac{N+1}{2}\right)^{th}$ item.

Where −
$N$ = Number of observations

Example
Problem Statement − Let's calculate the Arithmetic Median for the following discrete data −

Items, $X$                     14    36    45    70      105     145
Frequency, $f$                 2     5     2     3       12      4
Cumulative Frequency, $C_f$    2     7     9     12      24      28
Terms                          1-2   3-7   8-9   10-12   13-24   25-28

Solution − Based on the above mentioned formula, the Arithmetic Median $M$ will be −

$M = \text{Value of } \left(\frac{N+1}{2}\right)^{th} \text{ item} \\[7pt] \, = \text{Value of } \left(\frac{28+1}{2}\right)^{th} \text{ item} \\[7pt] \, = \text{Value of } 14.5^{th} \text{ item} \\[7pt] \, = \text{Value of } \left(\frac{14^{th} \text{ item} + 15^{th} \text{ item}}{2}\right) \\[7pt] \, = \left(\frac{105 + 105}{2}\right) \\[7pt] \, = 105$

The Arithmetic Median of the given numbers is 105.

In case of a group having an odd number of observations, the Arithmetic Median is the middle number after arranging the numbers in ascending order.

Example
Let's calculate the Arithmetic Median for the following discrete data −

Items, $X$                     14    36    45    70     105
Frequency, $f$                 2     5     1     4      13
Cumulative Frequency, $C_f$    2     7     8     12     25
Terms                          1-2   3-7   8-8   9-12   13-25

The number of observations is 25, an odd number, so the middle item, the $\left(\frac{25+1}{2}\right)^{th} = 13^{th}$ term, is the Arithmetic Median. The 13th term falls in the range 13-25, whose item is 105.

∴ The Arithmetic Median of the given numbers is 105.
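The cumulative frequency lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `discrete_median` is hypothetical, and the items are assumed to be supplied in ascending order.

```python
def discrete_median(items, freqs):
    """Median of a discrete series: items in ascending order with frequencies."""
    n = sum(freqs)

    def item_at(position):
        # Walk the cumulative frequency until it reaches the 1-based position.
        cumulative = 0
        for item, f in zip(items, freqs):
            cumulative += f
            if position <= cumulative:
                return item
        raise ValueError("position exceeds the number of observations")

    if n % 2 == 1:
        # Odd N: the ((N + 1) / 2)-th item is the median.
        return item_at((n + 1) // 2)
    # Even N: average the two middle items.
    return (item_at(n // 2) + item_at(n // 2 + 1)) / 2


# First example: N = 28, so the 14th and 15th items are averaged.
print(discrete_median([14, 36, 45, 70, 105, 145], [2, 5, 2, 3, 12, 4]))
```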
Probability Additive Theorem
For Mutually Exclusive Events

The additive theorem of probability states that if A and B are two mutually exclusive events, then the probability of either A or B is given by

$P(A \text{ or } B) = P(A) + P(B) \\[7pt] P(A \cup B) = P(A) + P(B)$

The theorem can be extended to three mutually exclusive events as

$P(A \cup B \cup C) = P(A) + P(B) + P(C)$

Example
Problem Statement: A card is drawn from a pack of 52. What is the probability that it is a king or a queen?

Solution: Let
Event (A) = Draw of a king
Event (B) = Draw of a queen

P (card drawn is king or queen) = P (card is king) + P (card is queen)

$P(A \cup B) = P(A) + P(B) \\[7pt] = \frac{4}{52} + \frac{4}{52} \\[7pt] = \frac{1}{13} + \frac{1}{13} \\[7pt] = \frac{2}{13}$

For Non-Mutually Exclusive Events

In case both events can occur together, the additive theorem is written as:

$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) \\[7pt] P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Example
Problem Statement: A shooter is known to hit a target 3 out of 7 shots, while another shooter is known to hit the target 2 out of 5 shots. Find the probability of the target being hit at all when both of them try.

Solution:
Probability of first shooter hitting the target, P (A) = $\frac{3}{7}$
Probability of second shooter hitting the target, P (B) = $\frac{2}{5}$

Events A and B are not mutually exclusive, as both shooters may hit the target. Assuming the two shooters hit independently, $P(A \cap B) = P(A) \times P(B)$. Hence the additive rule applicable is

$P(A \cup B) = P(A) + P(B) - P(A \cap B) \\[7pt] = \frac{3}{7} + \frac{2}{5} - \left(\frac{3}{7} \times \frac{2}{5}\right) \\[7pt] = \frac{29}{35} - \frac{6}{35} \\[7pt] = \frac{23}{35}$
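Both worked examples can be checked with exact fractions. The sketch below uses Python's `fractions.Fraction`; the helper name `prob_union` is illustrative, and the second example assumes the two shooters hit independently, as in the solution above.

```python
from fractions import Fraction


def prob_union(p_a, p_b, p_both=0):
    """P(A or B) = P(A) + P(B) - P(A and B).

    Leave p_both at 0 for mutually exclusive events.
    """
    return p_a + p_b - p_both


# Mutually exclusive: a single card cannot be both a king and a queen.
king_or_queen = prob_union(Fraction(4, 52), Fraction(4, 52))

# Not mutually exclusive: both shooters may hit (independence is assumed).
p_a, p_b = Fraction(3, 7), Fraction(2, 5)
target_hit = prob_union(p_a, p_b, p_a * p_b)

print(king_or_queen, target_hit)
```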
Geometric Mean
The geometric mean of n numbers is defined as the nth root of the product of the n numbers.

Formula
$GM = \sqrt[n]{x_1 \times x_2 \times x_3 \cdots x_n}$

Where −
$n$ = Total numbers.
$x_i$ = numbers.

Example
Problem Statement: Determine the geometric mean of the following set of numbers.

1   3   9   27   81

Solution:
Step 1: Here n = 5

$GM = \sqrt[n]{x_1 \times x_2 \times x_3 \cdots x_n} \\[7pt] \, = \sqrt[5]{1 \times 3 \times 9 \times 27 \times 81} \\[7pt] \, = \sqrt[5]{3^3 \times 3^3 \times 3^4} \\[7pt] \, = \sqrt[5]{3^{10}} \\[7pt] \, = \sqrt[5]{(3^2)^5} \\[7pt] \, = \sqrt[5]{9^5} \\[7pt] \, = 9$

Thus the geometric mean of the given numbers is $9$.
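As a quick numerical check of the formula, the sketch below computes the geometric mean through logarithms, which is equivalent to the nth root of the product for positive numbers and avoids overflow on long lists. The function name is illustrative, and rejecting non-positive inputs is a choice made for this sketch.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive numbers, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires a non-empty list of positive numbers")
    # exp(mean of logs) equals the nth root of the product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(geometric_mean([1, 3, 9, 27, 81]))
```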
Discrete Series Arithmetic Mean

A discrete series is data given as items along with their frequencies. The following is an example of a discrete series −

Items       5   10   20   30   40   50   60   70
Frequency   2    5    1    3   12    0    5    7

For a discrete series, the Arithmetic Mean can be calculated using the following formula.

Formula
$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_n x_n}{N}$

Alternatively, we can write the same formula as follows −

$\bar{x} = \frac{\sum fx}{\sum f}$

Where −
$N$ = Number of observations
$f_1, f_2, f_3, \ldots, f_n$ = Different values of frequency f.
$x_1, x_2, x_3, \ldots, x_n$ = Different values of variable x.

Example
Problem Statement − Calculate the Arithmetic Mean for the following discrete data −

Items       14   36   45   70
Frequency   2    5    1    3

Solution − Based on the given data, we have −

Items   Frequency $f$   $fx$
14      2               28
36      5               180
45      1               45
70      3               210
        $N = 11$        $\sum fx = 463$

Based on the above mentioned formula, the Arithmetic Mean $\bar{x}$ will be −

$\bar{x} = \frac{463}{11} \\[7pt] \, = 42.09$

The Arithmetic Mean of the given numbers is 42.09.
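The formula $\bar{x} = \frac{\sum fx}{\sum f}$ translates directly into code. A minimal sketch, with an illustrative function name:

```python
def discrete_mean(items, freqs):
    """Arithmetic mean of a discrete series: sum(f * x) / sum(f)."""
    total_f = sum(freqs)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(items, freqs)) / total_f


# Data from the example: sum(fx) = 463 and N = 11.
print(round(discrete_mean([14, 36, 45, 70], [2, 5, 1, 3]), 2))
```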
Interval Estimation
Interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter, in contrast to point estimation, which is a single number.

Formula
$\mu = \bar x \pm Z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt n}$

Where −
$\bar x$ = sample mean
$Z_{\frac{\alpha}{2}}$ = the confidence coefficient (critical value)
$\alpha$ = significance level, that is, 1 minus the confidence level
$\sigma$ = standard deviation
$n$ = sample size

Example
Problem Statement: Suppose a student measuring the boiling temperature of a certain liquid observes the readings (in degrees Celsius) 102.5, 101.7, 103.1, 100.9, 100.5, and 102.2 on 6 different samples of the liquid. He calculates the sample mean to be 101.82. If he knows that the standard deviation for this procedure is 1.2 degrees, what is the interval estimation for the population mean at a 95% confidence level?

Solution: The student calculated the sample mean of the boiling temperatures to be 101.82. With known standard deviation $\sigma = 1.2$ and $n = 6$, the standard error is $\frac{\sigma}{\sqrt n} = \frac{1.2}{\sqrt 6} \approx 0.49$. The critical value for a 95% confidence interval is 1.96, where $\frac{\alpha}{2} = \frac{1-0.95}{2} = 0.025$. A 95% confidence interval for the unknown mean is

$((101.82 - (1.96 \times 0.49)), (101.82 + (1.96 \times 0.49))) \\[7pt] = (101.82 - 0.96, 101.82 + 0.96) \\[7pt] = (100.86, 102.78)$

As the level of confidence decreases, the size of the corresponding interval will decrease. Suppose the student was interested in a 90% confidence interval for the boiling temperature. In this case, the confidence level is 0.90 and $\frac{\alpha}{2} = \frac{1-0.90}{2} = 0.05$. The critical value for this level is equal to 1.645, so the 90% confidence interval is

$((101.82 - (1.645 \times 0.49)), (101.82 + (1.645 \times 0.49))) \\[7pt] = (101.82 - 0.81, 101.82 + 0.81) \\[7pt] = (101.01, 102.63)$

An increase in sample size will decrease the length of the confidence interval without reducing the level of confidence. This is because the standard error $\frac{\sigma}{\sqrt n}$ decreases as n increases.
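The interval calculation can be reproduced with Python's `statistics.NormalDist` (available from Python 3.8), which supplies the critical value instead of a table lookup. This is a sketch under the same assumption as the example, namely that the population standard deviation is known; the function name `z_interval` is illustrative. Because it uses unrounded intermediate values, its results can differ from the hand calculation in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # critical value Z_(alpha/2)
    margin = z * sigma / sqrt(len(sample))       # Z_(alpha/2) * sigma / sqrt(n)
    centre = mean(sample)
    return centre - margin, centre + margin


readings = [102.5, 101.7, 103.1, 100.9, 100.5, 102.2]
for level in (0.95, 0.90):
    low, high = z_interval(readings, sigma=1.2, confidence=level)
    print(level, round(low, 2), round(high, 2))
```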
Margin of Error

The margin of error $m$ of interval estimation is defined to be the value added to or subtracted from the sample mean, which determines the length of the interval:

$m = Z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt n}$

Suppose in the example above, the student wishes to have a margin of error equal to 0.5 with 95% confidence. Substituting the appropriate values into the expression for $m$ and solving for n gives the calculation.

$n = \left(Z_{\frac{\alpha}{2}} \times \frac{\sigma}{m}\right)^2 \\[7pt] = \left(1.96 \times \frac{1.2}{0.5}\right)^2 \\[7pt] = \left(\frac{2.35}{0.5}\right)^2 \\[7pt] = (4.7)^2 = 22.09$

Since n must be a whole number, it is rounded up. To achieve a 95% interval estimation for the mean boiling point with total length less than 1 degree, the student will have to take 23 measurements.
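Solving $m = Z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt n}$ for n and rounding up can be sketched as follows. The function name is illustrative, and the critical value comes from `statistics.NormalDist` rather than the rounded 1.96, so the intermediate value differs slightly from the hand calculation while the rounded-up answer is the same.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n for which Z_(alpha/2) * sigma / sqrt(n) does not exceed margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)


print(required_sample_size(sigma=1.2, margin=0.5, confidence=0.95))
```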
Outlier Function
An outlier in a data set is a value that lies more than 1.5 times the interquartile range below the lower quartile or above the upper quartile. Specifically, if a number is less than $Q_1 - 1.5 \times IQR$ or greater than $Q_3 + 1.5 \times IQR$, then it is an outlier. Outliers are identified by the following rule:

Formula
$\text{Outliers are values } < Q_1 - 1.5 \times IQR \text{ (or) } > Q_3 + 1.5 \times IQR$

Where −
$Q_1$ = First Quartile
$Q_3$ = Third Quartile
$IQR$ = Inter Quartile Range, $Q_3 - Q_1$

Example
Problem Statement: Consider a data set that represents the periodic task counts of 8 different students. The task counts are 11, 13, 15, 3, 16, 25, 12 and 14. Find the outliers in the students' periodic task counts.

Solution: The given data set is:

11   13   15   3   16   25   12   14

Arrange it in ascending order:

3   11   12   13   14   15   16   25

First Quartile Value ($Q_1$)
$Q_1 = \frac{(11 + 12)}{2} \\[7pt] = 11.5$

Third Quartile Value ($Q_3$)
$Q_3 = \frac{(15 + 16)}{2} \\[7pt] = 15.5$

Inter Quartile Range ($IQR$)
$IQR = Q_3 - Q_1 \\[7pt] = 15.5 - 11.5 \\[7pt] = 4$

Lower Outlier Range (L)
$Q_1 - 1.5 \times IQR \\[7pt] = 11.5 - (1.5 \times 4) \\[7pt] = 11.5 - 6 \\[7pt] = 5.5$

Upper Outlier Range (U)
$Q_3 + 1.5 \times IQR \\[7pt] = 15.5 + (1.5 \times 4) \\[7pt] = 15.5 + 6 \\[7pt] = 21.5$

In the given data, 3 is less than the lower limit 5.5 and 25 is greater than the upper limit 21.5, while every other value lies between 5.5 and 21.5. Hence 3 and 25 are the outlier values.
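The steps of the example can be sketched in Python. Quartile conventions vary between textbooks and libraries; this sketch assumes the convention used in the worked example, where $Q_1$ and $Q_3$ are the medians of the lower and upper halves of the sorted data (the middle value is excluded when the count is odd). The function name `iqr_outliers` is illustrative.

```python
from statistics import median


def iqr_outliers(data):
    """Return (lower limit, upper limit, outliers) using the 1.5 * IQR rule."""
    if len(data) < 2:
        raise ValueError("need at least two values to compute quartiles")
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])      # median of the lower half
    q3 = median(ordered[-half:])     # median of the upper half
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [x for x in ordered if x < lower or x > upper]
    return lower, upper, outliers


print(iqr_outliers([11, 13, 15, 3, 16, 25, 12, 14]))
```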
Statistics – Continuous Series Arithmetic Median

A continuous series is data given as ranges along with their frequencies. The following is an example of a continuous series −

Items       0-5   5-10   10-20   20-30   30-40
Frequency   2     5      1       3       12

Formula
$Median = L + \frac{\left(\frac{n}{2} - c.f.\right)}{f} \times i$

Where −
$L$ = Lower limit of the median class; the median class is the class in which the $\left(\frac{n}{2}\right)^{th}$ item lies.
$c.f.$ = Cumulative frequency of the class preceding the median class.
$f$ = Frequency of the median class.
$i$ = Class interval of the median class.

The Arithmetic Median is a useful measure of central tendency when the data type is ordinal. Since it is a positional average, it is not affected by extreme values.

Example
Problem Statement − In a study conducted in an organization, the distribution of income across the workers is observed. Find the median wage of the workers of the organization.

6 men get less than Rs. 500
13 men get less than Rs. 1000
22 men get less than Rs. 1500
30 men get less than Rs. 2000
34 men get less than Rs. 2500
40 men get less than Rs. 3000

Solution − Given are the cumulative frequencies of the workers. Hence we first find the simple frequencies and present the data in tabular form.

Income (Rs.)   M.P. m   Frequency f   d = (m-1250)/500   fd    c.f.
0 – 500        250      6             -2                 -12   6
500 – 1000     750      7             -1                 -7    13
1000 – 1500    1250     9             0                  0     22
1500 – 2000    1750     8             1                  8     30
2000 – 2500    2250     4             2                  8     34
2500 – 3000    2750     6             3                  18    40
                        N = 40                           ∑ fd = 15

In order to simplify the calculation, a common factor i = 500 has been taken. The median wage is calculated using the following formula.

$Median = L + \frac{\left(\frac{n}{2} - c.f.\right)}{f} \times i$

Where −
$L$ = 1000
$\frac{n}{2}$ = 20
$c.f.$ = 13
$f$ = 9
$i$ = 500

Thus

$Median = 1000 + \frac{(20 - 13)}{9} \times 500 \\[7pt] \, = 1000 + 388.9 \\[7pt] \, = 1388.9$

As 1388.9 ≃ 1389, the median wage is Rs. 1389.
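The interpolation formula can be sketched as a short function. The names below are illustrative; the classes are assumed to be contiguous and listed in ascending order, and the simple frequencies are first recovered from the "less than" cumulative counts, as in the solution.

```python
def grouped_median(lower_limits, widths, freqs):
    """Median of a continuous series: L + ((n/2 - c.f.) / f) * i."""
    half = sum(freqs) / 2
    cumulative = 0                       # c.f. of the classes before the current one
    for lower, width, f in zip(lower_limits, widths, freqs):
        if f > 0 and cumulative + f >= half:
            # This is the median class: interpolate inside it.
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")


# "Less than" cumulative counts from the problem statement.
less_than = [6, 13, 22, 30, 34, 40]
freqs = [b - a for a, b in zip([0] + less_than, less_than)]   # simple frequencies
lower_limits = [0, 500, 1000, 1500, 2000, 2500]
print(round(grouped_median(lower_limits, [500] * 6, freqs), 1))
```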
Hypergeometric Distribution
A hypergeometric random variable is the number of successes that result from a hypergeometric experiment. The probability distribution of a hypergeometric random variable is called a hypergeometric distribution.

The hypergeometric distribution is defined by the following probability function:

Formula
$h(x; N, n, k) = \frac{[C(k,x)][C(N-k,n-x)]}{C(N,n)}$

Where −
$N$ = items in the population.
$k$ = successes in the population.
$n$ = items in the random sample drawn from that population.
$x$ = successes in the random sample.

Example
Problem Statement: Suppose we randomly select 5 cards without replacement from an ordinary deck of playing cards. What is the probability of getting exactly 2 red cards (i.e., hearts or diamonds)?

Solution: This is a hypergeometric experiment in which we know the following:

N = 52; since there are 52 cards in a deck.
k = 26; since there are 26 red cards in a deck.
n = 5; since we randomly select 5 cards from the deck.
x = 2; since 2 of the cards we select are red.

We plug these values into the hypergeometric formula as follows:

$h(x; N, n, k) = \frac{[C(k,x)][C(N-k,n-x)]}{C(N,n)} \\[7pt] h(2; 52, 5, 26) = \frac{[C(26,2)][C(52-26,5-2)]}{C(52,5)} \\[7pt] = \frac{[325][2600]}{2598960} \\[7pt] = 0.32513$

Thus, the probability of randomly selecting exactly 2 red cards is 0.32513.
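The formula maps directly onto `math.comb` (available from Python 3.8), which computes $C(n, r)$. A minimal sketch, with an illustrative function name and the same argument order as $h(x; N, n, k)$:

```python
from math import comb


def hypergeom_pmf(x, N, n, k):
    """h(x; N, n, k) = C(k, x) * C(N - k, n - x) / C(N, n)."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


# Exactly 2 red cards among 5 drawn from a 52-card deck with 26 red cards.
print(round(hypergeom_pmf(2, 52, 5, 26), 5))
```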
Cumulative plots
A cumulative plot is a way to draw cumulative information graphically. It displays the number, percentage, or proportion of observations that are less than or equal to a particular value.

Example
Problem Statement: Draw the frequency and cumulative frequency plots of 10 student test scores based on the following data.

Sr. No.   Roll No.   Test Score
1         100        30
2         101        40
3         102        35
4         103        50
5         104        60
6         105        65
7         105        35
8         105        55
9         105        65
10        105        70

Solution: For the frequency chart, compute the frequencies as shown below. This table shows the number of students scoring in the given ranges, where each range includes its lower bound and excludes its upper bound.

Sr. No.   Score Range   Students
1         30-40         3
2         40-50         1
3         50-60         2
4         60-70         3
5         70-80         1

Plotting the number of students against the score ranges gives the required frequency plot.

For the cumulative frequency chart, compute the frequencies as shown below. This table shows the number of students scoring up to and including the given marks.

Sr. No.   Up to Score   Students
1         30            1
2         40            4
3         50            5
4         60            7
5         70            10

Plotting the number of students against the scores gives the required cumulative frequency plot.
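The counts behind both plots can be computed directly from the raw scores. The sketch below is illustrative only: the function names are hypothetical, score ranges are treated as half-open (lower bound included, upper bound excluded), and cumulative counts include scores equal to the mark. It produces the numbers to plot rather than the plots themselves.

```python
scores = [30, 40, 35, 50, 60, 65, 35, 55, 65, 70]


def frequency_counts(values, edges):
    """Count values in each half-open range [edges[i], edges[i + 1])."""
    return [sum(lo <= v < hi for v in values) for lo, hi in zip(edges, edges[1:])]


def cumulative_counts(values, marks):
    """Count values less than or equal to each mark."""
    return [sum(v <= m for v in values) for m in marks]


print(frequency_counts(scores, [30, 40, 50, 60, 70, 80]))
print(cumulative_counts(scores, [30, 40, 50, 60, 70]))
```

The two lists can then be passed to any plotting tool, for example as bar heights for the frequency plot and as points of a step or line chart for the cumulative plot.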