Statistics – Discrete Series Arithmetic Median

A discrete series is data in which each item is given together with its frequency. The following is an example of a discrete series:

| Items     | 5 | 10 | 20 | 30 | 40 | 50 | 60 | 70 |
|-----------|---|----|----|----|----|----|----|----|
| Frequency | 2 | 5  | 1  | 3  | 12 | 0  | 5  | 7  |

When the total number of observations is even, the Arithmetic Median is the arithmetic mean of the two middle values after arranging the observations in ascending order.

Formula

$Median = \text{Value of } \left(\frac{N+1}{2}\right)^{th} \text{ item}$

Where:
${N}$ = Number of observations (the sum of the frequencies).

Example

Problem Statement: Let's calculate the Arithmetic Median for the following discrete data:

| Items, ${X}$                 | 14  | 36  | 45  | 70    | 105   | 145   |
|------------------------------|-----|-----|-----|-------|-------|-------|
| Frequency, ${f}$             | 2   | 5   | 2   | 3     | 12    | 4     |
| Cumulative Frequency, ${C_f}$ | 2   | 7   | 9   | 12    | 24    | 28    |
| Terms                        | 1-2 | 3-7 | 8-9 | 10-12 | 13-24 | 25-28 |

Solution: Based on the above formula, the Arithmetic Median M will be:

${M = \text{Value of } \left(\frac{N+1}{2}\right)^{th} \text{ item} \\[7pt] \, = \text{Value of } \left(\frac{28+1}{2}\right)^{th} \text{ item} \\[7pt] \, = \text{Value of } 14.5^{th} \text{ item} \\[7pt] \, = \frac{14^{th} \text{ item} + 15^{th} \text{ item}}{2} \\[7pt] \, = \frac{105 + 105}{2} \\[7pt] \, = 105}$

The Arithmetic Median of the given numbers is 105.

When the total number of observations is odd, the Arithmetic Median is the middle value after arranging the observations in ascending order.

Example

Let's calculate the Arithmetic Median for the following discrete data:

| Items, ${X}$                 | 14  | 36  | 45  | 70   | 105   |
|------------------------------|-----|-----|-----|------|-------|
| Frequency, ${f}$             | 2   | 5   | 1   | 4    | 13    |
| Cumulative Frequency, ${C_f}$ | 2   | 7   | 8   | 12   | 25    |
| Terms                        | 1-2 | 3-7 | 8-8 | 9-12 | 13-25 |

There are 25 observations, an odd number, so the middle value is the $\left(\frac{25+1}{2}\right)^{th} = 13^{th}$ term. The 13th term falls in the range 13-25, which belongs to the item 105.

∴ The Arithmetic Median of the given numbers is 105.
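The lookup through cumulative frequencies can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `discrete_median` and its helper `kth` are hypothetical, and the items are assumed to be listed in ascending order.

```python
from bisect import bisect_left
from itertools import accumulate


def discrete_median(items, freqs):
    """Median of a discrete series (items must be in ascending order)."""
    cum = list(accumulate(freqs))  # cumulative frequencies
    n = cum[-1]                    # total number of observations

    def kth(k):
        # Value of the k-th observation (1-based): the first item whose
        # cumulative frequency reaches k.
        return items[bisect_left(cum, k)]

    if n % 2 == 1:
        return kth((n + 1) // 2)
    # Even N: average the two middle observations.
    return (kth(n // 2) + kth(n // 2 + 1)) / 2


# Data from the two examples (N = 28 and N = 25).
median_even = discrete_median([14, 36, 45, 70, 105, 145], [2, 5, 2, 3, 12, 4])
median_odd = discrete_median([14, 36, 45, 70, 105], [2, 5, 1, 4, 13])
print(median_even, median_odd)
```

Using `bisect_left` on the cumulative frequencies mirrors reading the "Terms" row of the table: the k-th observation belongs to the first item whose cumulative frequency is at least k.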
Probability Additive Theorem
For Mutually Exclusive Events

The additive theorem of probability states that if A and B are two mutually exclusive events, then the probability of either A or B occurring is given by

${P(A \text{ or } B) = P(A) + P(B) \\[7pt] P(A \cup B) = P(A) + P(B)}$

The theorem can be extended to three mutually exclusive events as

${P(A \cup B \cup C) = P(A) + P(B) + P(C)}$

Example

Problem Statement: A card is drawn from a pack of 52. What is the probability that it is a king or a queen?

Solution:
Let Event (A) = Draw of a king
Event (B) = Draw of a queen

P (card drawn is king or queen) = P (card is king) + P (card is queen)

${P(A \cup B) = P(A) + P(B) \\[7pt] \, = \frac{4}{52} + \frac{4}{52} \\[7pt] \, = \frac{1}{13} + \frac{1}{13} \\[7pt] \, = \frac{2}{13}}$

For Non-Mutually Exclusive Events

If both events can occur together, the additive theorem is written as:

${P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) \\[7pt] P(A \cup B) = P(A) + P(B) - P(A \cap B)}$

Example

Problem Statement: A shooter is known to hit a target 3 out of 7 shots, while another shooter is known to hit the target 2 out of 5 shots. Find the probability of the target being hit at all when both of them try.

Solution:
Probability of the first shooter hitting the target, P(A) = ${\frac{3}{7}}$
Probability of the second shooter hitting the target, P(B) = ${\frac{2}{5}}$

Events A and B are not mutually exclusive, as both shooters may hit the target. Assuming the two shooters hit independently of each other, ${P(A \cap B) = P(A) \times P(B)}$. Hence the applicable additive rule is

${P(A \cup B) = P(A) + P(B) - P(A \cap B) \\[7pt] \, = \frac{3}{7} + \frac{2}{5} - \left(\frac{3}{7} \times \frac{2}{5}\right) \\[7pt] \, = \frac{29}{35} - \frac{6}{35} \\[7pt] \, = \frac{23}{35}}$
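Both worked examples can be checked with exact fractions. The sketch below uses Python's standard `fractions` module; the helper name `p_union` is hypothetical, and the shooter example assumes the two shooters hit independently, so that P(A and B) = P(A) × P(B).

```python
from fractions import Fraction


def p_union(p_a, p_b, p_both=Fraction(0)):
    """P(A or B) = P(A) + P(B) - P(A and B); p_both is 0 for exclusive events."""
    return p_a + p_b - p_both


# Mutually exclusive: a single card cannot be both a king and a queen.
king_or_queen = p_union(Fraction(4, 52), Fraction(4, 52))

# Not mutually exclusive: both shooters may hit.
# Assumption: the hits are independent, so P(A and B) = P(A) * P(B).
p_a, p_b = Fraction(3, 7), Fraction(2, 5)
target_hit = p_union(p_a, p_b, p_a * p_b)

print(king_or_queen, target_hit)  # 2/13 23/35
```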
Geometric Mean
The geometric mean of n numbers is defined as the nth root of the product of the n numbers.

Formula

${GM = \sqrt[n]{x_1 \times x_2 \times x_3 \dots x_n}}$

Where:
${n}$ = Total count of numbers.
${x_i}$ = The individual numbers.

Example

Problem Statement: Determine the geometric mean of the following set of numbers.

1 3 9 27 81

Solution:

Step 1: Here n = 5

${GM = \sqrt[n]{x_1 \times x_2 \times x_3 \dots x_n} \\[7pt] \, = \sqrt[5]{1 \times 3 \times 9 \times 27 \times 81} \\[7pt] \, = \sqrt[5]{3^3 \times 3^3 \times 3^4} \\[7pt] \, = \sqrt[5]{3^{10}} \\[7pt] \, = \sqrt[5]{(3^2)^5} \\[7pt] \, = \sqrt[5]{9^5} \\[7pt] \, = 9}$

Thus the geometric mean of the given numbers is $9$.
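As a quick numerical check, the same value can be computed in Python. The sketch below shows two routes: the standard-library `statistics.geometric_mean` (available since Python 3.8) and a hand-written version, hypothetically named `gm_via_logs`, that averages logarithms to avoid overflow when the product is very large. Both assume strictly positive inputs.

```python
import math
from statistics import geometric_mean


def gm_via_logs(values):
    """Geometric mean as exp(mean(log x)); requires all values > 0."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


numbers = [1, 3, 9, 27, 81]
gm_stdlib = geometric_mean(numbers)
gm_manual = gm_via_logs(numbers)
print(gm_stdlib, gm_manual)  # both approximately 9 (floating-point rounding)
```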
Rayleigh Distribution
The Rayleigh distribution is a continuous probability distribution for non-negative random variables. It is named after the English physicist Lord Rayleigh. The distribution is widely used in the following areas:

- Communications: to model multiple paths of densely scattered signals reaching a receiver.
- Physical Sciences: to model wind speed, wave heights, and sound or light radiation.
- Engineering: to model the lifetime of an object as a function of its age.
- Medical Imaging: to model noise variance in magnetic resonance imaging.

The probability density function of the Rayleigh distribution is defined as:

Formula

${f(x; \sigma) = \frac{x}{\sigma^2} e^{\frac{-x^2}{2\sigma^2}}, \quad x \ge 0}$

Where:
${\sigma}$ = scale parameter of the distribution.

The cumulative distribution function of the Rayleigh distribution is defined as:

Formula

${F(x; \sigma) = 1 - e^{\frac{-x^2}{2\sigma^2}}, \quad x \in [0, \infty)}$

Where:
${\sigma}$ = scale parameter of the distribution.

Variance and Expected Value

The expected value, or mean, of a Rayleigh distribution is given by:

${E[x] = \sigma \sqrt{\frac{\pi}{2}}}$

The variance of a Rayleigh distribution is given by:

${Var[x] = \sigma^2 \frac{4 - \pi}{2}}$
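The formulas above can be sanity-checked by simulation. The following sketch is illustrative only: the function names and the value `sigma = 2.0` are assumptions, not part of the original text. It draws samples by inverting the cumulative distribution function (solving F(x) = u for x gives x = σ√(−2 ln(1 − u))) and compares the sample mean and variance with the closed-form expressions.

```python
import math
import random


def rayleigh_pdf(x, sigma):
    """f(x; sigma) = x / sigma^2 * exp(-x^2 / (2 sigma^2)) for x >= 0."""
    if x < 0:
        return 0.0
    return (x / sigma**2) * math.exp(-x**2 / (2 * sigma**2))


def rayleigh_cdf(x, sigma):
    """F(x; sigma) = 1 - exp(-x^2 / (2 sigma^2)) for x >= 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-x**2 / (2 * sigma**2))


def rayleigh_sample(sigma, rng):
    """Inverse-transform sampling: solve F(x) = u for x."""
    u = rng.random()  # u in [0, 1), so 1 - u is in (0, 1]
    return sigma * math.sqrt(-2.0 * math.log(1.0 - u))


sigma = 2.0              # illustrative scale parameter
rng = random.Random(0)   # seeded for reproducibility
samples = [rayleigh_sample(sigma, rng) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
theory_mean = sigma * math.sqrt(math.pi / 2)
theory_var = sigma**2 * (4 - math.pi) / 2

print(f"mean: sample={sample_mean:.3f}, formula={theory_mean:.3f}")
print(f"var:  sample={sample_var:.3f}, formula={theory_var:.3f}")
```

With 100,000 samples the simulated values should agree with the formulas to roughly two decimal places; the exact digits depend on the random seed.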
Exponential distribution
The exponential distribution, or negative exponential distribution, is a probability distribution that describes the time between events in a Poisson process. In a Poisson process, events occur continuously and independently at a constant average rate. The exponential distribution is a particular case of the gamma distribution.

Probability density function

The probability density function of the exponential distribution is given as:

Formula

${f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0 \\[7pt] 0, & \text{if } x \lt 0 \end{cases}}$

Where:
${\lambda}$ = rate parameter.
${x}$ = random variable.

Cumulative distribution function

The cumulative distribution function of the exponential distribution is given as:

Formula

${F(x; \lambda) = \begin{cases} 1 - e^{-\lambda x}, & \text{if } x \ge 0 \\[7pt] 0, & \text{if } x \lt 0 \end{cases}}$

Where:
${\lambda}$ = rate parameter.
${x}$ = random variable.
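A minimal sketch of the two functions in Python follows. The function names and the rate of 0.5 events per minute are hypothetical values chosen for illustration. The last lines show the link to the Poisson process: the probability of waiting longer than t equals the probability of seeing no events in an interval of length t, which is e^(−λt).

```python
import math


def exponential_pdf(x, lam):
    """f(x; lambda) = lambda * exp(-lambda x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exponential_cdf(x, lam):
    """F(x; lambda) = 1 - exp(-lambda x) for x >= 0, else 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


lam = 0.5  # hypothetical rate: 0.5 events per minute
t = 3.0    # waiting time in minutes

p_within_t = exponential_cdf(t, lam)  # next event arrives within t
p_longer_than_t = 1.0 - p_within_t    # equals exp(-lam * t)

print(f"P(wait <= {t}) = {p_within_t:.4f}")
print(f"P(wait >  {t}) = {p_longer_than_t:.4f}")
```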
Gumbel Distribution
The Gumbel distribution represents the distribution of extreme values, either the maximum or the minimum, of samples drawn from various distributions. It is used to model the distribution of peak levels. For example, given a list of maximum temperatures for each of 10 years, it can describe the distribution of the yearly peak temperature.

Probability density function

The probability density function of the Gumbel distribution is given as:

Formula

${P(x) = \frac{1}{\beta} e^{\left[\frac{x - \alpha}{\beta} - e^{\frac{x - \alpha}{\beta}}\right]}}$

Where:
${\alpha}$ = location parameter.
${\beta}$ = scale parameter.
${x}$ = random variable.

Cumulative distribution function

The cumulative distribution function of the Gumbel distribution is given as:

Formula

${D(x) = 1 - e^{-e^{\frac{x - \alpha}{\beta}}}}$

Where:
${\alpha}$ = location parameter.
${\beta}$ = scale parameter.
${x}$ = random variable.
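The two formulas can be written directly in Python and checked against each other, since the density should be the derivative of the cumulative distribution function. This is a sketch under assumptions: the function names and the parameter values (α = 30, β = 2) are hypothetical, and the derivative is approximated by a central difference.

```python
import math


def gumbel_pdf(x, alpha, beta):
    """P(x) = (1 / beta) * exp(z - exp(z)), with z = (x - alpha) / beta."""
    z = (x - alpha) / beta
    return math.exp(z - math.exp(z)) / beta


def gumbel_cdf(x, alpha, beta):
    """D(x) = 1 - exp(-exp(z)), with z = (x - alpha) / beta."""
    z = (x - alpha) / beta
    return 1.0 - math.exp(-math.exp(z))


alpha, beta = 30.0, 2.0  # hypothetical location and scale
x = 31.0
h = 1e-5

# Central-difference approximation of dD/dx, to compare with P(x).
numeric_slope = (gumbel_cdf(x + h, alpha, beta) - gumbel_cdf(x - h, alpha, beta)) / (2 * h)

print(gumbel_pdf(x, alpha, beta), numeric_slope)
```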
F distribution
The F distribution (also called Snedecor's F distribution or the Fisher–Snedecor distribution) is a continuous probability distribution that frequently arises as the null distribution of a test statistic, most notably in the analysis of variance (ANOVA) and other F-tests.

Probability density function

The probability density function of the F distribution is given as:

Formula

${f(x; d_1, d_2) = \frac{\sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}}{x \, B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)}}$

Where:
${d_1}$ = positive parameter (numerator degrees of freedom).
${d_2}$ = positive parameter (denominator degrees of freedom).
${x}$ = random variable.
${B}$ = beta function.

Cumulative distribution function

The cumulative distribution function of the F distribution is given as:

Formula

${F(x; d_1, d_2) = I_{\frac{d_1 x}{d_1 x + d_2}}\left(\frac{d_1}{2}, \frac{d_2}{2}\right)}$

Where:
${d_1}$ = positive parameter.
${d_2}$ = positive parameter.
${x}$ = random variable.
${I}$ = regularized incomplete beta function.
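The density can be evaluated with the standard library alone by working in logarithms, using `math.lgamma` for the beta function. The sketch below is illustrative: the function names are hypothetical, and the cumulative distribution function is approximated by midpoint-rule integration of the density rather than by the regularized incomplete beta function, which the standard library does not provide. The integration assumes d1 ≥ 2 so that the density stays bounded near zero.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F distribution, computed in log space for stability."""
    if x <= 0:
        return 0.0
    # log B(d1/2, d2/2) via log-gamma functions
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    # log of the square-root term in the numerator
    log_root = 0.5 * (
        d1 * math.log(d1 * x)
        + d2 * math.log(d2)
        - (d1 + d2) * math.log(d1 * x + d2)
    )
    return math.exp(log_root - math.log(x) - log_beta)


def f_cdf_numeric(x, d1, d2, steps=20_000):
    """Approximate F(x) by midpoint-rule integration of the density on [0, x]."""
    if x <= 0:
        return 0.0
    width = x / steps
    return width * sum(f_pdf((i + 0.5) * width, d1, d2) for i in range(steps))


# For d1 = d2 = 2 the density simplifies to 1 / (1 + x)^2.
print(f_pdf(1.0, 2, 2))          # approximately 0.25
# When d1 = d2, X and 1/X have the same distribution, so the median is 1.
print(f_cdf_numeric(1.0, 6, 6))  # approximately 0.5
```

The two printed checks rely on known special cases rather than on a reference implementation, so they test the formula itself.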
Goodness of Fit
A Goodness of Fit test checks whether sample data is consistent with a hypothesized population distribution, for example a normal distribution or a Weibull distribution. In simple words, it tests whether the sample looks like what we would expect to find if it came from that population.

The following tests are generally used by statisticians:

- Chi-square
- Kolmogorov-Smirnov
- Anderson-Darling
- Shapiro-Wilk

Chi-square Test

The chi-square test is the most commonly used goodness of fit test. It is used for discrete distributions such as the binomial distribution and the Poisson distribution, whereas the Kolmogorov-Smirnov and Anderson-Darling goodness of fit tests are used for continuous distributions.

Formula

${X^2 = \sum \left[\frac{(O_i - E_i)^2}{E_i}\right]}$

Where:
${O_i}$ = observed value of the i-th level of the variable.
${E_i}$ = expected value of the i-th level of the variable.
${X^2}$ = chi-squared random variable.

Example

A toy company builds football player toys. It claims that 30% of the toys are mid-fielders, 60% are defenders, and 10% are forwards. A random sample of 100 toys has 50 mid-fielders, 45 defenders, and 5 forwards. At the 0.05 level of significance, can you justify the company's claim?

Solution:

Determine Hypotheses

Null hypothesis $H_0$: The proportions of mid-fielders, defenders, and forwards are 30%, 60% and 10%, respectively.
Alternative hypothesis $H_1$: At least one of the proportions in the null hypothesis is false.

Determine Degrees of Freedom

The degrees of freedom, DF, equal the number of levels (k) of the categorical variable minus 1: DF = k - 1. Here there are 3 levels. Thus

${DF = k - 1 \\[7pt] \, = 3 - 1 = 2}$

Determine chi-square test statistic

With a sample of 100 toys, the expected counts under $H_0$ are 30, 60 and 10.

${X^2 = \sum \left[\frac{(O_i - E_i)^2}{E_i}\right] \\[7pt] \, = \left[\frac{(50-30)^2}{30}\right] + \left[\frac{(45-60)^2}{60}\right] + \left[\frac{(5-10)^2}{10}\right] \\[7pt] \, = \frac{400}{30} + \frac{225}{60} + \frac{25}{10} \\[7pt] \, = 13.33 + 3.75 + 2.50 \\[7pt] \, = 19.58}$

Determine p-value

The p-value is the probability that a chi-square statistic $X^2$ with 2 degrees of freedom is more extreme than 19.58. Using a chi-square distribution calculator, ${P(X^2 \gt 19.58) \approx 0.0001}$.

Interpret results

As the p-value (about 0.0001) is far smaller than the significance level (0.05), the null hypothesis is rejected. Thus the sample does not support the company's claim.
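The calculation can be reproduced in a few lines of Python. This is a sketch using only the standard library; the variable names are illustrative. One shortcut is specific to this example: for exactly 2 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form e^(−x/2), so no statistical library is needed. For any other number of degrees of freedom that shortcut does not apply and a proper chi-square routine would be required.

```python
import math

observed = [50, 45, 5]             # mid-fielders, defenders, forwards
claimed_share = [0.30, 0.60, 0.10]  # proportions under the null hypothesis

n = sum(observed)
expected = [share * n for share in claimed_share]

# Chi-square statistic: sum of (O - E)^2 / E over all levels.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Closed form valid ONLY for df == 2: P(X^2 > x) = exp(-x / 2).
assert df == 2
p_value = math.exp(-chi_sq / 2)

alpha = 0.05
print(f"X^2 = {chi_sq:.2f}, df = {df}, p-value = {p_value:.6f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The computed p-value is roughly 0.00006, which is consistent with the value of about 0.0001 reported above after rounding to four decimal places.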
Frequency Distribution
A frequency distribution is a table that displays the frequency of various outcomes in a sample. Each entry in the table contains the frequency, or count, of the occurrences of values within a particular group or interval. In this way the table summarizes the distribution of values in the sample.

Example

Problem Statement: Construct a frequency distribution table for a survey taken on Maple Avenue. In each of 20 homes, people were asked how many cars were registered to their household. The results were recorded as follows:

1 2 1 0 3 4 0 1 1 1 2 2 3 2 3 2 1 4 0 0

Solution: Follow these steps to present the data in a frequency distribution table.

1. Divide the results (x) into intervals, and then count the number of results in each interval. In this case, the intervals are the number of households with no car (0), one car (1), two cars (2) and so forth.
2. Make a table with separate columns for the interval numbers (the number of cars per household), the tallied results, and the frequency of results in each interval. Label these columns Number of cars, Tally and Frequency.
3. Read the list of data from left to right and place a tally mark in the appropriate row. For example, the first result is a 1, so place a tally mark in the row beside where 1 appears in the interval column (Number of cars). The next result is a 2, so place a tally mark in the row beside the 2, and so on. When you reach your fifth tally mark in a row, draw a line through the preceding four marks to make the final frequency count easier to read.
4. Add up the number of tally marks in each row and record the total in the final column, entitled Frequency.

Your frequency distribution table for this exercise should look like this:

Frequency table for the number of cars registered in each household

| Number of cars (x) | Tally | Frequency (f) |
|--------------------|-------|---------------|
| 0 | ${\lvert\lvert\lvert\lvert}$ | 4 |
| 1 | ${\require{cancel} \cancel{\lvert\lvert\lvert\lvert} \lvert}$ | 6 |
| 2 | ${\cancel{\lvert\lvert\lvert\lvert}}$ | 5 |
| 3 | ${\lvert\lvert\lvert}$ | 3 |
| 4 | ${\lvert\lvert}$ | 2 |

A quick look at this frequency distribution table shows that, out of the 20 households surveyed, 4 households had no cars, 6 households had 1 car, 5 had 2 cars, 3 had 3 cars, and 2 had 4 cars.
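The tallying procedure corresponds to a simple counting loop. A minimal sketch with Python's `collections.Counter` is shown here; the variable names are illustrative, and the text tally groups marks in fives separated by a space instead of drawing a strike-through line.

```python
from collections import Counter

# Survey results: number of cars registered in each of the 20 households.
cars = [1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0]

frequency = Counter(cars)  # maps each value to how often it occurs

print(f"{'Number of cars':<16}{'Tally':<10}{'Frequency'}")
for value in sorted(frequency):
    count = frequency[value]
    # Group tally marks in fives, e.g. 6 -> "||||| |".
    tally = " ".join("|" * min(5, count - i) for i in range(0, count, 5))
    print(f"{value:<16}{tally:<10}{count}")

total = sum(frequency.values())
print("Total households:", total)
```

Checking that the frequencies add up to the number of households surveyed is a cheap way to catch a miscounted row.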
Quartile Deviation
The quartile deviation depends on the lower quartile ${Q_1}$ and the upper quartile ${Q_3}$. The difference ${Q_3 - Q_1}$ is called the interquartile range. The difference ${Q_3 - Q_1}$ divided by 2 is called the semi-interquartile range or the quartile deviation.

Formula

${Q.D. = \frac{Q_3 - Q_1}{2}}$

Coefficient of Quartile Deviation

A relative measure of dispersion based on the quartile deviation is known as the coefficient of quartile deviation. It is defined as

${\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}}$

Example

Problem Statement: Calculate the quartile deviation and the coefficient of quartile deviation from the data given below:

| Maximum Load (short-tons) | Number of Cables |
|---------------------------|------------------|
| 9.3-9.7   | 2  |
| 9.8-10.2  | 5  |
| 10.3-10.7 | 12 |
| 10.8-11.2 | 17 |
| 11.3-11.7 | 14 |
| 11.8-12.2 | 6  |
| 12.3-12.7 | 3  |
| 12.8-13.2 | 1  |

Solution:

| Maximum Load (short-tons) | Number of Cables (f) | Class Boundaries | Cumulative Frequencies |
|---------------------------|----------------------|------------------|------------------------|
| 9.3-9.7   | 2  | 9.25-9.75   | 2            |
| 9.8-10.2  | 5  | 9.75-10.25  | 2 + 5 = 7    |
| 10.3-10.7 | 12 | 10.25-10.75 | 7 + 12 = 19  |
| 10.8-11.2 | 17 | 10.75-11.25 | 19 + 17 = 36 |
| 11.3-11.7 | 14 | 11.25-11.75 | 36 + 14 = 50 |
| 11.8-12.2 | 6  | 11.75-12.25 | 50 + 6 = 56  |
| 12.3-12.7 | 3  | 12.25-12.75 | 56 + 3 = 59  |
| 12.8-13.2 | 1  | 12.75-13.25 | 59 + 1 = 60  |

${Q_1}$

Value of the ${\left(\frac{n}{4}\right)^{th}}$ item = Value of the ${\left(\frac{60}{4}\right)^{th}}$ item = ${15^{th}}$ item. Thus ${Q_1}$ lies in class 10.25-10.75.

${Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right) \\[7pt] \, \text{where } l = 10.25,\ h = 0.5,\ f = 12,\ \frac{n}{4} = 15 \text{ and } c = 7 \\[7pt] \, = 10.25 + \frac{0.5}{12}(15 - 7) \\[7pt] \, = 10.25 + 0.33 \\[7pt] \, = 10.58}$

Here ${l}$ is the lower boundary of the quartile class, ${h}$ its width, ${f}$ its frequency, and ${c}$ the cumulative frequency of the class preceding it.

${Q_3}$

Value of the ${\left(\frac{3n}{4}\right)^{th}}$ item = Value of the ${\left(\frac{3 \times 60}{4}\right)^{th}}$ item = ${45^{th}}$ item. Thus ${Q_3}$ lies in class 11.25-11.75.

${Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right) \\[7pt] \, \text{where } l = 11.25,\ h = 0.5,\ f = 14,\ \frac{3n}{4} = 45 \text{ and } c = 36 \\[7pt] \, = 11.25 + \frac{0.5}{14}(45 - 36) \\[7pt] \, = 11.25 + 0.32 \\[7pt] \, = 11.57}$

Quartile Deviation

${Q.D. = \frac{Q_3 - Q_1}{2} \\[7pt] \, = \frac{11.57 - 10.58}{2} \\[7pt] \, = \frac{0.99}{2} \\[7pt] \, = 0.495}$

Coefficient of Quartile Deviation

${\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \\[7pt] \, = \frac{11.57 - 10.58}{11.57 + 10.58} \\[7pt] \, = \frac{0.99}{22.15} \\[7pt] \, = 0.045}$
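The interpolation used for the quartiles of grouped data can be sketched as a small Python function. The name `grouped_quantile` is hypothetical, and the sketch assumes equal class widths and class boundaries listed in ascending order. Because the code carries full precision instead of rounding the quartiles to two decimals first, its quartile deviation comes out near 0.494 rather than 0.495; the coefficient still rounds to 0.045.

```python
def grouped_quantile(lower_bounds, freqs, width, position):
    """Interpolated value of the item at `position` in grouped data.

    Implements l + (h / f) * (position - c), where l is the lower boundary
    of the class containing the item, h the class width, f the class
    frequency, and c the cumulative frequency before that class.
    """
    cum_before = 0
    for lower, f in zip(lower_bounds, freqs):
        if cum_before + f >= position:
            return lower + (width / f) * (position - cum_before)
        cum_before += f
    raise ValueError("position exceeds the total frequency")


# Lower class boundaries and frequencies from the cable-load table.
lower_bounds = [9.25, 9.75, 10.25, 10.75, 11.25, 11.75, 12.25, 12.75]
freqs = [2, 5, 12, 17, 14, 6, 3, 1]
width = 0.5
n = sum(freqs)  # 60

q1 = grouped_quantile(lower_bounds, freqs, width, n / 4)
q3 = grouped_quantile(lower_bounds, freqs, width, 3 * n / 4)

quartile_deviation = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)

print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}")
print(f"Q.D. = {quartile_deviation:.3f}, coefficient = {coefficient:.3f}")
```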