Statistics – Cumulative plots

A cumulative plot is a way to represent cumulative information graphically. It displays the number, percentage, or proportion of observations that are less than or equal to a particular value.

Example

Problem Statement:

Draw the frequency and cumulative frequency plots of the test scores of 10 students, based on the following data.

| Sr. No. | Roll No. | Test Score |
|---|---|---|
| 1 | 100 | 30 |
| 2 | 101 | 40 |
| 3 | 102 | 35 |
| 4 | 103 | 50 |
| 5 | 104 | 60 |
| 6 | 105 | 65 |
| 7 | 105 | 35 |
| 8 | 105 | 55 |
| 9 | 105 | 65 |
| 10 | 105 | 70 |

Solution:

For the frequency chart, compute the frequencies as shown below. This table shows the number of students scoring in each range. Each range includes its lower bound and excludes its upper bound, so a score of 40 is counted in 40-50.

| Sr. No. | Score Range | Students |
|---|---|---|
| 1 | 30-40 | 3 |
| 2 | 40-50 | 1 |
| 3 | 50-60 | 2 |
| 4 | 60-70 | 3 |
| 5 | 70-80 | 1 |

Following is the required frequency plot.

For the cumulative frequency chart, compute the cumulative frequencies as shown below. This table shows the number of students scoring up to and including the given marks.

| Sr. No. | Up to Score | Students |
|---|---|---|
| 1 | 30 | 1 |
| 2 | 40 | 4 |
| 3 | 50 | 5 |
| 4 | 60 | 7 |
| 5 | 70 | 10 |

Following is the required cumulative frequency plot.
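Both tables can be reproduced with a short Python sketch. It assumes half-open score ranges [lower, upper) for the frequency table, which is what the counts above imply, and "up to and including" for the cumulative table. It only computes the counts; drawing the plots would need a plotting library, which is left out here.

```python
from bisect import bisect_right

# Test scores from the example table (Sr. No. 1-10).
scores = [30, 40, 35, 50, 60, 65, 35, 55, 65, 70]

def range_frequencies(data, edges):
    """Count values in each half-open range [edges[i], edges[i+1])."""
    return [sum(1 for x in data if lo <= x < hi) for lo, hi in zip(edges, edges[1:])]

def cumulative_counts(data, marks):
    """Count values less than or equal to each mark (binary search on sorted data)."""
    ordered = sorted(data)
    return [bisect_right(ordered, m) for m in marks]

edges = [30, 40, 50, 60, 70, 80]
print(range_frequencies(scores, edges))                 # [3, 1, 2, 3, 1]
print(cumulative_counts(scores, [30, 40, 50, 60, 70]))  # [1, 4, 5, 7, 10]
```

Each cumulative count is simply the running total of the frequencies at and below that mark, which is why the last entry equals the total number of students.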
Statistics – Cumulative Poisson Distribution

The Poisson distribution has a single parameter ${\lambda}$, the shape parameter, which indicates the average number of events in the given time interval. The following is the plot of the Poisson probability mass function for four values of ${\lambda}$.

Cumulative Distribution Function

Formula

$${F(x,\lambda) = \sum_{k=0}^{x} \frac{e^{-\lambda} \lambda^{k}}{k!}}$$

Where −

${e}$ = the base of the natural logarithm, approximately 2.71828
${k}$ = the number of occurrences of an event, the probability of which is given by the function
${k!}$ = the factorial of k
${\lambda}$ = a positive real number, equal to the expected number of occurrences during the given interval

Example

Problem Statement:

A complex software system averages 7 errors per 5,000 lines of code. What is the probability of exactly 2 errors in 5,000 randomly selected lines of code?

Solution:

The probability of exactly 2 errors is a single term of the sum above, with k = 2 and ${\lambda = 7}$:

$${p(2,7) = \frac{e^{-7} 7^2}{2!} \approx 0.022}$$
Statistics – Continuous Uniform Distribution

The continuous uniform distribution is the probability distribution of a number selected at random from the continuous interval between a and b. Its density function is defined as follows.

Here is a graph of the continuous uniform distribution with a = 1, b = 3.

Formula

$$f(x) = \begin{cases} \frac{1}{b-a}, & \text{when } a \le x \le b \\ 0, & \text{when } x \lt a \text{ or } x \gt b \end{cases}$$

Example

Problem Statement:

Suppose you are hosting a quiz and put a question to an audience of 20 contestants. The time allowed to answer the question is 30 seconds. How many contestants are likely to respond within 5 seconds? (Usually, contestants are required to click the button for the correct choice, and the winner is chosen on the basis of the first click.)

Solution:

Step 1: The interval of the probability distribution, in seconds, is [0, 30].

⇒ The probability density is 1/(30 − 0) = 1/30.

Step 2: We need the number of contestants who respond within 5 seconds, so the sub-interval of the successful event is [0, 5]. The probability P(x < 5) is the ratio of the widths of these two intervals.

⇒ P(x < 5) = 5/30 = 1/6.

Since there are 20 contestants, the number of contestants likely to respond within 5 seconds is (1/6)(20) ≈ 3.33, that is, about 3.
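The two steps of the solution can be written as a small Python sketch. The helper names are illustrative. The probability of a sub-interval is the width of its overlap with [a, b] divided by (b − a), the same ratio of widths used above.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_prob(lo: float, hi: float, a: float, b: float) -> float:
    """P(lo <= X <= hi): overlap of [lo, hi] with [a, b], divided by (b - a)."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

p = uniform_prob(0, 5, 0, 30)   # Step 2: 5/30 = 1/6
expected = 20 * p               # expected number among 20 contestants, about 3.33
print(uniform_pdf(10, 0, 30), p, expected)
```

Clipping the interval to [a, b] means that a query partly outside the support, such as [−5, 5], still returns a valid probability.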
Cluster sampling
Statistics – Cluster sampling

In cluster sampling, the population is divided into groups (clusters) whose elements are, ideally, heterogeneous within each group, and clusters are then chosen at random. In stratified sampling, the groups are homogeneous and a few elements are randomly chosen from each group. In cluster sampling, by contrast, the groups are formed with intra-group heterogeneity, and all the elements within a chosen group become part of the sample. Stratified sampling has intra-group homogeneity and inter-group heterogeneity, whereas cluster sampling has intra-group heterogeneity and, ideally, inter-group homogeneity.

Examples

One stage cluster sampling

A committee made up of members from different departments has a high degree of heterogeneity. When a few committees are chosen at random from a number of such committees, and all of their members are included in the sample, it is a case of one stage cluster sampling.

Two stage cluster sampling

If a few elements are then chosen from each randomly selected cluster, using simple random sampling or any other probability method, it is two stage cluster sampling.

Multi-stage cluster sampling

A cluster sample can be a multistage sample when the choice of elements involves selection at multiple stages. For example, if a sample of insurance companies is to be drawn for a national survey on insurance products, clusters must be developed at multiple stages. In the first stage, the clusters are formed on the basis of public and private companies. In the next stage, a group of companies is chosen randomly from each cluster developed earlier. In the third stage, the office location of each chosen company from which data is to be collected is chosen randomly. Thus, in multistage sampling, probability sampling of primary units is done first, then a sample of secondary sampling units is drawn from each primary unit, and so on through the third and later levels until the final stage of breakdown for the sample units is reached.
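The one stage and two stage procedures can be sketched in Python. This is a minimal illustration with some assumptions. Each cluster is stored as a list in a dict. The committee names and members are hypothetical. Python's `random.Random.sample` stands in for "simple random sampling or any other probability method".

```python
import random

def one_stage_cluster_sample(clusters, k, rng):
    """Randomly choose k clusters and include every element of each."""
    chosen = rng.sample(list(clusters), k)
    return [x for name in chosen for x in clusters[name]]

def two_stage_cluster_sample(clusters, k, m, rng):
    """Randomly choose k clusters, then a simple random sample of m elements from each."""
    chosen = rng.sample(list(clusters), k)
    return [x for name in chosen for x in rng.sample(clusters[name], m)]

# Hypothetical committees, each mixing members from several departments.
committees = {
    "C1": ["Ann (HR)", "Bo (IT)", "Cy (Sales)"],
    "C2": ["Di (IT)", "Ed (Legal)", "Fay (HR)"],
    "C3": ["Gus (Sales)", "Hal (IT)", "Ivy (Legal)"],
    "C4": ["Jo (HR)", "Kim (Sales)", "Lu (IT)"],
}
rng = random.Random(42)  # seeded so the draw is repeatable
print(one_stage_cluster_sample(committees, 2, rng))
print(two_stage_cluster_sample(committees, 2, 1, rng))
```

A multistage design repeats the second function's pattern at each further level, such as company and then office location in the insurance example.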
Statistics – Data collection – Case Study Method

Case study research is a qualitative research method used to examine contemporary real-life situations and apply the findings of the case to the problem under study. Case studies involve a detailed contextual analysis of a limited number of events or conditions and their relationships. The method provides a basis for applying ideas and extending methods. It helps a researcher understand a complex issue or object and adds strength to what is already known through previous research.

STEPS OF CASE STUDY METHOD

To ensure objectivity and clarity, a researcher should adopt a methodical approach to case study research. The following steps can be followed:

Identify and define the research questions – The researcher starts by establishing the focus of the study, identifying the research object and the problem surrounding it. The research object could be a person, a program, an event, or an entity.

Select the cases – In this step, the researcher decides on the number of cases (single or multiple), the type of cases (unique or typical), and the approach to collecting, storing, and analyzing the data. This is the design phase of the case study method.

Collect the data – The researcher now collects the data, with the objective of gathering multiple sources of evidence relating to the problem under study. This evidence is stored comprehensively and systematically, in a format that can be referenced and sorted easily, so that converging lines of inquiry and patterns can be uncovered.

Evaluate and analyze the data – In this step, the researcher uses varied methods to analyze qualitative as well as quantitative data. The data is categorized, tabulated, and cross-checked to address the initial propositions or purpose of the study. Graphic techniques help the investigators approach the data in different ways and thus avoid drawing premature conclusions. Examples include placing information into arrays, creating matrices of categories, and creating flow charts. Multiple investigators may also examine the data so that a wider variety of insights into the data can be developed.

Presentation of Results – The results are presented in a manner that allows the reader to evaluate the findings in the light of the evidence presented in the report. The results are corroborated with sufficient evidence to show that all aspects of the problem have been adequately explored. The new insights gained and any conflicting propositions that have emerged are suitably highlighted in the report.
Data Patterns
Statistics – Data Patterns

Data patterns are especially useful when they are drawn graphically. Data patterns are commonly described in terms of features such as center, spread, shape, and other unusual properties. Other special descriptive labels include symmetric, bell-shaped, and skewed.

Center

Graphically, the center of a distribution is located at its median. In such a chart, roughly half of the observations lie on either side of the center. The height of each column indicates the frequency of observations.

Spread

The spread of a distribution refers to the variation of the data. If the set of observations covers a wide range, the spread is larger. If the observations are clustered around a single value, the spread is smaller.

Shape

The shape of a distribution can be described using the following characteristics.

Symmetry – In a symmetric distribution, the graph can be divided at the center so that each half is a mirror image of the other.

Number of peaks – Distributions can have one or more peaks. A distribution with one clear peak is called unimodal, and a distribution with two clear peaks is called bimodal. A symmetric distribution with a single peak at the center is referred to as bell-shaped.

Skewness – Some distributions have more observations on one side of the graph than on the other. Distributions with fewer observations toward higher values (a longer tail on the right) are said to be skewed right. Distributions with fewer observations toward lower values (a longer tail on the left) are said to be skewed left.

Uniform – When the set of observations has no peak and the data are spread equally across the range of the distribution, the distribution is called a uniform distribution.

Unusual Features

Common unusual features of data patterns are gaps and outliers.

Gaps – Gaps are areas of a distribution that contain no observations. The following figure has a gap because there are no observations in the middle of the distribution.
Outliers – Distributions may contain extreme values that differ greatly from the rest of the observations. These extreme values are referred to as outliers. The following figure illustrates a distribution with an outlier.
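The features above can be summarized numerically as well as graphically. The following Python sketch uses only the standard `statistics` module. It relies on two techniques that are assumptions, not part of the text above. The moment coefficient of skewness measures direction: positive means skewed right and negative means skewed left. Tukey's 1.5 × IQR fences are one common rule of thumb for flagging outliers. The sample data are made up for illustration.

```python
import statistics

def describe(data):
    """Summarize the center, spread and skew direction of a list of numbers."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0     # moment coefficient of skewness
    return {
        "median": statistics.median(data),  # center
        "range": max(data) - min(data),     # spread
        "stdev": statistics.pstdev(data),   # spread
        "skewness": skew,                   # > 0 skewed right, < 0 skewed left
    }

def tukey_outliers(data, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

print(describe([1, 2, 2, 3, 3, 3, 4, 9, 12]))        # long right tail: skewness > 0
print(tukey_outliers([10, 12, 11, 13, 12, 11, 50]))  # [50]
```

Different quartile conventions can shift the fences slightly, so borderline values may be flagged differently by other tools.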
Permutation
Statistics – Permutation

A permutation is an arrangement of all or part of a set of objects, with regard to the order of the arrangement. For example, suppose we have a set of three letters: A, B, and C. We might ask how many ways we can arrange 2 letters from that set.

Permutation is defined and given by the following function:

Formula

$${^nP_r = \frac{n!}{(n-r)!}}$$

Where −

${n}$ = the size of the set from which elements are permuted
${r}$ = the size of each permutation
${n, r}$ are non-negative integers, with ${r \le n}$

Example

Problem Statement:

A computer scientist is trying to discover the keyword for a financial account. The keyword consists of exactly 10 lowercase characters (e.g., 10 characters from among the set: a, b, c… w, x, y, z), and no character can be repeated. How many different unique arrangements of characters exist?

Solution:

Step 1: Determine whether the question pertains to permutations or combinations. Changing the order of the characters in a potential keyword (e.g., ajk vs. kja) would create a new possibility, so this is a permutations problem.

Step 2: Determine n and r.

n = 26, since the computer scientist is choosing from 26 possibilities (e.g., a, b, c… x, y, z).

r = 10, since the computer scientist is choosing 10 characters.

Step 3: Apply the formula.

$${^{26}P_{10} = \frac{26!}{(26-10)!} \\[7pt] = \frac{26!}{16!} \\[7pt] = \frac{26(25)(24)\cdots(11)(10)(9)\cdots(1)}{(16)(15)\cdots(1)} \\[7pt] = 26(25)(24)\cdots(17) \\[7pt] = 19275223968000}$$
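The formula can be checked in Python. This sketch computes ${^nP_r}$ directly from factorials and compares it with the standard library's `math.perm`, which requires Python 3.8 or later. The helper name `n_p_r` is illustrative. The last line answers the introductory A, B, C question.

```python
import math

def n_p_r(n: int, r: int) -> int:
    """Number of ordered arrangements of r items chosen from n: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

print(n_p_r(26, 10))      # 19275223968000
print(math.perm(26, 10))  # same value from the standard library
print(n_p_r(3, 2))        # 6: AB, BA, AC, CA, BC, CB
```

Integer division keeps the result exact, since n!/(n − r)! is always a whole number.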
Combination
Statistics – Combination

A combination is a selection of all or part of a set of objects, without regard to the order in which the objects are selected. For example, suppose we have a set of three letters: A, B, and C. We might ask how many ways we can select 2 letters from that set.

Combination is defined and given by the following function −

Formula

$${C(n,r) = \frac{n!}{r!(n-r)!}}$$

Where −

${n}$ = the number of objects to choose from
${r}$ = the number of objects selected

Example

Problem Statement − How many different groups of 10 students can a teacher select from her classroom of 15 students?

Solution −

Step 1 − Determine whether the question pertains to permutations or combinations. Changing the order of the selected students would not create a new group, so this is a combinations problem.

Step 2 − Determine n and r.

n = 15, since the teacher is choosing from 15 students.

r = 10, since the teacher is selecting 10 students.

Step 3 − Apply the formula.

$${^{15}C_{10} = \frac{15!}{(15-10)!\,10!} \\[7pt] = \frac{15!}{5!\,10!} \\[7pt] = \frac{15(14)(13)(12)(11)(10!)}{5!\,10!} \\[7pt] = \frac{15(14)(13)(12)(11)}{5!} \\[7pt] = \frac{15(14)(13)(12)(11)}{5(4)(3)(2)(1)} \\[7pt] = \frac{(14)(13)(3)(11)}{(2)(1)} \\[7pt] = (7)(13)(3)(11) \\[7pt] = 3003}$$
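The same calculation in Python, as a sketch: the factorial form of the formula is compared with `math.comb`, which requires Python 3.8 or later. The helper name `n_c_r` is illustrative.

```python
import math

def n_c_r(n: int, r: int) -> int:
    """Number of unordered selections of r items from n: n! / (r! (n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(n_c_r(15, 10))      # 3003
print(math.comb(15, 10))  # 3003 from the standard library
print(n_c_r(3, 2))        # 3: {A, B}, {A, C}, {B, C}
```

Each unordered group of r items corresponds to r! ordered arrangements, so ${C(n,r) = {^nP_r} / r!}$.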
Discrete Series Arithmetic Mean

A discrete series is one in which the data values are given along with their frequencies. Following is an example of a discrete series −

| Items | 5 | 10 | 20 | 30 | 40 | 50 | 60 | 70 |
|---|---|---|---|---|---|---|---|---|
| Frequency | 2 | 5 | 1 | 3 | 12 | 0 | 5 | 7 |

For a discrete series, the arithmetic mean can be calculated using the following formula.

Formula

$$\bar{x} = \frac{f_1x_1 + f_2x_2 + f_3x_3 + \cdots + f_nx_n}{N}$$

Alternatively, the same formula can be written as −

$$\bar{x} = \frac{\sum fx}{\sum f}$$

Where −

${N}$ = the number of observations, ${N = \sum f}$
${f_1, f_2, f_3, \ldots, f_n}$ = the frequencies of the values
${x_1, x_2, x_3, \ldots, x_n}$ = the different values of the variable x

Example

Problem Statement −

Calculate the arithmetic mean for the following discrete data −

| Items | 14 | 36 | 45 | 70 |
|---|---|---|---|---|
| Frequency | 2 | 5 | 1 | 3 |

Solution −

Based on the given data, we have −

| Items (x) | Frequency (f) | fx |
|---|---|---|
| 14 | 2 | 28 |
| 36 | 5 | 180 |
| 45 | 1 | 45 |
| 70 | 3 | 210 |
| | ${N = 11}$ | ${\sum fx = 463}$ |

Based on the above formula, the arithmetic mean $\bar{x}$ will be −

$$\bar{x} = \frac{463}{11} \approx 42.09$$

The arithmetic mean of the given numbers is approximately 42.09.
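A minimal Python sketch of the ${\sum fx / \sum f}$ formula, applied to the example data. The function name is illustrative. It assumes the values and frequencies are given as two lists of equal length.

```python
def discrete_mean(values, freqs):
    """Arithmetic mean of a discrete series: sum(f*x) / sum(f)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    n = sum(freqs)  # N, the total number of observations
    return sum(f * x for x, f in zip(values, freqs)) / n

mean = discrete_mean([14, 36, 45, 70], [2, 5, 1, 3])
print(round(mean, 2))  # 42.09
```

Weighting each value by its frequency gives the same result as expanding the series into 11 individual observations and averaging them.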
Interval Estimation
Statistics – Interval Estimation

Interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter. It contrasts with point estimation, which gives a single number.

Formula

$${\mu = \bar x \pm Z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt n}}$$

Where −

${\bar x}$ = sample mean
${Z_{\frac{\alpha}{2}}}$ = the confidence coefficient (critical value of the standard normal distribution)
${\alpha}$ = significance level; the confidence level is ${1 - \alpha}$
${\sigma}$ = population standard deviation (assumed known)
${n}$ = sample size

Example

Problem Statement:

A student measures the boiling temperature of a certain liquid on 6 different samples. He observes the readings (in degrees Celsius) 102.5, 101.7, 103.1, 100.9, 100.5, and 102.2, and calculates the sample mean to be 101.82. He knows that the standard deviation for this procedure is 1.2 degrees. What is the interval estimate for the population mean at a 95% confidence level?

Solution:

The student calculated the sample mean of the boiling temperatures to be 101.82. With ${\sigma = 1.2}$ and ${n = 6}$, the standard error is ${\frac{\sigma}{\sqrt n} = \frac{1.2}{\sqrt 6} \approx 0.49}$. The critical value for a 95% confidence interval is 1.96, where ${\frac{\alpha}{2} = \frac{1-0.95}{2} = 0.025}$. A 95% confidence interval for the unknown mean is therefore

$${((101.82 - (1.96 \times 0.49)), (101.82 + (1.96 \times 0.49))) \\[7pt] = (101.82 - 0.96, 101.82 + 0.96) \\[7pt] = (100.86, 102.78)}$$

As the level of confidence decreases, the size of the corresponding interval decreases. Suppose the student was interested in a 90% confidence interval for the boiling temperature. In this case, the confidence level is ${1 - \alpha = 0.90}$, and ${\frac{\alpha}{2} = \frac{1-0.90}{2} = 0.05}$. The critical value for this level is 1.645, so the 90% confidence interval is

$${((101.82 - (1.645 \times 0.49)), (101.82 + (1.645 \times 0.49))) \\[7pt] = (101.82 - 0.81, 101.82 + 0.81) \\[7pt] = (101.01, 102.63)}$$

An increase in sample size decreases the length of the confidence interval without reducing the level of confidence. This is because the standard error ${\frac{\sigma}{\sqrt n}}$ decreases as n increases.
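The interval formula can be sketched in Python with the standard library. This assumes, as the example does, that σ is known. `statistics.NormalDist().inv_cdf` (Python 3.8+) supplies the critical value ${Z_{\alpha/2}}$ instead of a table lookup. The function name `z_interval` is illustrative.

```python
import math
from statistics import NormalDist, fmean

def z_interval(data, sigma, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    n = len(data)
    xbar = fmean(data)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    margin = z * sigma / math.sqrt(n)                   # Z_{alpha/2} * sigma / sqrt(n)
    return xbar - margin, xbar + margin

readings = [102.5, 101.7, 103.1, 100.9, 100.5, 102.2]
lo, hi = z_interval(readings, sigma=1.2, confidence=0.95)
print(round(lo, 2), round(hi, 2))      # 100.86 102.78
lo90, hi90 = z_interval(readings, sigma=1.2, confidence=0.90)
print(round(lo90, 2), round(hi90, 2))  # 101.01 102.62
```

At 90% the upper limit comes out as 102.62 rather than 102.63. The hand calculation rounds the mean and standard error before multiplying, while the code keeps full precision.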
Margin of Error

The margin of error ${m}$ of an interval estimate is the value added to or subtracted from the sample mean, and it determines the length of the interval:

$${m = Z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt n}}$$

Suppose in the example above the student wishes to have a margin of error of 0.5 with 95% confidence. Solving the expression for ${m}$ for n gives ${n = \left(\frac{Z_{\alpha/2}\,\sigma}{m}\right)^2}$. Substituting the appropriate values gives

$${n = \left(\frac{1.96 \times 1.2}{0.5}\right)^2 \\[7pt] = \left(\frac{2.352}{0.5}\right)^2 \\[7pt] = (4.704)^2 \approx 22.13}$$

Since n must be a whole number, it is rounded up. To achieve a 95% interval estimate for the mean boiling point with a total length of less than 1 degree (a margin of error of 0.5 on each side), the student will have to take 23 measurements.
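The sample-size calculation as a short Python sketch. It again assumes a known σ and uses `statistics.NormalDist` for the critical value. `math.ceil` performs the rounding up, because a fractional number of measurements is not possible.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest whole n with Z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    return math.ceil((z * sigma / margin) ** 2)

n = required_sample_size(sigma=1.2, margin=0.5)
print(n)  # 23
```

With n = 23 the achieved margin is about 1.96 × 1.2 / √23 ≈ 0.49, just inside the target of 0.5.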