statistics Archives - Page 5 of 12 - Donotsad where can learn any thing work project and make money

Aug 10

Range Rule of Thumb

The Range Rule of Thumb says that the range is about four times the standard deviation. The standard deviation is another measure of spread in statistics; it tells you how tightly your data is clustered around the mean.

Formula

${s \approx \frac{R}{4}}$

Where −

${s}$ = standard deviation.
${R}$ = range of the data (maximum − minimum).

To see how the range rule works, consider the following example.

Example

Problem Statement: Given the following values: 12, 12, 14, 15, 16, 18, 18, 20, 20, and 25, estimate the standard deviation using the range rule of thumb.

Solution: These values have a mean of 17. We first calculate the range of the data as 25 – 12 = 13 and then divide it by four, giving the estimate ${\frac{13}{4} = 3.25}$. This is reasonably close to the true sample standard deviation of about 4.06, and good enough for a rough estimate.
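As a quick check of the example, the sketch below compares the range-rule estimate with the sample standard deviation from Python's standard `statistics` module. The function name `range_rule_sd` is an illustrative choice, not something defined in the article.

```python
import statistics

def range_rule_sd(values):
    """Estimate the standard deviation as range / 4 (range rule of thumb)."""
    return (max(values) - min(values)) / 4

data = [12, 12, 14, 15, 16, 18, 18, 20, 20, 25]

estimate = range_rule_sd(data)        # (25 - 12) / 4
sample_sd = statistics.stdev(data)    # sample standard deviation (n - 1 divisor)

print(f"range-rule estimate:       {estimate:.2f}")
print(f"sample standard deviation: {sample_sd:.2f}")
```

The estimate is only a rough guide: it depends on just two data points, the minimum and the maximum.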

Aug 10

Probability Density Function

In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function that describes the relative likelihood for this random variable to take on a given value. The probability that the variable falls in an interval is the integral of the density over that interval:

${P(a \le X \le b) = \int_a^b f(x)\,dx}$

Where −

${[a,b]}$ = interval in which x lies.
${P(a \le X \le b)}$ = probability that the value of X lies within this interval.
${f(x)}$ = the probability density function.
${dx}$ = differential of x, the variable of integration.

Example

Problem Statement: During the day, a clock stops once at a random time. If x is the time (in hours) at which it stops and the PDF for x is given by:

${f(x) = \begin{cases} 1/24, & \text{for } 0 \le x \le 24 \\ 0, & \text{otherwise} \end{cases}}$

Calculate the probability that the clock stops between 2 pm and 2:45 pm.

Solution: 2 pm is hour 14 and 2:45 pm is hour 14.75 (45 minutes is 0.75 of an hour), so:

${P(14 \le X \le 14.75) = \int_{14}^{14.75} f(x)\,dx \\[7pt] = \frac{1}{24}(14.75 - 14) \\[7pt] = \frac{1}{24}(0.75) \\[7pt] = 0.03125}$
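The integral in the clock example can be approximated numerically. This is a minimal sketch, assuming the density is uniform over a 24-hour day and taking 2:45 pm as hour 14.75 (45 minutes is 0.75 of an hour); `uniform_pdf` and `integrate` are illustrative helper names.

```python
def uniform_pdf(x, low=0.0, high=24.0):
    """Density of a clock stopping uniformly at random within a 24-hour day."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# 2:00 pm is hour 14.0; 2:45 pm is hour 14.75.
prob = integrate(uniform_pdf, 14.0, 14.75)

# A valid density integrates to 1 over its whole support.
total = integrate(uniform_pdf, 0.0, 24.0)

print(f"P(14 <= X <= 14.75) = {prob:.5f}")
print(f"total probability   = {total:.5f}")
```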

Aug 10

Mean Deviation

Also referred to as average deviation, the mean deviation is defined as the sum of the deviations (ignoring signs) from an average, divided by the number of items in the distribution. The average can be the mean, median or mode. Theoretically, the median is the best average to choose because the sum of deviations from the median is a minimum, provided signs are ignored. In practice, however, the arithmetic mean is the most commonly used average for calculating the mean deviation, which is denoted by the symbol ${MD}$.

We're going to discuss methods to compute the mean deviation for three types of series:

Individual Data Series
Discrete Data Series
Continuous Data Series

Individual Data Series

When data is given on an individual basis. Following is an example of an individual series:

| Items | 5 | 10 | 20 | 30 | 40 | 50 | 60 | 70 |
|---|---|---|---|---|---|---|---|---|

Discrete Data Series

When data is given along with their frequencies. Following is an example of a discrete series:

| Items | 5 | 10 | 20 | 30 | 40 | 50 | 60 | 70 |
|---|---|---|---|---|---|---|---|---|
| Frequency | 2 | 5 | 1 | 3 | 12 | 0 | 5 | 7 |

Continuous Data Series

When data is given as ranges along with their frequencies. Following is an example of a continuous series:

| Items | 0-5 | 5-10 | 10-20 | 20-30 | 30-40 |
|---|---|---|---|---|---|
| Frequency | 2 | 5 | 1 | 3 | 12 |
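The definition above (sum of absolute deviations divided by the number of items) can be sketched for the individual and discrete example series. The function names are illustrative, and weighting each deviation by its frequency in the discrete case is an assumption that follows from the definition rather than a formula stated in the article.

```python
import statistics

def mean_deviation(values, center=None):
    """Average absolute deviation from a central value (default: arithmetic mean)."""
    if center is None:
        center = statistics.mean(values)
    return sum(abs(v - center) for v in values) / len(values)

def mean_deviation_discrete(items, freqs):
    """Mean deviation about the mean for items given with frequencies."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(items, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(items, freqs)) / n

items = [5, 10, 20, 30, 40, 50, 60, 70]
freqs = [2, 5, 1, 3, 12, 0, 5, 7]

md_mean = mean_deviation(items)                               # about the mean
md_median = mean_deviation(items, statistics.median(items))   # about the median
md_discrete = mean_deviation_discrete(items, freqs)

print(f"individual series, about mean:   {md_mean:.3f}")
print(f"individual series, about median: {md_median:.3f}")
print(f"discrete series, about mean:     {md_discrete:.3f}")
```

For a continuous series the same calculation would typically use the mid-point of each range in place of the item value.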

Aug 10

Quadratic Regression Equation

Quadratic regression is used to find the equation of the parabola that best fits a given set of data. It has the following form:

${y = ax^2 + bx + c \text{ where } a \ne 0}$

The least squares method can be used to find the quadratic regression equation. In this method, we find the values of a, b and c such that the sum of the squared vertical distances between each given point (${x_i, y_i}$) and the parabola (${y = ax^2 + bx + c}$) is minimal. The matrix equation for the parabolic curve is given by:

${\begin{bmatrix} \sum {x_i}^4 & \sum {x_i}^3 & \sum {x_i}^2 \\ \sum {x_i}^3 & \sum {x_i}^2 & \sum x_i \\ \sum {x_i}^2 & \sum x_i & n \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} \sum {x_i}^2 y_i \\ \sum x_i y_i \\ \sum y_i \end{bmatrix}}$

Correlation Coefficient, r

The correlation coefficient, r, indicates how well a quadratic equation fits the given data. If r is close to 1, the fit is good. r can be computed by the following formula:

${r = \sqrt{1 - \frac{SSE}{SST}} \text{ where } \\[7pt] SSE = \sum (y_i - a{x_i}^2 - bx_i - c)^2 \\[7pt] SST = \sum (y_i - \bar y)^2}$

Generally, quadratic regression calculators are used to compute the quadratic regression equation.

Example

Problem Statement: Compute the quadratic regression equation of the following data and check how well it fits.

| x | -3 | -2 | -1 | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|---|---|
| y | 7.5 | 3 | 0.5 | 1 | 3 | 6 | 14 |

Solution: Compute a quadratic regression on a calculator by entering the x and y values. The best-fit quadratic equation for the above points is

${y = 1.1071x^2 + x + 0.5714}$

To check the fit, plot the graph.

[Figure: plot of the data points and the fitted parabola.]

The value of the correlation coefficient, r, for the data is 0.99420, which is close to 1. Hence the quadratic regression equation is a good fit.
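The calculator step can be reproduced by building the normal-equation matrix and solving it. The sketch below uses only the standard library, with a small Gauss-Jordan solver standing in for the calculator; all function names are illustrative. It takes r as the square root of 1 − SSE/SST, which is the form that yields the reported value of about 0.9942 for this data.

```python
import math

def solve_linear(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        # Bring the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def quadratic_fit(xs, ys):
    """Least-squares (a, b, c) for y = a*x^2 + b*x + c via the normal equations."""
    def sx(p):
        return sum(x ** p for x in xs)

    def sxy(p):
        return sum((x ** p) * y for x, y in zip(xs, ys))

    matrix = [[sx(4), sx(3), sx(2)],
              [sx(3), sx(2), sx(1)],
              [sx(2), sx(1), len(xs)]]
    return solve_linear(matrix, [sxy(2), sxy(1), sxy(0)])

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [7.5, 3, 0.5, 1, 3, 6, 14]

a, b, c = quadratic_fit(xs, ys)
y_mean = sum(ys) / len(ys)
sse = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - y_mean) ** 2 for y in ys)
r = math.sqrt(1 - sse / sst)

print(f"y = {a:.4f}x^2 + {b:.4f}x + {c:.4f}, r = {r:.4f}")
```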

Aug 10

Normal Distribution

A normal distribution is an arrangement of a data set in which most values cluster in the middle of the range and the rest taper off symmetrically toward either extreme. Height is one simple example of something that follows a normal distribution pattern: most people are of average height, the numbers of people who are taller and shorter than average are fairly equal, and a very small (and still roughly equivalent) number of people are either extremely tall or extremely short.

[Figure: example of a normal distribution curve.]

A graphical representation of a normal distribution is sometimes called a bell curve because of its flared shape. The precise shape can vary according to the distribution of the population, but the peak is always in the middle and the curve is always symmetrical. In a normal distribution the mean, mode and median are all the same.

Formula

${y = \frac{1}{\sigma \sqrt{2 \pi}} e^{\frac{-(x - \mu)^2}{2 \sigma^2}}}$

Where −

${\mu}$ = Mean
${\sigma}$ = Standard Deviation
${\pi \approx 3.14159}$
${e \approx 2.71828}$

Example

Problem Statement: A survey of daily travel time had these results (in minutes):

26 33 65 28 34 55 25 44 50 36 26 37 43 62 35 38 45 32 28 34

The mean is 38.8 minutes, and the standard deviation is 11.4 minutes. Convert the values to z-scores and prepare the normal distribution graph.

Solution: The formula for the z-score is:

${z = \frac{x - \mu}{\sigma}}$

Where −

${z}$ = the "z-score" (standard score)
${x}$ = the value to be standardized
${\mu}$ = the mean
${\sigma}$ = the standard deviation

To convert 26, first subtract the mean: 26 − 38.8 = −12.8. Then divide by the standard deviation: −12.8 / 11.4 = −1.12. So 26 is 1.12 standard deviations below the mean.

Here are the first three conversions.

| Original Value | Calculation | Standard Score (z-score) |
|---|---|---|
| 26 | (26 − 38.8) / 11.4 = | −1.12 |
| 33 | (33 − 38.8) / 11.4 = | −0.51 |
| 65 | (65 − 38.8) / 11.4 = | 2.30 |
| … | … | … |

[Figure: the z-scores plotted on the normal distribution graph.]

Tutorials Point India Private Limited, Incor9 Building, Kavuri Hills, Madhapur, Hyderabad, Telangana – 500081, INDIA. © Copyright 2024. All Rights Reserved.
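The z-score conversion for the travel-time survey can be reproduced with the standard `statistics` module. This sketch assumes the quoted 11.4 minutes is the population standard deviation (`pstdev`), which is what the data gives; `z_score` and `normal_pdf` are illustrative names, and `normal_pdf` uses the usual normal density with $\sigma\sqrt{2\pi}$ in the denominator and $2\sigma^2$ in the exponent.

```python
import math
import statistics

times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

mu = statistics.mean(times)         # mean travel time in minutes
sigma = statistics.pstdev(times)    # population standard deviation

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def normal_pdf(x, mean, sd):
    """Height of the normal curve at x."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

z_scores = [round(z_score(x, mu, sigma), 2) for x in times]

print(f"mean = {mu:.1f}, standard deviation = {sigma:.1f}")
print("first three z-scores:", z_scores[:3])
```

Values above the mean, such as 65, get a positive z-score; values below it get a negative one.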

Aug 10

Log Gamma Distribution

The log gamma distribution is a continuous probability distribution with positive shape parameters ${\alpha, \beta}$ and location parameter ${\mu}$. Its probability density function is defined by the following formula.

Formula

${f(x) = \frac{e^{\beta x} e^{\frac{-e^x}{\alpha}}}{\alpha^\beta \Gamma(\beta)} \\[7pt] \text{, where } -\infty \lt x \lt \infty}$

Where −

${\alpha}$ = positive shape parameter.
${\beta}$ = positive shape parameter.
${x}$ = random variable.

[Figure: the probability density function for three different parameter combinations.]
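The density formula can be evaluated directly and checked numerically. This is a minimal sketch: the parameter values α = 2 and β = 3 are arbitrary illustrative choices, and the density is computed in log space with `math.lgamma` to avoid overflow in $\Gamma(\beta)$. Integrating over a wide interval should give an area close to 1.

```python
import math

def log_gamma_pdf(x, alpha, beta):
    """exp(beta*x) * exp(-exp(x)/alpha) / (alpha**beta * Gamma(beta))."""
    log_density = (beta * x - math.exp(x) / alpha
                   - beta * math.log(alpha) - math.lgamma(beta))
    return math.exp(log_density)

def trapezoid(f, a, b, steps=20_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    width = (b - a) / steps
    inner = sum(f(a + i * width) for i in range(1, steps))
    return (0.5 * (f(a) + f(b)) + inner) * width

# Hypothetical parameter choice for illustration only.
alpha, beta = 2.0, 3.0
area = trapezoid(lambda x: log_gamma_pdf(x, alpha, beta), -30.0, 10.0)

print(f"f(0)  = {log_gamma_pdf(0.0, alpha, beta):.5f}")
print(f"area ~= {area:.6f}")
```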

Aug 10

Probability Multiplicative Theorem

For Independent Events

The theorem states that the probability of the simultaneous occurrence of two independent events is given by the product of their individual probabilities.

${P(A \text{ and } B) = P(A) \times P(B) \\[7pt] P(AB) = P(A) \times P(B)}$

The theorem can be extended to three or more independent events as

${P(A \cap B \cap C) = P(A) \times P(B) \times P(C) \\[7pt] P(A, B \text{ and } C) = P(A) \times P(B) \times P(C)}$

Example

Problem Statement: A college has to appoint a lecturer who must be a B.Com., an MBA and a Ph.D., the probabilities of which are ${\frac{1}{20}}$, ${\frac{1}{25}}$ and ${\frac{1}{40}}$ respectively. Find the probability of the college finding such a person to appoint.

Solution:

Probability of a person being a B.Com., P(A) = ${\frac{1}{20}}$
Probability of a person being an MBA, P(B) = ${\frac{1}{25}}$
Probability of a person being a Ph.D., P(C) = ${\frac{1}{40}}$

Using the multiplicative theorem for independent events:

${P(A, B \text{ and } C) = P(A) \times P(B) \times P(C) \\[7pt] = \frac{1}{20} \times \frac{1}{25} \times \frac{1}{40} \\[7pt] = .05 \times .04 \times .025 \\[7pt] = .00005}$

For Dependent Events (Conditional Probability)

As defined earlier, dependent events are those where the occurrence or non-occurrence of one event affects the outcome of the next event. For such events the multiplicative theorem stated above is not applicable. The probability associated with such events is called conditional probability and is given by

P(A/B) = ${\frac{P(AB)}{P(B)}}$ or ${\frac{P(A \cap B)}{P(B)}}$

Read P(A/B) as the probability of occurrence of event A when event B has already occurred.

Similarly, the conditional probability of B given A is

P(B/A) = ${\frac{P(AB)}{P(A)}}$ or ${\frac{P(A \cap B)}{P(A)}}$

Example

Problem Statement: A coin is tossed 2 times. The tosses resulted in one head and one tail. What is the probability that the first throw resulted in a tail?

Solution: The sample space of a coin tossed two times is S = {HH, HT, TH, TT}. Let event A be the first throw resulting in a tail, so A = {TH, TT}. Let event B be that one tail and one head occurred, so B = {HT, TH}.

${P(B) = \frac{n(\{HT, TH\})}{n(S)} = \frac{2}{4} = \frac{1}{2} \\[7pt] P(A \cap B) = \frac{n(\{TH\})}{n(S)} = \frac{1}{4} \\[7pt] \text{So } P(A/B) = \frac{P(A \cap B)}{P(B)} \\[7pt] = \frac{\frac{1}{4}}{\frac{1}{2}} \\[7pt] = \frac{1}{2} = 0.5}$
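Both examples can be checked with exact fractions. The sketch below multiplies the three independent probabilities, then enumerates the two-toss sample space and applies P(A/B) = P(A ∩ B) / P(B); the helper `prob` assumes all outcomes are equally likely and is an illustrative name.

```python
from fractions import Fraction
from itertools import product

# Independent events: multiply the individual probabilities.
p_all = Fraction(1, 20) * Fraction(1, 25) * Fraction(1, 40)

# Sample space of two coin tosses: HH, HT, TH, TT.
sample_space = ["".join(t) for t in product("HT", repeat=2)]

def prob(event):
    """Probability of an event (a set of outcomes) when outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

A = {o for o in sample_space if o[0] == "T"}               # first toss is a tail
B = {o for o in sample_space if sorted(o) == ["H", "T"]}   # one head and one tail

p_a_given_b = prob(A & B) / prob(B)

print(f"P(A, B and C) = {p_all} = {float(p_all)}")
print(f"P(A/B) = {p_a_given_b}")
```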

Aug 10

Qualitative Data Vs Quantitative Data

Qualitative Data

Qualitative data is a set of information that cannot be measured using numbers. It generally consists of words and subjective narratives. The result of a qualitative data analysis can take the form of highlighted key words, extracted information and elaborated concepts. For example, consider a study of parents' perceptions of the current education system for their children. The information collected from them might be in narrative form, and you need to deduce from it whether they are satisfied, dissatisfied or see a need for improvement in certain areas.

Strengths

Better understanding – Qualitative data gives a better understanding of the perspectives and needs of participants.
Provides explanation – Qualitative data along with quantitative data can explain the result of the survey and can measure the correctness of the quantitative data.
Better identification of behavior patterns – Qualitative data can provide detailed information that is useful in identifying behavioral patterns.

Weaknesses

Lesser reachability – Because the analysis is subjective, a small population is generally used to represent the large population.
Time consuming – Qualitative data is time consuming because a large amount of data has to be understood.
Possibility of bias – Because the analysis is subjective, evaluator bias is quite possible.

Quantitative Data

Quantitative data is a set of numbers collected from a group of people and involves statistical analysis. For example, suppose you conduct a satisfaction survey and ask participants to rate their experience on a scale of 1 to 5. You can collect the ratings and, because they are numerical, use statistical techniques to draw conclusions about participants' satisfaction.

Strengths

Specific – Quantitative data is clear and specific to the survey conducted.
High reliability – If collected properly, quantitative data is normally accurate and hence highly reliable.
Easy communication – Quantitative data is easy to communicate and elaborate using charts, graphs etc.
Existing support – Many large datasets may already be available that can be analyzed to check the relevance of the survey.

Weaknesses

Limited options – Respondents are required to choose from limited options.
High complexity – Quantitative data may need complex procedures to get a correct sample.
Requires expertise – Analysis of quantitative data requires a certain expertise in statistical analysis.

Aug 10

Process Capability (Cp) & Process Performance (Pp)

Process Capability

Process capability can be defined as a measurable property of a process relative to its specification. It is expressed as a process capability index ${C_p}$. The process capability index is used to check the variability of the output generated by the process and to compare that variability with the product tolerance. ${C_p}$ is governed by the following formula:

Formula

${C_p = \min\left[\frac{USL - \mu}{3 \times \sigma}, \frac{\mu - LSL}{3 \times \sigma}\right]}$

Where −

${USL}$ = Upper Specification Limit.
${LSL}$ = Lower Specification Limit.
${\mu}$ = estimated mean of the process.
${\sigma}$ = estimated variability of the process, standard deviation.

This minimum-of-two-sides form is often written ${C_{pk}}$; when the process mean sits midway between the two limits it equals ${\frac{USL - LSL}{6 \times \sigma}}$. The higher the value of the process capability index ${C_p}$, the better the process.

Example

Consider the case of a car and its parking garage. The garage size sets the specification limits and the car defines the process output. Here process capability describes the relationship between car size, garage size and how far from the middle of the garage you can park the car. If the car is a little smaller than the garage then you can easily fit your car into it. If the car is very small compared to the garage then it can fit at any distance from the center. In terms of process control, such a process with little variation allows the car to be parked easily in the garage and meets the customer's requirement. Let's see the above example in terms of the process capability index ${C_p}$.

${C_p = \frac{1}{2}}$ – the garage is smaller than the car and cannot accommodate it.
${C_p = 1}$ – the garage is just large enough for the car and can accommodate only your car.
${C_p = 2}$ – the garage is twice the size of your car and can accommodate two cars at a time.
${C_p = 3}$ – the garage is three times the size of your car and can accommodate three cars at a time.

Process Performance

Process performance checks the conformance of the sample generated using the process. It is expressed as a process performance index ${P_p}$. It checks whether the process is meeting the customer requirement or not. It differs from process capability in that process performance applies to a particular batch of material. The sampling may need to be quite substantial to capture the variation in the batch. Process performance is only to be used when process control cannot be evaluated. ${P_p}$ is governed by the following formula:

Formula

${P_p = \frac{USL - LSL}{6 \times \sigma}}$

Where −

${USL}$ = Upper Specification Limit.
${LSL}$ = Lower Specification Limit.
${\sigma}$ = estimated variability of the process, standard deviation.

The higher the value of the process performance index ${P_p}$, the better the process.
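The two index formulas are simple enough to sketch directly. The specification limits, means and standard deviation below are hypothetical numbers chosen for illustration; the function names are likewise assumptions. The second call shows how the min-based index drops when the process mean drifts toward one limit, while the spread-only index does not change.

```python
def capability_index(usl, lsl, mu, sigma):
    """min[(USL - mu) / (3 sigma), (mu - LSL) / (3 sigma)], as given for C_p above."""
    return min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))

def performance_index(usl, lsl, sigma):
    """(USL - LSL) / (6 sigma), as given for P_p above."""
    return (usl - lsl) / (6 * sigma)

# Hypothetical specification limits and process estimates.
usl, lsl, sigma = 10.0, 4.0, 0.5

centred = capability_index(usl, lsl, mu=7.0, sigma=sigma)      # mean midway between limits
off_centre = capability_index(usl, lsl, mu=8.5, sigma=sigma)   # mean closer to the upper limit
pp = performance_index(usl, lsl, sigma)

print(f"capability index, centred:    {centred:.2f}")
print(f"capability index, off-centre: {off_centre:.2f}")
print(f"performance index:            {pp:.2f}")
```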

Aug 10

Kurtosis

The degree of tailedness of a distribution is measured by kurtosis. It tells us the extent to which the distribution is more or less outlier-prone (heavier or lighter-tailed) than the normal distribution.

[Figure: three different types of curves, courtesy of Investopedia; density plots in the left panel, normal quantile-quantile plots in the right panel.]

It is difficult to discern different types of kurtosis from the density plots (left panel) because the tails are close to zero for all distributions. But differences in the tails are easy to see in the normal quantile-quantile plots (right panel).

The normal curve is called a mesokurtic curve. If the curve of a distribution is more outlier-prone (or heavier-tailed) than a normal or mesokurtic curve, it is referred to as a leptokurtic curve. If a curve is less outlier-prone (or lighter-tailed) than a normal curve, it is called a platykurtic curve. Kurtosis is measured by moments and is given by the following formula −

Formula

${\beta_2 = \frac{\mu_4}{(\mu_2)^2}}$

Where −

${\mu_4 = \frac{\sum(x - \bar x)^4}{N}}$
${\mu_2 = \frac{\sum(x - \bar x)^2}{N}}$

The greater the value of ${\beta_2}$, the more peaked or leptokurtic the curve. A normal curve has a value of 3, a leptokurtic curve has ${\beta_2}$ greater than 3 and a platykurtic curve has ${\beta_2}$ less than 3.

Example

Problem Statement: The data on daily wages of 45 workers of a factory are given. Compute ${\beta_1}$ and ${\beta_2}$ using moments about the mean. Comment on the results.

| Wages (Rs.) | Number of Workers |
|---|---|
| 100-120 | 1 |
| 120-140 | 2 |
| 140-160 | 6 |
| 160-180 | 20 |
| 180-200 | 11 |
| 200-220 | 3 |
| 220-240 | 2 |

Solution:

| Wages (Rs.) | Number of Workers (f) | Mid-point m | ${d = \frac{m - 170}{20}}$ | ${fd}$ | ${fd^2}$ | ${fd^3}$ | ${fd^4}$ |
|---|---|---|---|---|---|---|---|
| 100-120 | 1 | 110 | -3 | -3 | 9 | -27 | 81 |
| 120-140 | 2 | 130 | -2 | -4 | 8 | -16 | 32 |
| 140-160 | 6 | 150 | -1 | -6 | 6 | -6 | 6 |
| 160-180 | 20 | 170 | 0 | 0 | 0 | 0 | 0 |
| 180-200 | 11 | 190 | 1 | 11 | 11 | 11 | 11 |
| 200-220 | 3 | 210 | 2 | 6 | 12 | 24 | 48 |
| 220-240 | 2 | 230 | 3 | 6 | 18 | 54 | 162 |
| | ${N = 45}$ | | | ${\sum fd = 10}$ | ${\sum fd^2 = 64}$ | ${\sum fd^3 = 40}$ | ${\sum fd^4 = 340}$ |

Since the deviations have been taken from an assumed mean, we first calculate moments about the arbitrary origin and then moments about the mean. Here ${i = 20}$ is the class width.

Moments about arbitrary origin '170'

${\mu'_1 = \frac{\sum fd}{N} \times i = \frac{10}{45} \times 20 = 4.44 \\[7pt] \mu'_2 = \frac{\sum fd^2}{N} \times i^2 = \frac{64}{45} \times 20^2 = 568.88 \\[7pt] \mu'_3 = \frac{\sum fd^3}{N} \times i^3 = \frac{40}{45} \times 20^3 = 7111.11 \\[7pt] \mu'_4 = \frac{\sum fd^4}{N} \times i^4 = \frac{340}{45} \times 20^4 = 1208888.89}$

Moments about mean

${\mu_2 = \mu'_2 - (\mu'_1)^2 = 568.88 - (4.44)^2 = 549.16 \\[7pt] \mu_3 = \mu'_3 - 3(\mu'_1)(\mu'_2) + 2(\mu'_1)^3 \\[7pt] \, = 7111.11 - 3(4.44)(568.88) + 2(4.44)^3 \\[7pt] \, = 7111.11 - 7577.48 + 175.05 = -291.32 \\[7pt] \\[7pt] \mu_4 = \mu'_4 - 4(\mu'_1)(\mu'_3) + 6(\mu'_1)^2(\mu'_2) - 3(\mu'_1)^4 \\[7pt] \, = 1208888.89 - 4(4.44)(7111.11) + 6(4.44)^2(568.88) - 3(4.44)^4 \\[7pt] \, = 1208888.89 - 126293.31 + 67288.03 - 1165.87 \\[7pt] \, = 1148717.74}$

From the values of the moments about the mean, we can now calculate ${\beta_1}$ and ${\beta_2}$:

${\beta_1 = \frac{(\mu_3)^2}{(\mu_2)^3} = \frac{(-291.32)^2}{(549.16)^3} = 0.00051 \\[7pt] \beta_2 = \frac{\mu_4}{(\mu_2)^2} = \frac{1148717.74}{(549.16)^2} = 3.81}$

From the above calculations, it can be concluded that ${\beta_1}$, which measures skewness, is almost zero, indicating that the distribution is almost symmetrical. ${\beta_2}$, which measures kurtosis, has a value greater than 3, implying that the distribution is leptokurtic.
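The step-deviation calculation can be recomputed from the frequency table. This sketch assumes class mid-points of 110 to 230, an assumed mean of 170 and a class width of 20, as in the worked table. It keeps full precision throughout, so its results can differ slightly from hand-worked figures that round the intermediate moments to two decimals; `raw_moment` is an illustrative name.

```python
# (class mid-point, number of workers) from the wages table
classes = [(110, 1), (130, 2), (150, 6), (170, 20), (190, 11), (210, 3), (230, 2)]
ASSUMED_MEAN, WIDTH = 170, 20

n = sum(f for _, f in classes)

def raw_moment(r):
    """r-th moment about the assumed mean, via step deviations d = (m - 170) / 20."""
    total = sum(f * ((m - ASSUMED_MEAN) // WIDTH) ** r for m, f in classes)
    return total / n * WIDTH ** r

m1, m2, m3, m4 = (raw_moment(r) for r in (1, 2, 3, 4))

# Convert moments about the assumed mean into moments about the true mean.
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4

beta1 = mu3 ** 2 / mu2 ** 3   # skewness measure
beta2 = mu4 / mu2 ** 2        # kurtosis measure

print(f"N = {n}, beta1 = {beta1:.5f}, beta2 = {beta2:.2f}")
print("leptokurtic" if beta2 > 3 else "platykurtic or mesokurtic")
```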