Simple random sampling

A simple random sample is defined as one in which each element of the population has an equal and independent chance of being selected. For a population of N units, the probability of drawing any particular sample of n units, out of the NCn possible samples, is 1/NCn. For example, if we have a population of five elements (A, B, C, D, E), i.e. N = 5, and we want a sample of size n = 3, then there are 5C3 = 10 possible samples and the probability of drawing any particular sample is 1/10.

Simple random sampling can be done in two different ways, i.e. "with replacement" or "without replacement". When each selected unit is returned to the population before the next draw, the result is a simple random sample with replacement. If the selected units are not replaced before the next draw, so that successive draws are made only from the remaining units of the population, the result is termed a simple random sample without replacement. Thus in the former method a unit once selected may be repeated, whereas in the latter a unit once selected cannot be repeated. Because a simple random sample without replacement is statistically more efficient, it is the preferred method.

A simple random sample can be drawn through either of two procedures: the lottery method or random number tables.

Lottery Method – Under this method units are selected on the basis of random draws. First, each member or element of the population is assigned a unique number. Next, these numbers are written on separate cards that are physically similar in shape, size, colour, etc. They are then placed in a basket and thoroughly mixed. In the last step the slips are drawn out randomly without looking at them; the number of slips drawn equals the required sample size. The lottery method suffers from a few drawbacks.
The process of writing N slips is cumbersome, and shuffling a large number of slips is difficult when the population size is very large. Human bias may also enter while choosing the slips. Hence the alternative, random number tables, can be used.

Random Number Tables Method – These tables consist of columns of randomly generated digits. A number of random number tables are available, e.g. the Fisher and Yates tables and Tippett's random number tables. Listed below is a sequence of two-digit random numbers from the Fisher and Yates table: 61, 44, 65, 22, 01, 67, 76, 23, 57, 58, 54, 11, 33, 86, 07, 26, 75, 76, 64, 22, 19, 35, 74, 49, 86, 58, 69, 52, 27, 34, 91, 25, 34, 67, 76, 73, 27, 16, 53, 18, 19, 69, 32, 52, 38, 72, 38, 64, 81, 79 and 38.

The first step involves assigning a unique number to each member of the population; e.g. if the population comprises 20 people, all individuals are numbered from 01 to 20. To collect a sample of 5 units, 5 two-digit numbers within this range are chosen from the random number table. Using the table above, the units bearing the following five numbers will form the sample: 01, 11, 07, 19 and 16. If the sampling is without replacement and a particular random number repeats itself, it is not taken again and the next number that fits the criteria is chosen.

Thus a simple random sample can be drawn using either of the two procedures. In practice, however, drawing a simple random sample involves a great deal of time and effort, which often makes it impractical.
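The two ways of drawing a simple random sample described above can be sketched in Python. This is a minimal illustration using the standard `random` module, with the five-element population from the example; the seed value is arbitrary and only fixes the draws for reproducibility.

```python
import random
from math import comb

population = ["A", "B", "C", "D", "E"]  # N = 5
n = 3

# Number of possible samples without replacement: 5C3 = 10,
# so each particular sample has probability 1/10.
num_samples = comb(5, 3)

random.seed(42)  # fixed seed so repeated runs give the same draws

# Simple random sample WITHOUT replacement: no unit can appear twice.
srswor = random.sample(population, k=n)

# Simple random sample WITH replacement: a unit may be drawn again.
srswr = [random.choice(population) for _ in range(n)]

print(num_samples)  # 10
print(srswor)       # three distinct units
print(srswr)        # units may repeat
```

`random.sample` mirrors the "without replacement" lottery draw, while repeated `random.choice` calls mirror drawing with replacement.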

Standard Error ( SE )

The standard deviation of a sampling distribution is called the standard error. In sampling, the three most important characteristics are accuracy, bias and precision. It can be said that:

The estimate derived from any one sample is inaccurate to the extent that it differs from the population parameter. Since the population parameters are generally unknown (a sample survey cannot determine them exactly), the actual difference between the sample estimate and the population parameter cannot be measured.

The estimator is unbiased if the mean of the estimates derived from all possible samples equals the population parameter.

Even if the estimator is unbiased, an individual sample will most likely yield an inaccurate estimate and, as stated earlier, the inaccuracy cannot be measured. However, it is possible to measure the precision, i.e. the range within which the true value of the population parameter is expected to lie, using the concept of standard error.

Formula
${SE_{\bar{x}} = \frac{s}{\sqrt{n}}}$
Where −
${s}$ = Standard Deviation and ${n}$ = No. of observations

Example
Problem Statement:
Calculate the Standard Error for the following individual data:
Items: 14, 36, 45, 70, 105
Solution:
Let us first compute the Arithmetic Mean ${\bar{x}}$
${\bar{x} = \frac{14 + 36 + 45 + 70 + 105}{5} = \frac{270}{5} = 54}$
Let us now compute the Standard Deviation ${s}$
${s = \sqrt{\frac{1}{n-1}((x_{1}-\bar{x})^{2}+(x_{2}-\bar{x})^{2}+...+(x_{n}-\bar{x})^{2})} \\[7pt]
= \sqrt{\frac{1}{5-1}((14-54)^{2}+(36-54)^{2}+(45-54)^{2}+(70-54)^{2}+(105-54)^{2})} \\[7pt]
= \sqrt{\frac{1}{4}(1600+324+81+256+2601)} \\[7pt]
= 34.86}$
Thus the Standard Error ${SE_{\bar{x}}}$
${SE_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{34.86}{\sqrt{5}} = \frac{34.86}{2.236} = 15.59}$
The Standard Error of the given numbers is 15.59.
When sampling is done without replacement from a finite population, the standard error is multiplied by the finite population correction factor ${\sqrt{\frac{N-n}{N-1}}}$. The smaller the proportion of the population that is sampled, the smaller the effect of this multiplier, because the finite multiplier will then be close to one and will affect the standard error negligibly. Hence, if the sample size is less than 5% of the population, the finite multiplier is ignored.
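The worked example above can be reproduced with Python's standard library; the population size N = 50 used for the finite population correction is a hypothetical value added only to show the multiplier.

```python
from math import sqrt
from statistics import mean, stdev

items = [14, 36, 45, 70, 105]
n = len(items)

x_bar = mean(items)   # arithmetic mean
s = stdev(items)      # sample standard deviation (n - 1 in the denominator)
se = s / sqrt(n)      # standard error of the mean

print(x_bar)          # 54
print(round(s, 2))    # 34.86
print(round(se, 2))   # 15.59

# Finite population correction (relevant only when n is a sizeable
# fraction of a known population size N; N = 50 is hypothetical).
N = 50
fpc = sqrt((N - n) / (N - 1))
print(round(se * fpc, 2))
```

Note that `statistics.stdev` already uses the n − 1 divisor shown in the formula.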

Probability Bayes Theorem

One of the most significant developments in the field of probability has been Bayesian decision theory, which has proved to be of immense help in making decisions under uncertain conditions. Bayes' theorem was developed by the British mathematician Rev. Thomas Bayes. The probability given by Bayes' theorem is also known as inverse probability, posterior probability or revised probability. The theorem finds the probability of an event by taking the given sample information into account; hence the name posterior probability.

Bayes' theorem is based on the formula for conditional probability. The conditional probability of event ${A_1}$ given event ${B}$ is
${P(A_1/B) = \frac{P(A_1\ and\ B)}{P(B)}}$
Similarly, the conditional probability of event ${A_2}$ given event ${B}$ is
${P(A_2/B) = \frac{P(A_2\ and\ B)}{P(B)}}$
Where
${P(B) = P(A_1\ and\ B) + P(A_2\ and\ B) \\[7pt]
P(B) = P(A_1) \times P(B/A_1) + P(A_2) \times P(B/A_2)}$
${P(A_1/B)}$ can be rewritten as
${P(A_1/B) = \frac{P(A_1) \times P(B/A_1)}{P(A_1) \times P(B/A_1) + P(A_2) \times P(B/A_2)}}$
Hence the general form of Bayes' theorem is
${P(A_i/B) = \frac{P(A_i) \times P(B/A_i)}{\sum_{i=1}^{k} P(A_i) \times P(B/A_i)}}$
Where ${A_1}$, ${A_2}$, ..., ${A_i}$, ..., ${A_k}$ are a set of k mutually exclusive and exhaustive events.
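The general form of the theorem can be sketched numerically. The priors and likelihoods below are hypothetical values chosen only to illustrate the computation for two mutually exclusive and exhaustive events.

```python
# Hypothetical priors P(A1), P(A2) for two mutually exclusive,
# exhaustive events, and likelihoods P(B|A1), P(B|A2).
p_a = [0.4, 0.6]
p_b_given_a = [0.5, 0.25]

# Denominator: total probability of B across all A_i
p_b = sum(pa * pb for pa, pb in zip(p_a, p_b_given_a))

# Posterior P(A_i | B) via Bayes' theorem
posterior = [pa * pb / p_b for pa, pb in zip(p_a, p_b_given_a)]

print(p_b)        # 0.35
print(posterior)  # posteriors sum to 1
```

The posteriors revise the priors in light of the observed event B, which is exactly what "revised probability" means in the text above.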

Residual sum of squares

In statistics, the residual sum of squares (RSS), also known as the sum of squared residuals (SSR) or the sum of squared errors of prediction (SSE), is the sum of the squares of the residuals (the deviations of the predicted values from the actual empirical values of the data). The Residual Sum of Squares (RSS) is given by the following formula:

Formula
${RSS = \sum_{i=1}^{n}(\epsilon_i)^2 = \sum_{i=1}^{n}(y_i - (\alpha + \beta x_i))^2}$
Where −
${x_i, y_i}$ = the observed data values.
${\alpha, \beta}$ = the regression constants (intercept and slope).
${n}$ = the number of observations.

Example
Problem Statement:
Consider two sets of values, X = 1, 2, 3, 4 and Y = 4, 5, 6, 7, with constants ${\alpha}$ = 1 and ${\beta}$ = 2. Find the Residual Sum of Squares (RSS) for these values.
Solution:
Given:
${X = 1, 2, 3, 4;\ Y = 4, 5, 6, 7;\ \alpha = 1;\ \beta = 2}$
Substituting the given values into the formula:
${RSS = \sum_{i=1}^{n}(y_i - (\alpha + \beta x_i))^2 \\[7pt]
= (4-(1+(2 \times 1)))^2 + (5-(1+(2 \times 2)))^2 + (6-(1+(2 \times 3)))^2 + (7-(1+(2 \times 4)))^2 \\[7pt]
= (1)^2 + (0)^2 + (-1)^2 + (-2)^2 \\[7pt]
= 6}$
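The same calculation is a one-liner in Python; this is a direct transcription of the formula using the example's values.

```python
X = [1, 2, 3, 4]
Y = [4, 5, 6, 7]
alpha, beta = 1, 2

# Residual sum of squares: sum of (y_i - (alpha + beta * x_i))^2
rss = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(X, Y))

print(rss)  # 6
```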

Regression Intercept Confidence Interval

The Regression Intercept Confidence Interval gives the range within which the true regression intercept is expected to lie, and is used to check the reliability of the estimation.

Formula
${R = \beta_0 \pm t(1 - \frac{\alpha}{2}, n-k-1) \times SE_{\beta_0}}$
Where −
${\beta_0}$ = Regression intercept.
${k}$ = Number of predictors.
${n}$ = Sample size.
${SE_{\beta_0}}$ = Standard error of the intercept.
${\alpha}$ = Confidence level (as a proportion).
${t}$ = t-value.

Example
Problem Statement:
Compute the Regression Intercept Confidence Interval for the following data: number of predictors (k) = 1, regression intercept ${\beta_0}$ = 5, sample size (n) = 10 and standard error ${SE_{\beta_0}}$ = 0.15.
Solution:
Let us consider the case of a 99% Confidence Interval, i.e. ${\alpha = 0.99}$.
Step 1: Compute the t-value.
${t(1 - \frac{\alpha}{2}, n-k-1) = t(1 - \frac{0.99}{2}, 10-1-1) = t(0.005, 8) = 3.3554}$
Step 2: Lower limit of the regression intercept:
${= \beta_0 - t(1 - \frac{\alpha}{2}, n-k-1) \times SE_{\beta_0} \\[7pt]
= 5 - (3.3554 \times 0.15) = 5 - 0.50331 = 4.49669}$
Step 3: Upper limit of the regression intercept:
${= \beta_0 + t(1 - \frac{\alpha}{2}, n-k-1) \times SE_{\beta_0} \\[7pt]
= 5 + (3.3554 \times 0.15) = 5 + 0.50331 = 5.50331}$
As a result, the 99% Confidence Interval for the regression intercept ranges from ${4.49669}$ to ${5.50331}$.
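The interval computation can be sketched as follows. Since the standard library has no t-distribution quantile function, the t-value 3.3554 from the worked example (t with 8 degrees of freedom at the 0.005 tail) is taken as given rather than computed.

```python
beta0 = 5          # regression intercept
se_beta0 = 0.15    # standard error of the intercept
t_value = 3.3554   # t(0.005, df = 8) for a 99% interval, from a t-table

margin = t_value * se_beta0
lower = beta0 - margin
upper = beta0 + margin

print(round(lower, 5))  # 4.49669
print(round(upper, 5))  # 5.50331
```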

Probability

Probability
Probability implies "likelihood" or "chance". When an event is certain to happen, its probability of occurrence is 1, and when it is certain that the event cannot happen, its probability is 0. Hence the value of probability ranges from 0 to 1. Probability has been defined in various ways by various schools of thought, some of which are discussed below.

Classical Definition of Probability
As the name suggests, the classical approach to defining probability is the oldest approach. It states that if there are n exhaustive, mutually exclusive and equally likely cases, out of which m cases are favourable to the happening of event A, then the probability of event A is given by the following probability function:

Formula
${P(A) = \frac{Number\ of\ favourable\ cases}{Total\ number\ of\ equally\ likely\ cases} = \frac{m}{n}}$

Thus, to calculate the probability, we need information on the number of favourable cases and the total number of equally likely cases. This can be explained using the following example.

Example
Problem Statement:
A coin is tossed. What is the probability of getting a head?
Solution:
Total number of equally likely outcomes (n) = 2 (i.e. head or tail)
Number of outcomes favourable to head (m) = 1
${P(head) = \frac{1}{2}}$
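The classical definition is easy to express exactly with rational arithmetic; the helper function below is a small illustration, with the die example added alongside the coin toss from the text.

```python
from fractions import Fraction

def classical_probability(favourable_cases, total_cases):
    """P(A) = m / n under the classical definition of probability."""
    return Fraction(favourable_cases, total_cases)

# A coin toss: n = 2 equally likely outcomes, m = 1 favourable to a head.
p_head = classical_probability(1, 2)
print(p_head)  # 1/2

# A fair die: probability of rolling an even number (3 of 6 cases).
p_even = classical_probability(3, 6)
print(p_even)  # 1/2
```

Using `Fraction` keeps the result exact instead of introducing floating-point rounding.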

Sample planning

Sample planning refers to a detailed outline of the measurements to be taken:

At what time – Decide when the survey is to be conducted. For example, taking people's views on newspaper outreach before the launch of a new newspaper in the area.
On which material – Decide the material on which the survey is to be conducted. It could be an online poll or a paper-based checklist.
In what manner – Decide the sampling methods that will be used to choose the people on whom the survey is to be conducted.
By whom – Decide the person(s) who will collect the observations.

Sampling plans should be prepared in such a way that the results correctly represent the population of interest and allow all questions to be answered.

Steps
Following are the steps involved in sample planning.

Identify parameters – Identify the attributes/parameters to be measured. Identify the ranges, possible values and required resolution.
Choose a sampling method – Choose a sampling method with details of how and when samples are to be identified.
Select the sample size – Select an appropriate sample size that represents the population correctly; a sample that fails to represent the population leads to invalid conclusions regardless of its size.
Select storage formats – Choose a data storage format in which the sampled data is to be kept.
Assign roles – Assign roles and responsibilities to each person involved in the collecting, processing and statistical testing steps.
Verify and execute – The sampling plan should be verifiable. Once verified, pass it to the related parties to execute.

Statistical Significance

Statistical significance signifies that the result of a statistical experiment or test is not occurring randomly but is attributable to a specific cause. The statistical significance of a result may be strong or weak, and it is very important for sectors that depend heavily on research work, such as insurance, pharmaceuticals, finance and physics. Statistical significance helps in choosing the sample data so that the result or outcome of the test can be judged to be realistic and not caused by chance.

Statisticians generally formulate the degree of statistical significance in terms of sampling error; generally, a sampling error of 5% is acceptable. Sample size is also important: the sample should be representative of the population rather than simply as large as possible, since an unrepresentative sample produces misleading results whatever its size.

Significance Level
The level at which an event is considered statistically significant is termed the significance level. Statisticians use a test statistic called the p-value to assess statistical significance. If the p-value of an event falls below a particular level, the event is considered statistically significant. The p-value is a function of the means and standard deviations of the data samples. The p-value is the probability that the result of the statistical test occurred by chance or due to sampling error; in other words, it is the risk of failure of a statistical test. The counterpart of the p-value is the confidence level, which is 1 − p-value. If the p-value of a result is 5%, the confidence level of the result is 95%.
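The decision rule described above (compare the p-value against a chosen significance level) can be sketched as a small helper; the p-values passed in are hypothetical.

```python
def is_significant(p_value, significance_level=0.05):
    """A result is statistically significant when its p-value
    falls below the chosen significance level (5% by default)."""
    return p_value < significance_level

# Hypothetical p-values from two tests
print(is_significant(0.03))  # True
print(is_significant(0.08))  # False
```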

Residual analysis

Residual analysis is used to assess the appropriateness of a linear regression model by computing the residuals and examining residual plots.

Residual
A residual ($ e $) is the difference between an observed value ($ y $) and the predicted value ($ \hat y $). Every data point has one residual.
${residual = observedValue - predictedValue \\[7pt] e = y - \hat y}$

Residual Plot
A residual plot is a graph in which the residuals are on the vertical axis and the independent variable is on the horizontal axis. If the dots are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, choose a non-linear model.

Types of Residual Plot
Residual plots show a few typical patterns. When the dots are randomly dispersed, a linear regression model is preferred. When the dots follow a non-random pattern, a non-linear regression method is preferred instead.

Example
Problem Statement:
Check whether a linear regression model is appropriate for the following data.

$ x $: 60, 70, 80, 85, 95
$ y $ (Actual Value): 70, 65, 70, 95, 85
$ \hat y $ (Predicted Value): 65.411, 71.849, 78.288, 81.507, 87.945

Solution:
Step 1: Compute the residual for each data point.

$ x $: 60, 70, 80, 85, 95
$ y $ (Actual Value): 70, 65, 70, 95, 85
$ \hat y $ (Predicted Value): 65.411, 71.849, 78.288, 81.507, 87.945
$ e $ (Residual): 4.589, -6.849, -8.288, 13.493, -2.945

Step 2: Draw the residual plot graph.
Step 3: Check the randomness of the residuals. Here the residual plot exhibits a random pattern: the first residual is positive, the following two are negative, the fourth is positive, and the last is negative. Since the pattern is quite random, a linear regression model is appropriate for the above data.
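Step 1 of the example can be reproduced directly; the sign sequence printed at the end is the pattern inspected in Step 3.

```python
x = [60, 70, 80, 85, 95]
y = [70, 65, 70, 95, 85]                          # observed values
y_hat = [65.411, 71.849, 78.288, 81.507, 87.945]  # predicted values

# Residual for each data point: e = y - y_hat
residuals = [round(obs - pred, 3) for obs, pred in zip(y, y_hat)]
print(residuals)  # [4.589, -6.849, -8.288, 13.493, -2.945]

# The sign sequence (+, -, -, +, -) shows no systematic pattern,
# supporting the use of a linear model for these data.
signs = ["+" if e > 0 else "-" for e in residuals]
print(signs)
```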

Standard normal table

Standard Normal Table
Z is the standard normal random variable. The table value for Z is the value of the cumulative normal distribution at z. This is the left-tailed normal table: as the z-value increases, the table value also increases. For example, the value for Z = 1.96 is P(Z < 1.96) = .9750.

z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0  .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.1  .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.2  .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
0.3  .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
0.4  .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
0.5  .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
0.6  .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
0.7  .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
0.8  .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
0.9  .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
1.0  .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
1.1  .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
1.2  .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
1.3  .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
1.4  .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
1.5  .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
1.6  .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
1.7  .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
1.8  .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
1.9  .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
2.0  .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
2.1  .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
2.2  .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
2.3  .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
2.4  .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
2.5  .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
2.6  .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7  .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8  .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
2.9  .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
3.0  .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
3.1  .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
3.2  .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9995  .9995  .9995
3.3  .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
3.4  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998
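Any entry of the left-tailed table can be reproduced with the standard library's `statistics.NormalDist`, which may be more convenient than a lookup; a few spot checks:

```python
from statistics import NormalDist

phi = NormalDist().cdf  # cumulative distribution of the standard normal

# Reproduce a few entries of the left-tailed table
print(round(phi(1.96), 4))  # 0.975
print(round(phi(0.00), 4))  # 0.5
print(round(phi(2.50), 4))  # 0.9938
print(round(phi(3.49), 4))  # 0.9998
```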