Statistics – Combination with replacement

Each of the possible ways in which a set of things can be ordered or arranged is called a permutation; a combination, by contrast, disregards order. Combination with replacement in probability is selecting an object from an unordered list multiple times. The number of combinations with replacement is given by the following function −

Formula

${^nC_r = \frac{(n+r-1)!}{r!(n-1)!}}$

Where −

${n}$ = number of items which can be selected.

${r}$ = number of items which are selected.

${^nC_r}$ = number of unordered selections (combinations with replacement).

Example

Problem Statement − There are five kinds of frozen yogurt: banana, chocolate, lemon, strawberry and vanilla. You can have three scoops. How many variations are there?

Solution − Here n = 5 and r = 3. Substituting the values in the formula,

${^nC_r = \frac{(n+r-1)!}{r!(n-1)!} \\[7pt] = \frac{(5+3-1)!}{3!(5-1)!} \\[7pt] = \frac{7!}{3!\,4!} \\[7pt] = \frac{5040}{6 \times 24} \\[7pt] = 35}$
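The formula can be checked with a minimal Python sketch (the function name is illustrative, not from any library):

```python
from math import factorial

def combinations_with_replacement_count(n, r):
    """Number of ways to choose r items from n kinds, repetition allowed:
    (n + r - 1)! / (r! * (n - 1)!)"""
    return factorial(n + r - 1) // (factorial(r) * factorial(n - 1))

# Five yogurt flavours, three scoops:
print(combinations_with_replacement_count(5, 3))  # 35
```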
Central limit theorem
Statistics – Central limit theorem

If the population from which the sample has been drawn is a normal population, then the sampling distribution of the mean is itself normal and centred on the population mean. When the population is skewed, as in the case illustrated in the figure, the sampling distribution still tends to move closer to the normal distribution, provided the sample is large (i.e. greater than 30).

According to the Central Limit Theorem, for sufficiently large samples (size greater than 30), the shape of the sampling distribution becomes more and more like a normal distribution, irrespective of the shape of the parent population. The theorem thus explains the relationship between the population distribution and the sampling distribution: given a large enough set of samples, the sampling distribution of the mean approaches a normal distribution.

The importance of the central limit theorem has been summed up by Richard I. Levin in the following words:

The significance of the central limit theorem lies in the fact that it permits us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the frequency distribution of that population other than what we can get from the sample.
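The theorem is easy to see empirically. The sketch below (standard library only) draws many samples of size 40 from a heavily skewed exponential population and shows that their means cluster around the population mean of 1:

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# 2000 samples, each of size 40, from Exp(1) -- a skewed parent population
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(40))
    for _ in range(2000)
]

# The population mean of Exp(1) is 1; the sample means cluster around it,
# and a histogram of sample_means would look approximately normal.
print(round(statistics.mean(sample_means), 2))
```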
Deciles Statistics
Statistics – Deciles Statistics

A system of dividing a given ordered distribution of data or values into ten groups of equal frequency is known as deciles. For grouped data, the i-th decile is computed as:

Formula

${D_i = l + \frac{h}{f}\left(\frac{iN}{10} - c\right); \ i = 1,2,3 \dots ,9}$

Where −

${l}$ = lower boundary of the decile group.

${h}$ = width of the decile group.

${f}$ = frequency of the decile group.

${N}$ = total number of observations.

${c}$ = cumulative frequency preceding the decile group.

Example

Problem Statement:

Calculate the deciles of the distribution for the following table:

Class       fi    Fi
[50-60)      8     8
[60-70)     10    18
[70-80)     16    34
[80-90)     14    48
[90-100)    10    58
[100-110)    5    63
[110-120)    2    65
Total       65

Solution:

Calculation of First Decile

${\frac{65 \times 1}{10} = 6.5 \\[7pt] D_1 = 50 + \frac{6.5 - 0}{8} \times 10 = 58.12}$

Calculation of Second Decile

${\frac{65 \times 2}{10} = 13 \\[7pt] D_2 = 60 + \frac{13 - 8}{10} \times 10 = 65}$

Calculation of Third Decile

${\frac{65 \times 3}{10} = 19.5 \\[7pt] D_3 = 70 + \frac{19.5 - 18}{16} \times 10 = 70.94}$

Calculation of Fourth Decile

${\frac{65 \times 4}{10} = 26 \\[7pt] D_4 = 70 + \frac{26 - 18}{16} \times 10 = 75}$

Calculation of Fifth Decile

${\frac{65 \times 5}{10} = 32.5 \\[7pt] D_5 = 70 + \frac{32.5 - 18}{16} \times 10 = 79.06}$

Calculation of Sixth Decile

${\frac{65 \times 6}{10} = 39 \\[7pt] D_6 = 80 + \frac{39 - 34}{14} \times 10 = 83.57}$

Calculation of Seventh Decile

${\frac{65 \times 7}{10} = 45.5 \\[7pt] D_7 = 80 + \frac{45.5 - 34}{14} \times 10 = 88.21}$

Calculation of Eighth Decile

${\frac{65 \times 8}{10} = 52 \\[7pt] D_8 = 90 + \frac{52 - 48}{10} \times 10 = 94}$

Calculation of Ninth Decile

${\frac{65 \times 9}{10} = 58.5 \\[7pt] D_9 = 100 + \frac{58.5 - 58}{5} \times 10 = 101}$
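The same interpolation can be automated with a short Python sketch (function and variable names are illustrative):

```python
def decile(i, boundaries, freqs):
    """i-th decile of grouped data: D_i = l + (h/f) * (i*N/10 - c).
    boundaries: class lower bounds plus the final upper bound.
    freqs: frequency of each class."""
    N = sum(freqs)
    target = i * N / 10          # iN/10
    c = 0                        # cumulative frequency before current class
    for k, f in enumerate(freqs):
        if c + f >= target:      # decile falls in this class
            l = boundaries[k]
            h = boundaries[k + 1] - boundaries[k]
            return l + (h / f) * (target - c)
        c += f
    return boundaries[-1]

bounds = [50, 60, 70, 80, 90, 100, 110, 120]
freqs = [8, 10, 16, 14, 10, 5, 2]
print(round(decile(1, bounds, freqs), 2))  # 58.12
print(round(decile(9, bounds, freqs), 2))  # 101.0
```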
Harmonic Number
Statistics – Harmonic Number

In mathematics, a harmonic number is the sum of the reciprocals of the first n natural numbers. In power-system analysis, which is the sense used here, the harmonic number describes the point at which the inductive reactance and the capacitive reactance of the power system become equal, i.e. harmonic resonance.

Formula

${H = \frac{W_r}{W}}$, where ${W_r = \sqrt{\frac{1}{LC}}}$ and ${W = 2\pi f}$

Where −

${f}$ = frequency of the power system.

${L}$ = inductance of the load.

${C}$ = capacitance of the load.

Example

Calculate the harmonic number of a power system with capacitance 5 F, inductance 6 H and frequency 200 Hz.

Solution:

Here capacitance C is 5 F, inductance L is 6 H and frequency f is 200 Hz. Using the harmonic number formula, let's compute the number as:

${H = \frac{\sqrt{\frac{1}{LC}}}{2\pi f} \\[7pt] \implies H = \frac{\sqrt{\frac{1}{6 \times 5}}}{2 \times 3.14 \times 200} \\[7pt] = \frac{0.18257}{1256} \\[7pt] \approx 0.0001}$

Thus the harmonic number is approximately ${0.0001}$.
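The calculation can be reproduced with this small Python sketch (the function name is illustrative):

```python
import math

def harmonic_number(L, C, f):
    """H = W_r / W, where W_r = sqrt(1/(L*C)) is the resonance angular
    frequency and W = 2*pi*f is the system angular frequency."""
    w_r = math.sqrt(1.0 / (L * C))
    w = 2 * math.pi * f
    return w_r / w

# L = 6 H, C = 5 F, f = 200 Hz:
print(round(harmonic_number(6, 5, 200), 4))  # 0.0001
```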
Beta Distribution
Statistics – Beta Distribution

The beta distribution represents a continuous probability distribution parametrized by two positive shape parameters, $\alpha$ and $\beta$, which appear as exponents of the random variable x and control the shape of the distribution.

Probability density function

The probability density function of the Beta distribution is given as:

Formula

${f(x) = \frac{(x-a)^{\alpha-1}(b-x)^{\beta-1}}{B(\alpha,\beta)(b-a)^{\alpha+\beta-1}} \hspace{.3in} a \le x \le b; \ \alpha, \beta > 0 \\[7pt] \text{where } B(\alpha,\beta) = \int_{0}^{1} t^{\alpha-1}(1-t)^{\beta-1}\,dt}$

Where −

${\alpha, \beta}$ = shape parameters.

${a, b}$ = lower and upper bounds.

${B(\alpha,\beta)}$ = Beta function.

Standard Beta Distribution

When the lower and upper bounds are 0 and 1, the beta distribution is called the standard beta distribution. It is given by the following formula:

Formula

${f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)} \hspace{.3in} 0 \le x \le 1; \ \alpha, \beta > 0}$

Cumulative distribution function

The cumulative distribution function of the standard Beta distribution is given as:

Formula

${F(x) = I_{x}(\alpha,\beta) = \frac{\int_{0}^{x} t^{\alpha-1}(1-t)^{\beta-1}\,dt}{B(\alpha,\beta)} \hspace{.2in} 0 \le x \le 1; \ \alpha, \beta > 0}$

Where −

${\alpha, \beta}$ = shape parameters.

${a, b}$ = lower and upper bounds.

${B(\alpha,\beta)}$ = Beta function. ${I_x(\alpha,\beta)}$ is also called the incomplete beta function ratio.
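A minimal sketch of the standard beta density, using only the standard library (the Beta function is computed via the identity B(α, β) = Γ(α)Γ(β)/Γ(α+β)):

```python
from math import gamma

def beta_function(a, b):
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def beta_pdf(x, a, b):
    """Standard beta density on [0, 1] with shape parameters a, b > 0."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)

# Beta(2, 2) is symmetric with its mode at x = 0.5, where the density is 1.5:
print(round(beta_pdf(0.5, 2, 2), 2))  # 1.5
```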
Cohen's kappa coefficient
Statistics – Cohen's kappa coefficient

Cohen's kappa coefficient is a statistic which measures inter-rater agreement for qualitative (categorical) items. It is generally thought to be a more robust measure than a simple percent-agreement calculation, since ${k}$ takes into account the agreement occurring by chance. Cohen's kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories. Cohen's kappa coefficient is defined and given by the following function −

Formula

${k = \frac{p_0 - p_e}{1 - p_e} = 1 - \frac{1 - p_0}{1 - p_e}}$

Where −

${p_0}$ = relative observed agreement among raters.

${p_e}$ = the hypothetical probability of chance agreement.

${p_0}$ and ${p_e}$ are computed from the observed data, using each rater's marginal frequencies as the probability of randomly assigning each category. If the raters are in complete agreement then ${k}$ = 1. If there is no agreement among the raters other than what would be expected by chance (as given by ${p_e}$), ${k}$ ≤ 0.

Example

Problem Statement −

Suppose that you were analyzing data related to a group of 50 people applying for a grant. Each grant proposal was read by two readers and each reader either said "Yes" or "No" to the proposal. Suppose the count data were as follows, where A and B are readers, the diagonal slanting left shows the count of agreements and the diagonal slanting right, disagreements −

            B: Yes   B: No
A: Yes        20       5
A: No         10      15

Calculate Cohen's kappa coefficient.

Solution −

Note that there were 20 proposals that were granted by both reader A and reader B and 15 proposals that were rejected by both readers. Thus, the observed proportionate agreement is

${p_0 = \frac{20+15}{50} = 0.70}$

To calculate ${p_e}$ (the probability of random agreement) we note that −

Reader A said "Yes" to 25 applicants and "No" to 25 applicants. Thus reader A said "Yes" 50% of the time.

Reader B said "Yes" to 30 applicants and "No" to 20 applicants. Thus reader B said "Yes" 60% of the time.

Using the formula P(A and B) = P(A) × P(B) for independent events, the probability that both of them would say "Yes" randomly is 0.50 × 0.60 = 0.30 and the probability that both of them would say "No" is 0.50 × 0.40 = 0.20. Thus the overall probability of random agreement is ${p_e = 0.3 + 0.2 = 0.5}$.

So now applying our formula for Cohen's kappa we get:

${k = \frac{p_0 - p_e}{1 - p_e} = \frac{0.70 - 0.50}{1 - 0.50} = 0.40}$
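The worked example can be checked with a short Python sketch that computes kappa directly from a confusion matrix (the function name is illustrative):

```python
def cohens_kappa(table):
    """Cohen's kappa from a C x C contingency table, where table[i][j]
    is the count of items rater A put in category i and rater B in j."""
    n = sum(sum(row) for row in table)
    # observed agreement: proportion of counts on the main diagonal
    p0 = sum(table[i][i] for i in range(len(table))) / n
    # chance agreement: product of the two raters' marginal proportions,
    # summed over categories
    pe = sum(
        (sum(table[i]) / n) * (sum(row[i] for row in table) / n)
        for i in range(len(table))
    )
    return (p0 - pe) / (1 - pe)

# Grant-proposal example: rows are reader A (Yes/No), columns are reader B.
print(round(cohens_kappa([[20, 5], [10, 15]]), 2))  # 0.4
```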
Splunk – Sort Command
Splunk – Sort Command

The sort command sorts all the results by the specified fields. Missing fields are treated as having the smallest or largest possible value of that field if the order is descending or ascending, respectively.

If the first argument to the sort command is a number, then at most that many results are returned, in order. If no number is specified, the default limit of 10000 is used. If the number 0 is specified, all of the results are returned.

Sorting By Field Types

We can assign a specific data type to the fields being sorted. The existing data type in the Splunk dataset may be different from the data type we enforce in the search query. In the below example, we sort the status field as numeric in ascending order, while the field named url is sorted as a string, with the negative sign indicating descending order.

Sorting up to a Limit

We can also specify the number of results to be sorted instead of sorting the entire search result. The below search result shows the sorting of only 50 events with status ascending and url descending.

Using Reverse

We can toggle the order of an entire search result by using the reverse command. It is useful for reversing the sort result as and when needed, without altering the existing query.
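The three variants described above can be sketched in SPL as follows (the status and url field names come from the text; the enclosing search is omitted):

```spl
... | sort num(status), -str(url)
... | sort 50 num(status), -str(url)
... | sort num(status), -str(url) | reverse
```

Here num() and str() force the comparison type, the leading minus sign sorts that field in descending order, and the count 50 limits the sorted output.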
Splunk – Basic Chart
Splunk – Basic Chart

Splunk has great visualization features which show a variety of charts. These charts are created from the results of a search query where appropriate functions are used to give numerical outputs.

For example, if we look for the average file size in bytes from the data set named web_applications, we can see the result in the Statistics tab as shown below −

Creating Charts

In order to create a basic chart, we first ensure that the data is visible in the Statistics tab as shown above. Then we click on the Visualization tab to get the corresponding chart. The above data produces a pie chart by default, as shown below.

Changing the Chart Type

We can change the chart type by selecting a different chart option from the chart name. Clicking on one of these options will produce the chart for that type of graph.

Formatting a Chart

The charts can also be formatted by using the Format option. This option allows us to set the values for the axes, set the legends or show the data values in the chart. In the below example, we have chosen the horizontal chart and selected the option to show the data values as a Format option.
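A search producing a statistics table like the one described might look like the following sketch; the source name and the bytes field are assumptions about the sample dataset:

```spl
source="web_applications" | stats avg(bytes)
```

Any search whose result is a table of aggregated values can be turned into a chart from the Visualization tab in the same way.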
Splunk – Lookups
Splunk – Lookups

In the result of a search query, we sometimes get values which may not clearly convey the meaning of the field. For example, we may get a field which lists the value of product id as a numeric result. These numbers will not give us any idea of what kind of product it is. But if we list the product name along with the product id, that gives us a good report where we understand the meaning of the search result. Such linking of values of one field to a field with the same name in another dataset, using equal values from both data sets, is called a lookup process. The advantage is that we retrieve related values from two different data sets.

Steps to Create and Use Lookup File

In order to successfully create a lookup field in a dataset, we need to follow the below steps −

Create Lookup File

We consider the dataset with host as web_application, and look at the productid field. This field is just a number, but we want product names to be reflected in our query result set. We create a lookup file with the following details. Here, we have kept the name of the first field as productid, which is the same as the field we are going to use from the dataset.

productid,productdescription
WC-SH-G04,Tablets
DB-SG-G01,PCs
DC-SG-G02,MobilePhones
SC-MG-G10,Wearables
WSC-MG-G10,Usb Light
GT-SC-G01,Battery
SF-BVS-G01,Hard Drive

Add the Lookup File

Next, we add the lookup file to the Splunk environment by using the Settings screens as shown below −

After selecting Lookups, we are presented with a screen to create and configure a lookup. We select lookup table files as shown below. We browse to select the file productidvals.csv as our lookup file to be uploaded and select search as our destination app. We also keep the same destination file name. On clicking the save button, the file gets saved to the Splunk repository as a lookup file.
Create Lookup Definitions

For a search query to be able to look up values from the lookup file we just uploaded above, we need to create a lookup definition. We do this by again going to Settings → Lookups → Lookup Definition → Add New. Next, we check the availability of the lookup definition we added by going to Settings → Lookups → Lookup Definition.

Selecting Lookup Field

Next, we need to select the lookup field for our search query. This is done by going to New search → All Fields. Then we check the box for productid, which automatically adds the productdescription field from the lookup file as well.

Using the Lookup Field

Now we use the lookup field in the search query as shown below. The visualization shows the result with the productdescription field instead of productid.
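Once the lookup definition exists, it can also be invoked explicitly with the lookup command. In this sketch, the definition name productidvals is an assumption matching the uploaded file name:

```spl
host="web_application" | lookup productidvals productid OUTPUT productdescription
```

The OUTPUT clause names the field to pull in from the lookup table for every event whose productid matches.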
Splunk – Knowledge Management

Splunk knowledge management is about the maintenance of knowledge objects for a Splunk Enterprise implementation. Below are the main features of knowledge management −

Ensure that knowledge objects are being shared and used by the right groups of people in the organization.

Normalize event data by implementing knowledge object naming conventions and retiring duplicate or obsolete objects.

Oversee strategies for improved search and pivot performance (report acceleration, data model acceleration, summary indexing, batch mode search).

Build data models for Pivot users.

Knowledge Object

A knowledge object is a Splunk object that captures specific information about your data. When you create a knowledge object, you can keep it private or you can share it with other users. Examples of knowledge objects are: saved searches, tags, field extractions, lookups, etc.

Uses of Knowledge Objects

As the Splunk software is used, knowledge objects are created and saved. But they may contain duplicate information, or they may not be used effectively by all of the intended audience. To address such issues, we need to manage these objects. This is done by classifying them properly and then using proper permission management to handle them. Below are the uses and classification of various knowledge objects −

Fields and field extractions

Fields and field extractions are the first layer of Splunk software knowledge. The fields automatically extracted by the Splunk software from the IT data help bring meaning to the raw data. Manually extracted fields expand and improve upon this layer of meaning.

Event types and transactions

Use event types and transactions to group together interesting sets of similar events. Event types group together sets of events discovered through searches. Transactions are collections of conceptually related events that span time.
Lookups and workflow actions

Lookups and workflow actions are categories of knowledge objects that extend the usefulness of your data in various ways. Field lookups enable you to add fields to your data from external data sources such as static tables (CSV files) or Python-based commands. Workflow actions enable interactions between fields in your data and other applications or web resources, such as a WHOIS lookup on a field containing an IP address.

Tags and aliases

Tags and aliases are used to manage and normalize sets of field information. You can use tags and aliases to group sets of related field values together, and to give extracted fields tags that reflect different aspects of their identity. For example, you can group events from a set of hosts in a particular location (such as a building or city) together by giving the same tag to each host. If you have two different sources using different field names to refer to the same data, then you can normalize your data by using aliases (by aliasing clientip to ipaddress, for example).

Data models

Data models are representations of one or more datasets, and they drive the Pivot tool, enabling Pivot users to quickly generate useful tables, complex visualizations, and robust reports without needing to interact with the Splunk software search language. Data models are designed by knowledge managers who fully understand the format and semantics of their indexed data. A typical data model makes use of other knowledge object types. We will discuss some of the examples of these knowledge objects in the subsequent chapters.
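As a sketch, the clientip → ipaddress alias mentioned above is configured as a FIELDALIAS setting in props.conf; the stanza name access_combined and the class name client are assumptions for illustration:

```ini
[access_combined]
FIELDALIAS-client = clientip AS ipaddress
```

After this alias is in place, searches can refer to ipaddress and match events that originally carried only clientip.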