Big Data & Analytics Archives - Page 21 of 75 - Donotsad where can learn any thing work project and make money

Aug 10

Splunk – Event Types

In a Splunk search, we can define our own event types from a dataset based on certain criteria. For example, we can search for only the events that have an HTTP status code of 200. This search can then be saved as an event type with a user-defined name such as status200, and that name can be used as part of future searches. In short, an event type represents a search that returns a specific type of event or a useful collection of events. Every event that the search can return is associated with that event type.

Creating Event Type

There are two ways to create an event type once we have decided on the search criteria. One is to run a search and then save it as an event type. The other is to add a new event type from the Settings menu. This section covers both.

Using a Search

Consider a search for events that have a successful HTTP status value of 200 and that occurred on a Wednesday. After running the search query, we choose the Save As option to save the query as an event type. The next screen prompts us to give the event type a name, choose an optional tag, and choose a colour with which matching events will be highlighted. The priority option decides which event type is displayed first when two or more event types match the same event. Finally, we can confirm that the event type has been created by going to Settings → Event Types.

Using New Event Type

The other way to create an event type is to go to Settings → Event Types and click the New Event Type button. The resulting screen lets us enter the same query as in the previous section.

Viewing the Event Type

To view the events for the event type we just created, we search for the event type by name in the search box (for example, eventtype=status200). The results show the matching events highlighted in the colour chosen for the event type.

Using the Event Type

We can use the event type along with other queries. If we specify only part of the event type's criteria, the result is a mix of coloured events (those that match the event type) and non-coloured events (those that do not).
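The idea of an event type as a named search with a display priority can be sketched in a few lines of Python. This is only a toy model and not how Splunk implements event types: the EventType class, the any_success event type, the sample events and the rule that a lower priority number is listed first are all assumptions made for illustration.

```python
# Toy model of event types: a named search predicate plus a display priority.
# NOT Splunk's implementation; all names and data here are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass
class EventType:
    name: str
    matches: Callable[[Event], bool]  # stands in for the saved search
    priority: int = 5                 # assumption: lower number is listed first


def event_types_for(event: Event, event_types: List[EventType]) -> List[str]:
    """Return the names of all matching event types, ordered by priority."""
    hits = [et for et in event_types if et.matches(event)]
    return [et.name for et in sorted(hits, key=lambda et: et.priority)]


# Event type from the text: status 200 on a Wednesday.
status200 = EventType(
    name="status200",
    matches=lambda e: e.get("status") == "200" and e.get("date_wday") == "wednesday",
    priority=1,
)
# Hypothetical second event type that overlaps with the first one.
any_success = EventType(
    name="any_success",
    matches=lambda e: e.get("status", "").startswith("2"),
    priority=3,
)

events = [
    {"status": "200", "date_wday": "wednesday"},
    {"status": "200", "date_wday": "monday"},
    {"status": "503", "date_wday": "wednesday"},
]

for ev in events:
    print(ev, "->", event_types_for(ev, [any_success, status200]))
```

The first sample event matches both event types, so the priority decides their order. The second matches only the broader one, and the third matches neither, which corresponds to a non-coloured event in the search results.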

Aug 10

Dot Plot

A dot chart or dot plot is a statistical chart consisting of data points plotted on a fairly simple scale, typically using filled-in circles.

Example

Problem Statement: A survey asking “How long does it take you to eat breakfast?” has these results:

Minutes  0  1  2  3  4  5  6  7  8  9  10  11  12
People   6  2  3  5  2  5  0  0  2  3   7   4   1

Draw the dot plot for minutes to eat breakfast.

Solution: 6 people take 0 minutes to eat breakfast (they most likely had no breakfast), 2 people say they spend only 1 minute eating, and so on. The dot plot stacks one dot per person above each minute value.
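As a rough illustration, the breakfast data can be rendered as a text dot plot. The sketch below is one possible rendering, not the chart from the original page: the function name dot_plot_rows, the column width and the use of the letter "o" as the dot are arbitrary choices.

```python
# Text rendering of a dot plot for the breakfast data from the example.
minutes = list(range(13))                          # 0 .. 12 minutes
people = [6, 2, 3, 5, 2, 5, 0, 0, 2, 3, 7, 4, 1]   # respondents per minute value


def dot_plot_rows(values, counts, width=3):
    """Build the plot top-down: one row per count level, one column per value."""
    rows = []
    for level in range(max(counts), 0, -1):
        # A column gets a dot on this row if its count reaches this level.
        cells = ["o" if c >= level else " " for c in counts]
        rows.append("".join(cell.center(width) for cell in cells).rstrip())
    # Final row: the axis labels.
    rows.append("".join(str(v).center(width) for v in values).rstrip())
    return rows


rows = dot_plot_rows(minutes, people)
print("\n".join(rows))
```

Each column ends up holding as many dots as there are people for that minute value, so the tallest column is the one for 10 minutes (7 people) and the columns for 6 and 7 minutes stay empty.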

Aug 10

Binomial Distribution

The binomial distribution is a discrete probability distribution. It was discovered by the Swiss mathematician James (Jacob) Bernoulli. It is used in situations where an experiment results in one of two possibilities: success or failure. The distribution expresses the probability of a given number of successes when each trial ends in success (with probability p) or failure (with probability q). It is defined by the following probability function −

Formula

$P(X = x) = {}^{n}C_x \, q^{n-x} \, p^x$

Where −

$p$ = Probability of success.
$q$ = Probability of failure = $1 - p$.
$n$ = Number of trials.
$P(X = x)$ = Probability of x successes in n trials.

Example

Problem Statement − Eight coins are tossed at the same time. Find the probability of getting at least 6 heads.

Solution − Let $p$ = probability of getting a head and $q$ = probability of getting a tail.

$ \text{Here, } p = \frac{1}{2}, \; q = \frac{1}{2}, \; n = 8, \\[7pt]
P(X = x) = {}^{n}C_x \, q^{n-x} \, p^x, \\[7pt]
P(\text{at least 6 heads}) = P(6H) + P(7H) + P(8H) \\[7pt]
= {}^{8}C_6 \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^6 + {}^{8}C_7 \left(\frac{1}{2}\right)^1 \left(\frac{1}{2}\right)^7 + {}^{8}C_8 \left(\frac{1}{2}\right)^8 \\[7pt]
= 28 \times \frac{1}{256} + 8 \times \frac{1}{256} + 1 \times \frac{1}{256} \\[7pt]
= \frac{37}{256} $
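The coin example can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names binomial_pmf and prob_at_least are illustrative, and exact fractions are used so the result can be compared directly with the hand calculation.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n: int, x: int, p: Fraction) -> Fraction:
    """P(X = x) = nCx * q^(n-x) * p^x, with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * q ** (n - x) * p ** x


def prob_at_least(n: int, k: int, p: Fraction) -> Fraction:
    """P(X >= k): sum the probability function over x = k .. n."""
    return sum(binomial_pmf(n, x, p) for x in range(k, n + 1))


# Eight fair coins, at least 6 heads.
result = prob_at_least(8, 6, Fraction(1, 2))
print(result)  # 37/256
```

Working with Fraction instead of floats avoids rounding, so the printed value matches the 37/256 obtained above term by term (28/256 + 8/256 + 1/256).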

Aug 10

Continuous Series Arithmetic Mean

When data is given as ranges along with their frequencies, it is called a continuous series. Following is an example of a continuous series:

Items      0-5  5-10  10-20  20-30  30-40
Frequency  2    5     1      3      12

In a continuous series, the mid point of each range is computed as $\frac{\text{lower limit} + \text{upper limit}}{2}$ and the arithmetic mean is computed using the following formula.

Formula

$\bar{x} = \frac{f_1 m_1 + f_2 m_2 + f_3 m_3 + \dots + f_n m_n}{N}$

Where −

$N$ = Number of observations (the sum of the frequencies).
$f_1, f_2, f_3, \dots, f_n$ = Different values of frequency f.
$m_1, m_2, m_3, \dots, m_n$ = Different values of mid points for ranges.

Example

Problem Statement − Let's calculate the arithmetic mean for the following continuous data −

Items      0-10  10-20  20-30  30-40
Frequency  2     5      1      3

Solution − Based on the given data, we have −

Items   Mid-pt m  Frequency f  fm
0-10    5         2            10
10-20   15        5            75
20-30   25        1            25
30-40   35        3            105
                  $N = 11$     $\sum fm = 215$

Based on the above formula, the arithmetic mean $\bar{x}$ will be −

$\bar{x} = \frac{215}{11} \approx 19.55$

The arithmetic mean of the given data is approximately 19.55.
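The same calculation can be sketched in Python. The function name continuous_series_mean and the representation of each range as a (lower, upper) tuple are choices made for this illustration only.

```python
from fractions import Fraction


def continuous_series_mean(classes, frequencies):
    """Arithmetic mean of a continuous series: sum(f * m) / N."""
    # Mid point of each range: (lower limit + upper limit) / 2.
    midpoints = [Fraction(lo + hi, 2) for lo, hi in classes]
    total_fm = sum(f * m for f, m in zip(frequencies, midpoints))
    n = sum(frequencies)  # N is the total number of observations
    return total_fm / n


classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [2, 5, 1, 3]

mean = continuous_series_mean(classes, frequencies)
print(mean, round(float(mean), 2))
```

The exact value is 215/11 = 19.5454..., which is 19.55 when rounded to two decimal places.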

Aug 10

Boxplots

The box plot is a standardized way to display the distribution of data based on the following five-number summary:

Minimum
First Quartile
Median
Third Quartile
Maximum

In a box plot diagram, the central rectangle spans the first quartile to the third quartile (the interquartile range, IQR). A line inside the rectangle shows the median, and “whiskers” above and below the box show the locations of the minimum and maximum values. Such a box plot displays the full range of variation (from min to max), the likely range of variation (the IQR), and the median.

Problem Statement: Create a box plot for the following two datasets.

Dataset 1: 0.22 -0.87 -2.39 -1.79 0.37 -1.54 1.28 -0.31 -0.74 1.72 0.38 -0.17 -0.62 -1.10 0.30 0.15 2.30 0.19 -0.50 -0.09

Dataset 2: -5.13 -2.19 -2.43 -3.83 0.50 -3.25 4.32 1.63 5.18 -0.43 7.11 4.87 -3.10 -5.81 3.76 6.31 2.58 0.07 5.76 3.50

Solution: Both datasets are roughly balanced around zero, so their means are close to zero. In the first dataset the values range approximately from -2.5 to 2.5, whereas in the second they range approximately from -6 to 7. Each box plot is then drawn from the five-number summary of its dataset.
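The five-number summary behind each box plot can be computed with the Python standard library. This sketch assumes that the 40 listed values form two datasets of 20 values each, in the order given. It also uses the "inclusive" quartile method of statistics.quantiles, which is only one of several common quartile conventions, so the quartiles may differ slightly from those of other tools.

```python
import statistics


def five_number_summary(data):
    """Return min, Q1, median, Q3 and max for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": min(data), "q1": q1, "median": median, "q3": q3, "max": max(data)}


# Assumption: the first 20 values are dataset 1, the last 20 are dataset 2.
dataset_1 = [0.22, -0.87, -2.39, -1.79, 0.37, -1.54, 1.28, -0.31, -0.74, 1.72,
             0.38, -0.17, -0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50, -0.09]
dataset_2 = [-5.13, -2.19, -2.43, -3.83, 0.50, -3.25, 4.32, 1.63, 5.18, -0.43,
             7.11, 4.87, -3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50]

summary_1 = five_number_summary(dataset_1)
summary_2 = five_number_summary(dataset_2)

for name, summary in [("Dataset 1", summary_1), ("Dataset 2", summary_2)]:
    print(name, {key: round(value, 3) for key, value in summary.items()})
```

The box of each plot spans q1 to q3, the line inside it sits at the median, and the whiskers extend to min and max.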

Aug 10

Data collection – Questionnaire Designing

A questionnaire is a form containing a set of questions, which is filled in by the respondents. According to Goode and Hatt, “In general, the questionnaire refers to a device for securing answers to questions by using a form which the respondent fills in himself.” The objective of a questionnaire is twofold:

To collect information from respondents scattered over a wide area.
To collect reliable and dependable information in a short span of time.

QUESTIONNAIRE DESIGN

Designing a questionnaire is an art rather than a science. It is an effort of compiling a set of questions by repeatedly checking for errors, and of learning what to avoid and what to include. However, a basic design strategy can be developed. The design of a questionnaire generally goes through three phases:

Developing a design strategy
Constructing the questionnaire
Drafting and refining the questionnaire

Phase I: Developing a Design Strategy

Specify the Information Sought – The researcher should be able to specify the list of information needs. Generally this task has already been accomplished when the research proposal or the research design was developed. The hypothesis stated earlier is the guiding light in stating the information requirement. The hypothesis establishes the relationship between the variables, and the researcher can identify the data that needs to be collected to prove or disprove the hypothesis.

Determine the Communication Approach – This refers to the decision on the method used to conduct the survey, i.e. personal interview, depth interview, telephone, mail, computer, etc. The decision on the method will have a bearing on the type of questionnaire to design. The choice of communication approach is influenced by factors like the location of respondents, the time and funds available, the nature of the study, etc.
The communication approach chosen results in different introductions, different instructions, a different layout, etc. Once the communication approach has been finalized, a decision is taken on the type of questionnaire to be framed.

Type of Questionnaire – In this step the researcher specifies how the data will be gathered by stating the type of questionnaire required. The questionnaire can be of four types.

Structured-Undisguised Questionnaire – The most popular type, it uses questions with clear, direct wording in a logical order. The wording and order remain the same for all respondents. Such questionnaires are very simple to administer and easy to tabulate.

Unstructured-Disguised Questionnaire – The exact opposite of the earlier type, this questionnaire hides the purpose of the research and shows no clear order or tendencies. Such a questionnaire generally uses projective methods to collect data. A disguised or hidden stimulus is given to the respondent and the response is in an unstructured form.

Unstructured-Undisguised Questionnaire – In this type of questionnaire the purpose of the study is clear but the questions are generally open ended, e.g. “How do you feel about putting a ban on student union elections?” The respondents are free to reply in an unstructured manner. These questionnaires are generally used in depth interviews.

Structured-Disguised Questionnaire – The purpose of this questionnaire is to hide the motive of the study but allow for ease in coding and analysis. This approach is based on the fact that direct questions may influence or bias the replies, whereas if the questions are disguised then we ask the respondents what they know and not what they feel. For example, the earlier question would be framed as:

What is the effect of student union elections?
(a) It creates awareness
(b) It disrupts studies
(c) ………………………..
(d) ………………………..
Although such questionnaires offer ease of tabulation and analysis, the effort involved in framing disguised questions means that this is not a very popular method.

Phase II: Constructing the Questionnaire

Determine Question Content – This step initiates the task of framing specific questions which would yield the data required for the study. While framing the questions, certain things should be kept in mind:

Is the question necessary? Every question should have some use in providing additional and genuine information.

Is the question complete? The question should have the proper scope to reveal all the information that a researcher needs to know.

Is a single question or multiple questions required? There should not be “double-barreled questions” which combine two questions in one, e.g. “Are the elections this year transparent and according to election commission guidelines?” This is an incorrect method. Instead, to obtain the desired information, the following two questions should be asked: Are the elections this year transparent? Were the election commission guidelines adhered to completely?

Can the respondent articulate? The respondent may be unable to answer adequately due to an inability to organize his thoughts.

Is the respondent informed? The respondent's information level should be kept in mind, i.e. the content of the question should match the knowledge level of the respondent.

Can the respondent remember? The questions should not overtax the respondent's recall ability. No assumption should be made regarding memory. Take a simple test and answer these questions: What was the last movie you saw? Where did you last eat out? When did you visit a temple? These questions, although very simple, test your recall ability.

Is the respondent willing to answer? This is of relevance in situations where the questions are sensitive, exploring an individual's faith, money matters, family life, etc.
Determine the Response Strategy – Once the content of the questions has been decided upon, the next stage is to decide between a structured response strategy (closed response using fixed-alternative questions) and an unstructured response strategy (open response using open-ended questions). Some of the response strategies are:

Dichotomous questions
Do you own a digital camera? Yes / No

Multichotomous questions
Which brand do you prefer when buying a digital camera? Sony / Canon / Nikon / Kodak

Checklist questions
What features do you look for in your digital camera? Picture clarity / Screen size / Video recording facility / Economical / Smart physical looks / Free service for 1 year / Large memory capacity

Aug 10

Splunk – Dashboards

A dashboard is used to present tables or charts that are related by some business meaning. It is built from panels. The panels in a dashboard hold the charts or summarized data in a visually appealing manner. We can add multiple panels, and hence multiple reports and charts, to the same dashboard.

Creating Dashboard

We will continue with the search query from the previous chapter, which shows the count of files by weekday. We choose the Visualization tab to see the result as a pie chart. To put the chart on a dashboard, we choose the option Save As → Dashboard Panel. The next screen asks for the details of the dashboard and of the panel in it. On clicking the Save button, the next screen gives an option to view the dashboard. On choosing to view the dashboard, we see the dashboard along with options to edit, export or delete it.

Adding Panel to Dashboard

We can add a second chart to the dashboard by adding a new panel containing the chart, in this case a bar chart and its query. We fill in the details for the second chart and click Save. Finally, we get a dashboard that contains both charts in two different panels. We can edit the dashboard to add more panels, and we can add input elements such as Text, Radio and Dropdown to create more sophisticated dashboards.

Aug 10

Splunk – Tags

Tags are used to assign names to specific field and value combinations. These fields can be event type, host, source, source type, etc. You can also use a tag to group a set of field values together, so that you can search for them with one command. For example, you can tag all the different files generated on Monday with a tag named mon_files. To find the field-value pair that we are going to tag, we need to expand an event and locate the field to be considered.

Creating Tags

We can create tags by adding the tag value to a field-value pair using the Edit Tags option, which we choose for the field under the Actions column. The next screen prompts us to define the tag. For the status field, we choose the status values 503 and 505 and assign a tag named server_error. We have to do this one value at a time, by choosing two events: one with status value 503 and one with status value 505. We define the tag for status value 503 first and then repeat the same steps for an event with status value 505.

Searching Using Tags

Once the tags are created, we can search for events carrying the tag by referring to the tag name in the search bar (for example, tag=server_error). The result shows all the events that have status 503 or 505.
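The way one tag groups several field-value pairs can be illustrated with a small Python sketch. This is a toy model only and not how Splunk stores or resolves tags: the functions add_tag and search_by_tag, the uri field and the sample events are all hypothetical.

```python
# Toy model of tags: one tag name maps to several field-value pairs.
# NOT Splunk's implementation; all names and data here are hypothetical.
from collections import defaultdict

tags = defaultdict(set)  # tag name -> set of (field, value) pairs


def add_tag(tag, field, value):
    """Attach a tag to one field-value pair (done once per value, as in the text)."""
    tags[tag].add((field, value))


def search_by_tag(tag, events):
    """Return the events that carry at least one field-value pair with this tag."""
    pairs = tags.get(tag, set())
    return [e for e in events if any(e.get(f) == v for f, v in pairs)]


# Tag the two status values separately, mirroring the two-step process above.
add_tag("server_error", "status", "503")
add_tag("server_error", "status", "505")

events = [
    {"status": "200", "uri": "/index.html"},
    {"status": "503", "uri": "/cart"},
    {"status": "505", "uri": "/checkout"},
]

matches = search_by_tag("server_error", events)
print([e["status"] for e in matches])  # ['503', '505']
```

A single lookup by tag name returns events for both status values, which is the convenience that tagging is meant to provide.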

Aug 10

Data collection

The data required for research can be primary or secondary in nature. Primary data, by definition, is the data that has been collected first hand by the researcher specifically for addressing the population at hand. Survey research can be objectivist or subjectivist in nature. An objectivist approach is a more rigid and scientific approach, in which the hypothesis is tested using publicly standard procedures. There is little or no latitude to deviate from the stated procedures or questions. A subjectivist approach also requires a hypothesis test but is less rigid in following the procedures. The researcher is allowed to use unstructured methods, at his discretion, to record data. The research data can be classified as follows:

Interview

As a communication approach to collecting data from respondents, an interview is oral or verbal questioning. Bingham and Moore have described the interview as a “conversation with a purpose.” Lindsey Gardner has defined the interview as a “two-person conversation, initiated by the interviewer for the specific purpose of obtaining research-relevant information and focused by him on the content specified by the research objectives of description and explanation.” It is thus clear that an interview is a verbal conversation between two people with the objective of collecting research-relevant information from the respondent. Interviews can be classified into various types, viz. personal interview, telephone interview, focus group interview, depth interview and projective techniques (also called indirect interviewing).

Type of Interview

The interview techniques can be grouped into the following categories:

Personal Interview

A personal interview is a face-to-face, two-way communication between the interviewer and the respondent. Generally the personal interview is carried out in a planned manner and is referred to as a “structured interview”.
Personal interviews can be conducted in many forms, e.g. door-to-door interviewing, where the respondents are interviewed in their homes; a planned formal executive meeting, most commonly used to interview officials and business persons; or a mall intercept survey, where respondents are interviewed at select places where the chances of finding respondents are highest.

Method of Conducting an Interview

A personal interview involves a lot of preparation. Generally an interview should go through the following stages.

Rapport Building – The first reaction of a respondent on being asked to give an interview is to say “No”. Hence in the initial stage the interviewer should increase the receptiveness of the respondent by making him believe that his opinions are very useful to the research, and that the interview is going to be a pleasure rather than an ordeal. It is important that the interviewer conveys his confidence to the respondent and addresses his mental reservations, if any. Wherever possible an appointment should be sought.

Introduction – An introduction involves the interviewer identifying himself by giving his name, purpose and sponsorship, if any. An introductory letter goes a long way in conveying the study's legitimacy. If the respondent is unavailable, the interviewer should ensure that he seeks another appointment.

Probing – In this stage the interviewer collects data by asking questions from an interview schedule which contains questions in a prearranged sequence. Generally the questions should be asked the way they are worded in order to avoid bias, but if they are not understood or heard properly they may be repeated. An advantage of the interview is that it allows for probing. Probing is the technique of encouraging the respondents to answer freely, completely and relevantly.
Some of the frequently used probing styles are the use of comments like “I understand” or “Uh-huh”, repeating the respondent's reply to prompt him to rethink it, giving an expectant pause to convey interest, etc. However, probing should be used carefully and should not bias the respondent's reply.

Recording – The last stage in an interview is recording responses. The interviewer can either write the response at the time of the interview or after the interview. Normally, the recording should take place side by side with the questioning. The interviewer can use shorthand and abbreviate responses. Recording responses later on has the disadvantage that one may forget what was said earlier. In certain cases, where the respondent allows it, audio or visual aids can be used to record answers.

Closing – After the interview is over, the interviewer should thank the respondent and once again assure him of the worth of his answers and their confidentiality.

Telephone interview

In telephone interviewing the information is collected from the respondent by asking questions over the phone. The marriage of telephone and computer has made this method even more popular. Traditional telephone interviews call for phoning a sample of respondents, asking them questions written on paper and recording the answers with a pencil. In Computer Assisted Telephone Interviewing (CATI), a computerized questionnaire is used which prompts the interviewer with introductory statements and qualifying questions to be asked of the respondents. The computer replaces the paper and pen. The computer randomly dials a number from the sample; upon contact, the interviewer reads the questions and enters the responses directly into the computer's memory bank. Another variant is Computer Administered Telephone Surveys (CATS), where the interviewer is replaced by a computer. The questions are voice synthesized, and the respondent's answers and computer timing decide whether to continue or disconnect.
The chief disadvantage of this method is a higher refusal rate (and thus non-response bias), because people hang up more easily on a computer than on a human.

The telephone interview has the following advantages over the personal interview:

Low cost
Faster collection of data
Reduced bias (caused by the personal presence of the interviewer)

Focus group interview

A kind of unstructured interview, it involves a moderator leading a discussion among a small group of respondents on a specified topic. A focus group interview involves 8 to 12 respondents having homogeneous characteristics, seated in a comfortable, relaxed, informal atmosphere. The interview generally continues for 1 to 3 hours during which the

Aug 10

Splunk – Source Types

All incoming data to Splunk is first examined by its built-in data processing unit and classified into certain data types and categories. For example, if the data is a log from an Apache web server, Splunk is able to recognize that and create appropriate fields out of the data read. This feature is called source type detection, and it relies on built-in source types, known as “pretrained” source types. This makes analysis easier, as the user does not have to manually classify the data or assign data types to the fields of the incoming data.

Supported Source Types

The supported source types in Splunk can be seen by uploading a file through the Add Data feature and then opening the dropdown for Source Type. For example, after uploading a CSV file we can check all the available options.

Source Type Sub-Category

Within each category, we can click further to see all the sub-categories that are supported. So when you choose the database category, you can find the different types of databases and their supported files which Splunk can recognize.

Pre-Trained Source Types

The table below lists some of the important pre-trained source types Splunk recognizes −

Source Type Name          Nature
access_combined           NCSA combined format http web server logs (can be generated by apache or other web servers)
access_combined_wcookie   NCSA combined format http web server logs (can be generated by apache or other web servers), with cookie field added at end
apache_error              Standard Apache web server error log
linux_messages_syslog     Standard linux syslog (/var/log/messages on most platforms)
log4j                     Log4j standard output produced by any J2EE server using log4j
mysqld_error              Standard mysql error log