Statistics – Circular Permutation

A circular permutation is the total number of ways in which n distinct objects can be arranged around a fixed circle. There are two cases.

Case 1 − Clockwise and anticlockwise orders are different.

Case 2 − Clockwise and anticlockwise orders are the same.

Formula (Case 1)

${P_n = (n-1)!}$

Formula (Case 2)

${P_n = \frac{(n-1)!}{2!}}$

Where −

${P_n}$ = circular permutation

${n}$ = number of objects

Example

Problem Statement − Calculate the circular permutations of 4 persons sitting around a round table, considering (i) clockwise and anticlockwise orders as different and (ii) clockwise and anticlockwise orders as the same.

Solution

In Case 1, n = 4. Using the formula ${P_n = (n-1)!}$ and applying it:

${P_4 = (4-1)! = 3! = 6}$

In Case 2, n = 4. Using the formula ${P_n = \frac{(n-1)!}{2!}}$ and applying it:

${P_4 = \frac{(4-1)!}{2!} = \frac{3!}{2!} = \frac{6}{2} = 3}$
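Both cases are easy to verify numerically. Below is a minimal Python sketch of the two formulas; the function name and parameter are ours, chosen for illustration.

```python
# Circular permutations of n distinct objects.
from math import factorial

def circular_permutations(n, direction_matters=True):
    """Case 1: (n-1)! when clockwise/anticlockwise orders differ.
    Case 2: (n-1)!/2 when the two orders count as the same."""
    count = factorial(n - 1)
    return count if direction_matters else count // 2

print(circular_permutations(4))                           # Case 1 -> 6
print(circular_permutations(4, direction_matters=False))  # Case 2 -> 3
```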
Splunk – Data Ingestion
Data ingestion in Splunk happens through the Add Data feature, which is part of the Search and Reporting app. After logging in, the Splunk home screen shows the Add Data icon as shown below. On clicking this button, we are presented with a screen to select the source and format of the data we plan to push to Splunk for analysis.

Gathering the Data

We can get the data for analysis from the official Splunk website. Save this file and unzip it in your local drive. On opening the folder, you can find three files which have different formats. They are the log data generated by some web apps. We can also gather another set of data provided by Splunk, available from the official Splunk webpage. We will use data from both these sets to understand the working of various features of Splunk.

Uploading Data

Next, we choose the file secure.log from the folder mailsv, which we have kept on our local system as mentioned in the previous paragraph. After selecting the file, we move to the next step using the green coloured Next button in the top right corner.

Selecting Source Type

Splunk has an in-built feature to detect the type of the data being ingested. It also gives the user an option to choose a data type different from the one chosen by Splunk. On clicking the source type drop-down, we can see various data types that Splunk can ingest and enable for searching. In the current example given below, we choose the default source type.

Input Settings

In this step of data ingestion, we configure the host name from which the data is being ingested. Following are the options to choose from for the host name −

Constant value − It is the complete host name where the source data resides.

Regex on path − When you want to extract the host name with a regular expression, enter the regex for the host you want to extract in the Regular expression field.

Segment in path − When you want to extract the host name from a segment in your data source's path, enter the segment number in the Segment number field. For example, if the path to the source is /var/log/ and you want the third segment (the host server name) to be the host value, enter "3".

Next, we choose the index type to be created on the input data for searching. We choose the default index strategy. The summary index only creates a summary of the data through aggregation and creates an index on it, while the history index is for storing the search history. It is clearly depicted in the image below −

Review Settings

After clicking on the Next button, we see a summary of the settings we have chosen. We review it and choose Next to finish the uploading of the data. On finishing the load, the below screen appears, which shows the successful data ingestion and further possible actions we can take on the data.
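The same upload can also be scripted. Below is a minimal sketch using the Splunk SDK for Python (pip install splunk-sdk); the host, credentials and file path are illustrative assumptions, not values from this tutorial.

```python
# A minimal sketch of scripting the upload through the Splunk SDK for Python.
import splunklib.client as client

service = client.connect(
    host="localhost", port=8089,           # default management port
    username="admin", password="changeme"  # credentials set at install time
)

# Point Splunk at the sample log file; "main" is the default index.
index = service.indexes["main"]
index.upload("/path/to/mailsv/secure.log")
```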
Statistics – Black-Scholes model
The Black-Scholes model is a mathematical model for the price variation over time of financial instruments such as stocks, and it can be used to compute the price of a European call option. The model assumes that the price of heavily traded assets follows a geometric Brownian motion with constant drift and volatility. In the case of a stock option, the Black-Scholes model incorporates the constant price variation of the underlying stock, the time value of money, the strike price of the option and its time to expiry. The Black-Scholes model was developed in 1973 by Fischer Black, Robert Merton and Myron Scholes and is still widely used in European financial markets. It provides one of the best ways to determine fair prices of options.

Inputs

The Black-Scholes model requires five inputs.

Strike price of an option
Current stock price
Time to expiry
Risk-free rate
Volatility

Assumptions

The Black-Scholes model assumes the following points.

Stock prices follow a lognormal distribution.
Asset prices cannot be negative.
No transaction costs or taxes.
The risk-free interest rate is constant for all maturities.
Short selling of securities with use of proceeds is permitted.
No riskless arbitrage opportunity is present.

Formula

${C = SN(d_1) - Ke^{-rT}N(d_2)}$

${P = Ke^{-rT}N(-d_2) - SN(-d_1)}$

where

${d_1 = \frac{1}{\sigma \sqrt{T}} \left[ \ln \left( \frac{S}{K} \right) + \left( r + \frac{\sigma^2}{2} \right) T \right]}$

${d_2 = d_1 - \sigma \sqrt{T}}$

Where −

${C}$ = value of the call option.
${P}$ = value of the put option.
${S}$ = stock price.
${K}$ = strike price.
${r}$ = risk-free interest rate.
${T}$ = time to maturity.
${\sigma}$ = annualized volatility.
${N(\cdot)}$ = standard normal cumulative distribution function.

Limitations

The Black-Scholes model has the following limitations.

It is only applicable to European options, as American options can be exercised before their expiry.
Constant dividends and constant risk-free rates may not be realistic.
Volatility may fluctuate with the level of supply and demand of the option, so the assumption of constant volatility may not hold.
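The formulas translate directly into code. Below is a worked sketch using only the Python standard library; the sample inputs are illustrative assumptions.

```python
# Black-Scholes prices for a European call and put.
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal CDF, N(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes(S, K, r, T, sigma):
    """Return (call, put) prices from the formulas above."""
    d1 = (log(S / K) + (r + sigma ** 2 / 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    call = S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)
    put = K * exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)
    return call, put

# S=100, K=100, r=5%, T=1 year, sigma=20% -> roughly (10.45, 5.57)
print(black_scholes(100, 100, 0.05, 1.0, 0.20))
```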
Splunk – Field Searching
When Splunk reads the uploaded machine data, it interprets the data and divides it into many fields, each of which represents a single logical fact about the data record. For example, a single record of information may contain the server name, the timestamp of the event, and the type of the event being logged (a login attempt, an HTTP response, etc.). Even in the case of unstructured data, Splunk tries to divide the fields into key-value pairs or to separate them based on their data types: numeric, string, etc.

Continuing with the data uploaded in the previous chapter, we can see the fields from the secure.log file by clicking on the show fields link, which opens up the following screen. We can notice the fields Splunk has generated from this log file.

Choosing the Fields

We can choose which fields are displayed by selecting or unselecting fields from the list of all fields. Clicking on all fields opens a window showing the list of all the fields. Some of these fields have check marks against them, showing they are already selected. We can use the check boxes to choose our fields for display. Besides the name of the field, the list displays the number of distinct values the field has, its data type, and what percentage of events this field is present in.

Field Summary

Very detailed stats for every selected field become available by clicking on the name of the field. It shows all the distinct values for the field, their count and their percentages.

Using Fields in Search

Field names can also be inserted into the search box along with specific values to search for. In the below example, we aim to find all the records for the date 15th Oct for the host named mailsecure_log. We get the results for this specific date.
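The same field-based search can also be issued programmatically. Below is a hedged sketch using the Splunk SDK for Python; the connection details and field values mirror the example above and are assumptions for illustration.

```python
# Field-based search through the Splunk SDK for Python.
import splunklib.client as client
import splunklib.results as results

service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")

# Field names go straight into the query string as field=value pairs.
query = 'search host="mailsecure_log" date_month="october" date_mday=15'
stream = service.jobs.oneshot(query)

for event in results.ResultsReader(stream):
    print(event)  # each event is a mapping of field -> value
```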
Statistics – Continuous Series Arithmetic Mode

A continuous series is data given as ranges along with their frequencies. Following is an example of a continuous series −

Items       0-5   5-10   10-20   20-30   30-40
Frequency    2     5      1       3      12

Formula

${M_o = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i}$

Where −

${M_o}$ = mode
${L}$ = lower limit of the modal class
${f_1}$ = frequency of the modal class
${f_0}$ = frequency of the class preceding the modal class
${f_2}$ = frequency of the class succeeding the modal class
${i}$ = class interval

In case there are two values of the variable which share the highest frequency, the series is bi-modal and the mode is said to be ill-defined. In such situations the mode is calculated by the following formula −

Mode = 3 Median − 2 Mean

The arithmetic mode can be used to describe qualitative phenomena, e.g. consumer preferences, brand preference, etc. It is preferred as a measure of central tendency when the distribution is not normal, because it is not affected by extreme values.

Example

Problem Statement − Calculate the arithmetic mode from the following data −

Wages (in Rs.)   No. of workers
0-5               3
5-10              7
10-15            15
15-20            30
20-25            20
25-30            10
30-35             5

Solution − Using the following formula

${M_o = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i}$

The modal class is 15-20, since it has the highest frequency (30). Hence,

${L}$ = 15
${f_1}$ = 30
${f_0}$ = 15
${f_2}$ = 20
${i}$ = 5

Substituting the values, we get

${M_o = 15 + \frac{30 - 15}{2 \times 30 - 15 - 20} \times 5 = 15 + 3 = 18}$

Thus the arithmetic mode is 18.
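The worked example is easy to check numerically. Below is a small Python sketch of the formula; the function name is ours, chosen for illustration.

```python
# Mode of a continuous (grouped) series.
def grouped_mode(L, f1, f0, f2, i):
    """M_o = L + (f1 - f0) / (2*f1 - f0 - f2) * i"""
    return L + (f1 - f0) / (2 * f1 - f0 - f2) * i

# Modal class 15-20 has the highest frequency (30).
print(grouped_mode(L=15, f1=30, f0=15, f2=20, i=5))  # -> 18.0
```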
Splunk – Apps
A Splunk app is an extension of Splunk functionality which has its own in-built UI context to serve a specific need. Splunk apps are made up of different Splunk knowledge objects (lookups, tags, event types, saved searches, etc.). Apps can themselves utilize or leverage other apps or add-ons. Splunk can run any number of apps simultaneously. When you log in to Splunk, you land on an app, which is typically the Splunk Search app. So, almost every time you are inside the Splunk interface, you are using an app.

Listing Splunk Apps

We can list the available apps in Splunk by using the option Apps → Manage Apps. Navigating this option brings out the following screen, which lists the existing apps available in the Splunk interface. Following are the important values associated with Splunk apps −

Name − The name of the app; unique for each app.

Folder name − The name to use for the directory in $SPLUNK_HOME/etc/apps/. The name of the folder cannot contain the "dot" (.) character.

Version − The app version string.

Visible − Indicates whether the app should be visible in Splunk Web. Apps that contain a user interface should be visible.

Sharing − The level of permissions (read or write) given to different Splunk users for that specific app.

Status − The current status of availability of the app. It may be enabled or disabled for use.

App Permissions

A proper setting of permissions for using an app is important. We can restrict an app to be used by a single user or by multiple users, including all users. The below screen, which appears after clicking on the permissions link above, is used to modify the access given to different roles. By default, the check marks for the Read and Write options are set for Everyone. But we can change that by going to each role and selecting the appropriate permission for that specific role.

App Marketplace

There is a wide variety of needs for which the Splunk search functionalities are used. So, a Splunk app marketplace has come into existence, showcasing many different apps created by individuals and organizations. They are available in both free and paid versions. We can browse those apps by choosing the option Apps → Manage Apps → Browse More Apps. The below screen comes up. As you can see, the app name appears along with a brief description of its functionality. This helps you decide which app to use. Also, note how the apps are categorized in the left bar to help you choose the type of app faster.
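The same listing is available over the REST API. Below is a hedged sketch using the Splunk SDK for Python, mirroring Apps → Manage Apps in the UI; the connection details are assumptions for illustration.

```python
# List installed apps through the Splunk SDK for Python.
import splunklib.client as client

service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")

for app in service.apps:
    # name, version and visible correspond to the columns described above
    print(app.name, app["version"], app["visible"])
```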
Splunk – Search Optimization
Splunk already includes optimization features that analyse and process your searches for maximum efficiency. This efficiency is mainly achieved through the following two optimization goals −

Early Filtering − These optimizations filter the results very early, so that the amount of data getting processed is reduced as early as possible during the search process. This early filtering avoids unnecessary lookup and evaluation calculations for events that are not part of the final search results.

Parallel Processing − The built-in optimizations can reorder search processing so that as many commands as possible are run in parallel on the indexers before the search results are sent to the search head for final processing.

Analysing Search Optimisations

Splunk gives us tools to analyse how the search optimization works. These tools help us figure out how the filter conditions are used and what the sequence of these optimisation steps is. They also give us the cost of the various steps involved in the search operations.

Example

Consider a search operation to find the events which contain the words fail, failed or password. When we put this search query in the search box, the built-in optimizers act automatically to decide the path of the search. We can verify how long the search took to return a specific number of search results and, if needed, can go on to check each and every step of the optimization along with the cost associated with it. We follow the path Search → Job → Inspect Job to get these details, as shown below −

The next screen gives details of the optimization that has occurred for the above query. Here, we need to note the number of events and the time taken to return the result.

Turning Off Optimization

We can also turn off the in-built optimization and notice the difference in the time taken for the search result. The result may or may not be better than with the in-built optimization. In case it is better, we may choose to turn off the optimization for only this specific search. In the below diagram, we use the no-optimization command, written as noop in the search query. The next screen gives us the result of using no optimization. For this given query, the results come faster without the in-built optimizations.
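The statistics behind Search → Job → Inspect Job can also be read programmatically. Below is a hedged sketch using the Splunk SDK for Python; the connection details are assumptions, while eventCount and runDuration are standard search-job properties.

```python
# Inspect a finished search job's statistics over the REST API.
import splunklib.client as client

service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")

# Block until the search completes, then read its job statistics.
job = service.jobs.create("search fail OR failed OR password",
                          exec_mode="blocking")
print("events matched:  ", job["eventCount"])
print("run duration (s):", job["runDuration"])
```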
Splunk – Time Range Search
The Splunk web interface displays a timeline which indicates the distribution of events over a range of time. There are preset time intervals from which you can select a specific time range, or you can customize the time range as per your need. The below screen shows various preset timeline options. Choosing any of these options fetches the data for only that specific time period, which you can also analyse further using the custom timeline options available. For example, choosing the previous month option gives us the results only for the previous month, as you can see in the spread of the timeline graph below.

Selecting a Time Subset

By clicking and dragging across the bars in the timeline, we can select a subset of the result that already exists. This does not cause re-execution of the query. It only filters out the records from the existing result set. The below image shows the selection of a subset from the result set −

Earliest and Latest

The two commands earliest and latest can be used in the search bar to indicate the time range between which you filter the results. It is similar to selecting a time subset, but through commands rather than by clicking on a specific timeline bar. So, it provides finer control over the data range you can pick for your analysis. In the above image, we give a time range between the last 7 days and the last 15 days. So, the data in between these two days is displayed.

Nearby Events

We can also find events near a specific time by mentioning how close we want the events to be filtered out. We have the option of choosing the scale of the interval: seconds, minutes, days, weeks, etc.
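The earliest and latest modifiers work the same way inside a query string sent over the API. Below is a hedged sketch using the Splunk SDK for Python; the connection details are assumptions for illustration.

```python
# Time-bounded search via the earliest/latest modifiers.
import splunklib.client as client
import splunklib.results as results

service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")

# Events between 15 days ago and 7 days ago, as in the example above.
stream = service.jobs.oneshot("search * earliest=-15d latest=-7d")

for event in results.ResultsReader(stream):
    print(event)
```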
Splunk – Environment
In this tutorial, we will aim to install the enterprise version. This version is available for a free evaluation for 60 days with all features enabled. You can download the setup using the below link, which is available for both Windows and Linux platforms.

https://www.splunk.com/en_us/download/splunk-enterprise.html

Linux Version

The Linux version is downloaded from the download link given above. We choose the .deb package type, as the installation will be done on an Ubuntu platform. We shall learn this with a step-by-step approach −

Step 1 − Download the .deb package as shown in the screenshot below −

Step 2 − Go to the download directory and install Splunk using the downloaded package.

Step 3 − Next, you can start Splunk by using the following command with the accept-license argument. It will ask for an administrator user name and password, which you should provide and remember.

Step 4 − The Splunk server starts and mentions the URL where the Splunk interface can be accessed.

Step 5 − Now, you can access the Splunk URL and enter the admin user ID and password created in step 3.

Windows Version

The Windows version is available as an MSI installer, as shown in the below image −

Double clicking on the MSI installer installs the Windows version in a straightforward process. The important steps where we must make the right choices for a successful installation are as follows.

Step 1 − As we are installing it on a local system, choose the local system option as given below −

Step 2 − Enter the password for the administrator and remember it, as it will be used in the future configurations.

Step 3 − In the final step, we see that Splunk is successfully installed and can be launched from the web browser.

Step 4 − Next, open the browser, enter the given URL http://localhost:8000, and log in to Splunk using the admin user ID and password.
Splunk – Pivot & Datasets
Splunk can ingest different types of data sources and build tables which are similar to relational tables. These are called table datasets, or just tables. They provide easy ways to analyse and filter the data, perform lookups, etc. These table datasets are also used in creating the pivot analysis which we learn about in this chapter.

Creating a Dataset

We use a Splunk add-on named Splunk Datasets Add-on to create and manage the datasets. It can be downloaded from the Splunk website, https://splunkbase.splunk.com/app/3245/#/details. It has to be installed by following the instructions given in the details tab at this link. On successful installation, we see a button named Create New Table Dataset.

Selecting a Dataset

Next, we click on the Create New Table Dataset button, and it gives us the option to choose from the below three options.

Indexes and Source Types − Choose from an existing index or source type which has already been added to Splunk through the Add Data app.

Existing Datasets − You might have already created some dataset previously which you want to modify by creating a new dataset from it.

Search − Write a search query, and the result can be used to create a new dataset.

In our example, we choose an index to be the source of our dataset, as shown in the image below −

Choosing Dataset Fields

On clicking OK in the above screen, we are presented with an option to choose the various fields we want to finally get into the table dataset. The _time field is selected by default and this field cannot be dropped. We choose the fields: bytes, categoryId, clientIP and files. On clicking Done in the above screen, we get the final dataset table with all the selected fields, as seen below. Here the dataset has become similar to a relational table. We save the dataset with the Save As option available in the top right corner.

Creating a Pivot

We use the above dataset to create a pivot report. The pivot report reflects the aggregation of the values of one column with respect to the values in another column. In other words, one column's values are made into rows and another column's values are made into columns.

Choose Dataset Action

To achieve this, we first select the dataset using the Datasets tab and then choose the option Visualize with Pivot from the Actions column for that dataset.

Choose the Pivot Fields

Next, we choose the appropriate fields for creating the pivot table. We choose categoryId in the Split Columns option, as this is the field whose values should appear as different columns in the report. Then we choose file in the Split Rows option, as this is the field whose values should be presented in rows. The result shows the count of each categoryId value for each value in the file field. Next, we can save the pivot table as a report or as a panel in an existing dashboard for future reference.
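To make the split-rows/split-columns idea concrete, here is a rough analogy in pandas, not Splunk's own API: split rows by file, split columns by categoryId, and count the events in each cell, as in the pivot report described above. The sample data is made up for illustration.

```python
# Pivot-style aggregation: rows split by "file", columns by "categoryId".
import pandas as pd

events = pd.DataFrame({
    "file":       ["a.gif", "a.gif", "b.gif", "b.gif", "b.gif"],
    "categoryId": ["SPORTS", "NEWS", "SPORTS", "SPORTS", "NEWS"],
})

# Count of each categoryId value for each value of the file field.
pivot = pd.crosstab(events["file"], events["categoryId"])
print(pivot)
```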