Splunk – Environment

In this tutorial, we will install the enterprise version. This version is available for a free 60-day evaluation with all features enabled. You can download the setup, available for both Windows and Linux platforms, from the link below.

https://www.splunk.com/en_us/download/splunk-enterprise.html

Linux Version

The Linux version is downloaded from the link given above. We choose the .deb package type, as the installation will be done on an Ubuntu platform. We shall learn this with a step-by-step approach; a command sketch of these steps is given at the end of this chapter.

Step 1 − Download the .deb package as shown in the screenshot below −

Step 2 − Go to the download directory and install Splunk using the downloaded package.

Step 3 − Next, start Splunk by using the following command with the accept license argument. It will ask for an administrator username and password, which you should provide and remember.

Step 4 − The Splunk server starts and mentions the URL where the Splunk interface can be accessed.

Step 5 − Now, you can access the Splunk URL and enter the admin user ID and password created in step 3.

Windows Version

The Windows version is available as an MSI installer as shown in the below image −

Double-clicking the MSI installer installs the Windows version in a straightforward process. The important steps where we must make the right choice for a successful installation are as follows.

Step 1 − As we are installing it on a local system, choose the local system option as given below −

Step 2 − Enter the password for the administrator and remember it, as it will be used in future configurations.

Step 3 − In the final step, we see that Splunk is successfully installed and it can be launched from the web browser.

Step 4 − Next, open the browser, enter the given URL, http://localhost:8000, and log in to Splunk using the admin user ID and password.
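The Linux steps above can be summarized on the command line. This is a minimal sketch; the package file name is a placeholder for whatever version you downloaded, and /opt/splunk is the default installation path −

   # Step 2: install the downloaded package (run from the download directory)
   sudo dpkg -i splunk-<version>-linux-2.6-amd64.deb

   # Step 3: start Splunk and accept the license; on the first start it
   # prompts for the administrator username and password
   sudo /opt/splunk/bin/splunk start --accept-license

   # Step 4: the startup output mentions the web interface URL,
   # typically http://<hostname>:8000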

Splunk – Pivot & Datasets

Splunk can ingest different types of data sources and build tables which are similar to relational tables. These are called table datasets, or just tables. They provide easy ways to analyse and filter the data, perform lookups, etc. These table datasets are also used in creating the pivot analysis that we will learn in this chapter.

Creating a Dataset

We use a Splunk add-on named Splunk Datasets Add-on to create and manage the datasets. It can be downloaded from the Splunk website, https://splunkbase.splunk.com/app/3245/#/details. It has to be installed by following the instructions given in the details tab at this link. On successful installation, we see a button named Create New Table Dataset.

Selecting a Dataset

Next, we click on the Create New Table Dataset button and it gives us the option to choose from the below three options.

Indexes and Source Types − Choose from an existing index or source type which has already been added to Splunk through the Add Data app.

Existing Datasets − You might have already created a dataset previously which you want to modify by creating a new dataset from it.

Search − Write a search query whose result can be used to create a new dataset.

In our example, we choose an index to be the source of our dataset as shown in the image below −

Choosing Dataset Fields

On clicking OK in the above screen, we are presented with an option to choose the various fields we want in the final table dataset. The _time field is selected by default and cannot be dropped. We choose the fields: bytes, categoryId, clientIP and file. On clicking Done in the above screen, we get the final dataset table with all the selected fields, as seen below. Here the dataset has become similar to a relational table. We save the dataset with the Save As option available in the top right corner.

Creating a Pivot

We use the above dataset to create a pivot report. The pivot report reflects the aggregation of values of one column with respect to the values in another column. In other words, one column's values are laid out as rows and another column's values as columns.

Choose Dataset Action

To achieve this, we first select the dataset using the Datasets tab and then choose the option Visualize with Pivot from the Actions column for that dataset.

Choose the Pivot Fields

Next, we choose the appropriate fields for creating the pivot table. We choose categoryId in the Split Columns option, as this is the field whose values should appear as different columns in the report. Then we choose file in the Split Rows option, as this is the field whose values should be presented in rows. The result shows the count of each categoryId value for each value in the file field. Next, we can save the pivot table as a report or as a panel in an existing dashboard for future reference.
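The same aggregation can also be expressed directly in SPL with the chart command. This is a minimal sketch, assuming a hypothetical index named web containing the fields selected above −

   index=web
   | chart count over file by categoryId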

Splunk – Source Types

All the incoming data to Splunk is first judged by its inbuilt data processing unit and classified into certain data types and categories. For example, if it is a log from an Apache web server, Splunk is able to recognize that and create appropriate fields out of the data read. This feature in Splunk is called source type detection, and it uses built-in source types, known as “pretrained” source types, to achieve this. This makes analysis easier, as the user does not have to manually classify the data or assign data types to the fields of the incoming data.

Supported Source Types

The supported source types in Splunk can be seen by uploading a file through the Add Data feature and then selecting the dropdown for Source Type. In the below image, we have uploaded a CSV file and then checked all the available options.

Source Type Sub-Category

Within those categories, we can further click to see all the sub-categories that are supported. So when you choose the database category, you can find the different types of databases and their supported files which Splunk can recognize.

Pre-Trained Source Types

The below table lists some of the important pre-trained source types Splunk recognizes −

access_combined − NCSA combined format HTTP web server logs (can be generated by Apache or other web servers)

access_combined_wcookie − NCSA combined format HTTP web server logs (can be generated by Apache or other web servers), with a cookie field added at the end

apache_error − Standard Apache web server error log

linux_messages_syslog − Standard Linux syslog (/var/log/messages on most platforms)

log4j − Log4j standard output produced by any J2EE server using log4j

mysqld_error − Standard MySQL error log
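Once data is indexed, the assigned source type is searchable like any other field. A minimal SPL sketch, assuming some web server logs have already been ingested with the access_combined source type; status is the HTTP status field this source type extracts −

   sourcetype=access_combined
   | stats count by status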

Splunk – Search Macros

Search macros are reusable blocks of Search Processing Language (SPL) that you can insert into other searches. They are used when you want to apply the same search logic to different parts or values of the data set dynamically. They can take arguments dynamically, and the search result will be updated as per the new values.

Macro Creation

To create the search macro, we go to Settings → Advanced search → Search macros → Add new. This brings up the below screen where we start creating the macro.

Macro Scenario

We want to show various stats about the file size from the web_applications log. The stats are the max, min and avg values of the file size, using the bytes field in the log. The result should display these stats for each file listed in the log. So here the type of stat is dynamic in nature. The name of the stats function will be passed as an argument to the macro.

Defining the Macro

Next, we define the macro by setting various properties as shown in the below screen. The name of the macro contains (1), indicating that there is one argument to be passed into the macro when it is used in the search string. fun is the argument which will be passed on to the macro during execution in the search query.

Using the Macro

To use the macro, we make it a part of the search string, enclosed in backtick (`) characters. On passing different values for the argument, we see different results as expected. Consider finding the average size in bytes of the files. We pass avg as the argument and get the result as shown below. Similarly, if we want the maximum file size for each of the files present in the log, then we use max as the argument. The result is as shown below.
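The definition and usage described above can be sketched as follows. This is a minimal sketch; the macro name file_size_stats and the index name web_applications are assumptions, while $fun$ is Splunk's token syntax for substituting the argument into the definition −

   Name:       file_size_stats(1)
   Definition: stats $fun$(bytes) as size by file
   Arguments:  fun

   Usage in a search:
   index=web_applications | `file_size_stats(avg)`
   index=web_applications | `file_size_stats(max)`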

Splunk – Data Ingestion

Data ingestion in Splunk happens through the Add Data feature, which is part of the Search & Reporting app. After logging in, the Splunk home screen shows the Add Data icon as shown below. On clicking this button, we are presented with a screen to select the source and format of the data we plan to push to Splunk for analysis.

Gathering the Data

We can get the data for analysis from the official website of Splunk. Save this file and unzip it in your local drive. On opening the folder, you can find three files which have different formats. They are the log data generated by some web apps. We can also gather another set of data provided by Splunk, which is available from the official Splunk webpage. We will use data from both these sets for understanding the working of various features of Splunk.

Uploading Data

Next, we choose the file secure.log from the folder mailsv, which we have kept in our local system as mentioned in the previous paragraph. After selecting the file, we move to the next step using the green Next button in the top right corner.

Selecting Source Type

Splunk has an in-built feature to detect the type of the data being ingested. It also gives the user an option to choose a data type different from the one chosen by Splunk. On clicking the source type drop down, we can see the various data types that Splunk can ingest and enable for searching. In the current example given below, we choose the default source type.

Input Settings

In this step of data ingestion, we configure the host name from which the data is being ingested. Following are the options to choose from for the host name −

Constant value − The complete host name where the source data resides.

regex on path − Used when you want to extract the host name with a regular expression. Enter the regex for the host you want to extract in the Regular expression field.

segment in path − Used when you want to extract the host name from a segment in your data source's path. Enter the segment number in the Segment number field. For example, if the path to the source is /var/log/ and you want the third segment (the host server name) to be the host value, enter “3”.

Next, we choose the index type to be created on the input data for searching. We choose the default index strategy. The summary index only creates a summary of the data through aggregation and creates an index on it, while the history index is for storing the search history. It is clearly depicted in the image below −

Review Settings

After clicking on the Next button, we see a summary of the settings we have chosen. We review it and choose Next to finish the uploading of data. On finishing the load, the below screen appears, which shows the successful data ingestion and further possible actions we can take on the data.
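Once the upload finishes, a quick search verifies the ingestion. A minimal sketch; the wildcarded source pattern matches the secure.log file uploaded above, whatever its full path −

   source="*secure.log"
   | stats count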

Splunk – Field Searching

When Splunk reads the uploaded machine data, it interprets the data and divides it into many fields, each of which represents a single logical fact about the data record. For example, a single record of information may contain the server name, the timestamp of the event, the type of the event being logged (say, a login attempt or an HTTP response), etc. Even in the case of unstructured data, Splunk tries to divide the fields into key-value pairs or separate them based on the data types they have, such as numeric and string.

Continuing with the data uploaded in the previous chapter, we can see the fields from the secure.log file by clicking on the show fields link, which will open up the following screen. We can notice the fields Splunk has generated from this log file.

Choosing the Fields

We can choose which fields to display by selecting or unselecting fields from the list of all fields. Clicking on all fields opens a window showing the list of all the fields. Some of these fields have check marks against them, showing they are already selected. We can use the check boxes to choose our fields for display. Besides the name of the field, the list displays the number of distinct values the field has, its data type and what percentage of events the field is present in.

Field Summary

Very detailed stats for every selected field become available by clicking on the name of the field. It shows all the distinct values of the field, their count and their percentages.

Using Fields in Search

The field names can also be inserted into the search box along with specific values for the search. In the below example, we aim to find all the records for 15th October for the host named mailsecure_log. We get the result for this specific date.
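Typed into the search box, such a field-based search might look as follows. This is a minimal sketch; date_month and date_mday are Splunk's default datetime fields, and the host value mailsecure_log comes from the example above −

   host="mailsecure_log" date_month="october" date_mday=15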

Splunk – Overview

Splunk is a software which processes and brings out insights from machine data and other forms of big data. This machine data is generated by the CPU running a web server, by IoT devices, by logs from mobile apps, etc. This data is rarely presented to end users and has no business meaning on its own. However, it is extremely important for understanding, monitoring and optimizing the performance of the machines.

Splunk can read this unstructured, semi-structured or rarely structured data. After reading the data, it allows the user to search, tag, and create reports and dashboards on the data. With the advent of big data, Splunk is now able to ingest big data from various sources, which may or may not be machine data, and run analytics on it. So, from a simple tool for log analysis, Splunk has come a long way to become a general analytical tool for unstructured machine data and various forms of big data.

Product Categories

Splunk is available in three different product categories as follows −

Splunk Enterprise − It is used by companies which have a large IT infrastructure and an IT-driven business. It helps in gathering and analysing data from websites, applications, devices, sensors, etc.

Splunk Cloud − It is the cloud-hosted platform with the same features as the enterprise version. It can be availed from Splunk itself or through the AWS cloud platform.

Splunk Light − It allows searching, reporting and alerting on all the log data in real time from one place. It has limited functionality and features as compared to the other two versions.

Splunk Features

In this section, we shall discuss the important features of the enterprise edition −

Data Ingestion

Splunk can ingest a variety of data formats like JSON, XML and unstructured machine data like web and application logs. The unstructured data can be modeled into a data structure as needed by the user.

Data Indexing

The ingested data is indexed by Splunk for faster searching and querying on different conditions.

Data Searching

Searching in Splunk involves using the indexed data for the purpose of creating metrics, predicting future trends and identifying patterns in the data.

Using Alerts

Splunk alerts can be used to trigger emails or RSS feeds when some specific criteria are found in the data being analyzed.

Dashboards

Splunk dashboards can show the search results in the form of charts, reports, pivots, etc.

Data Model

The indexed data can be modelled into one or more data sets based on specialized domain knowledge. This leads to easier navigation for end users, who can analyze their business cases without learning the technicalities of the search processing language used by Splunk.

Splunk – Home

Splunk is a software used to search and analyze machine data. This machine data can come from web applications, sensors, devices or any data created by a user. It serves the needs of IT infrastructure by analyzing the logs generated in various processes, but it can also analyze any structured or semi-structured data with proper data modelling. It has built-in features to recognize data types and field separators and to optimize the search processes. It also provides data visualization of the search results.

Audience

This tutorial targets IT professionals, students, and IT infrastructure management professionals who want a solid grasp of essential Splunk concepts. After completing this tutorial, you will achieve intermediate expertise in Splunk and can easily build on your knowledge to solve more challenging problems.

Prerequisites

The reader should be familiar with a query language like SQL. General knowledge of typical operations in computer applications, such as storing and retrieving data and reading the logs generated by computer programs, will be highly useful.

Splunk – Interfaces

The Splunk web interface consists of all the tools you need to search, report on and analyse the data that is ingested. The same web interface provides features for administering users and their roles. It also provides links for data ingestion and to the built-in apps available in Splunk. The below picture shows the initial screen after you log in to Splunk with the admin credentials.

Administrator Link

The Administrator drop-down gives the option to set and edit the details of the administrator. We can reset the admin email ID and password using the below screen −

Further, from the administrator link, we can navigate to the Preferences option, where we can set the time zone and the home application on which the landing page will open after login. Currently, it opens on the Home page as shown below −

Settings Link

This link shows all the core features available in Splunk. For example, you can add lookup files and lookup definitions by choosing the Lookups link. We will discuss the important settings of these links in the subsequent chapters.

Search and Reporting Link

The Search & Reporting link takes us to the features where we can find the data sets that are available for searching, as well as the reports and alerts created for these searches. It is clearly shown in the below screenshot −

Splunk – Calculated Fields

Many times, we will need to make some calculations on the fields that are already available in the Splunk events. We also want to store the result of these calculations as a new field to be referred to later by various searches. This is made possible by using the concept of calculated fields in Splunk search. A simple example is to show the first three characters of a week day instead of the complete day name. We need to apply a Splunk function to achieve this manipulation of the field and store the new result under a new field name.

Example

The web_application log file has two fields named bytes and date_wday. The value in the bytes field is the number of bytes. We want to display this value in kilobytes, which requires dividing the field by 1024. We need to apply this calculation to the bytes field. Similarly, date_wday displays the complete name of the week day, but we need to display only its first three characters. The existing values in these two fields are shown in the image below −

Using the eval Function

To create a calculated field, we use the eval function. This function stores the result of the calculation in a new field. We are going to apply the below two calculations −

# divide bytes by 1024 and store it in a field named bytes_in_KB
eval bytes_in_KB = (bytes/1024)

# extract the first 3 characters of the name of the day
eval short_day = substr(date_wday,1,3)

Adding New Fields

We add the new fields created above to the list of fields displayed as part of the search result. To do this, we choose the All Fields option and tick the check mark against the names of these new fields, as shown in the below image −

Displaying the Calculated Fields

After choosing the fields above, we are able to see the calculated fields in the search result. The search query displays the calculated fields as shown below −
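Putting it together, a full search with both calculated fields might look like the following. This is a minimal sketch; the index name web_application is an assumption −

   index=web_application
   | eval bytes_in_KB = round(bytes/1024, 2)
   | eval short_day = substr(date_wday, 1, 3)
   | table _time, file, bytes_in_KB, short_day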