Big Data & Analytics Archives - Page 23 of 75 - Donotsad where can learn any thing work project and make money

Aug 10

Splunk – Source Types

Splunk classifies all incoming data through its built-in data processing unit, assigning it to data types and categories. For example, if the data is a log from an Apache web server, Splunk recognizes it and creates appropriate fields from the data it reads.

This feature is called source type detection. It relies on Splunk's built-in source types, known as "pretrained" source types. It simplifies analysis because the user does not have to classify the data manually or assign data types to the fields of the incoming data.

Supported Source Types

The supported source types can be seen by uploading a file through the Add Data feature and then opening the Source Type dropdown. For example, after uploading a CSV file, the dropdown lists all the available options.

Source Type Sub-Category

Each category can be expanded to show the sub-categories it supports. For example, choosing the database category shows the different types of databases and their supported files that Splunk can recognize.

Pre-Trained Source Types

The following table lists some of the important pre-trained source types Splunk recognizes.

Source Type Name | Nature
access_combined | NCSA combined format HTTP web server logs (can be generated by Apache or other web servers)
access_combined_wcookie | NCSA combined format HTTP web server logs (can be generated by Apache or other web servers), with a cookie field added at the end
apache_error | Standard Apache web server error log
linux_messages_syslog | Standard Linux syslog (/var/log/messages on most platforms)
log4j | Log4j standard output produced by any J2EE server using log4j
mysqld_error | Standard MySQL error log
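To make the idea of source type detection concrete, here is a minimal Python sketch that guesses a source type from the shape of a single log line. It is only an illustration: the regular expressions are simplified assumptions, the function name guess_source_type is hypothetical, and Splunk's own detection logic is not shown in this tutorial and is certainly more elaborate.

```python
import re

# Simplified pattern for an NCSA combined-format line (an assumption for
# illustration, not Splunk's real rule): host, ident, user, [timestamp],
# "request", status, bytes, "referer", "user agent".
COMBINED = r'\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-) "[^"]*" "[^"]*"'

ACCESS_COMBINED = re.compile("^" + COMBINED + "$")
# Assumption: the cookie variant appends one more quoted field at the end.
ACCESS_COMBINED_WCOOKIE = re.compile("^" + COMBINED + ' "[^"]*"$')


def guess_source_type(line: str) -> str:
    """Return a source type name for one log line, or 'unknown'."""
    line = line.strip()
    if ACCESS_COMBINED_WCOOKIE.match(line):
        return "access_combined_wcookie"
    if ACCESS_COMBINED.match(line):
        return "access_combined"
    return "unknown"


sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
print(guess_source_type(sample))
```

Once a line is matched against a known format, the fields (host, timestamp, status, and so on) can be extracted by position, which is why a recognized source type saves manual classification work.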

Aug 10

QlikView – Inline Data

Data can be entered into a QlikView document by typing or pasting it directly. This is a quick way to get data from the clipboard into QlikView. The script editor provides this feature under the Insert tab.

Script Editor

To open the Inline data load option, open the script editor and go to Insert → Load Statement → Load Inline.

Inserting Data

This opens a spreadsheet-like document where we can type the values. We can also paste values already available in the clipboard. Note that the column headers are created automatically. Click Finish.

Load Script

The command that loads the data is created in the background and can be seen in the script editor.

Table Box Data

On creating a Table Box sheet object, we see the data that was read through the Inline data load option.
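Conceptually, an inline load turns a block of typed or pasted text into a table. The Python sketch below mimics that idea, assuming the first row supplies the field names. It is an analogy rather than QlikView script, and the sample products and the load_inline helper are hypothetical.

```python
import csv
import io

# Hypothetical values, as they might be typed or pasted into the inline editor.
INLINE_DATA = """\
Product,Sales
Bike,120
Helmet,45
"""


def load_inline(text: str) -> list[dict[str, str]]:
    """Parse delimited text into rows keyed by the header row's field names."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


rows = load_inline(INLINE_DATA)
for row in rows:
    print(row["Product"], row["Sales"])
```

All values are read as text here; converting Sales to a number would be a separate step.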

Aug 10

Splunk – Home

Splunk is software used to search and analyze machine data. This machine data can come from web applications, sensors, devices, or any data created by users. Splunk serves the needs of IT infrastructure by analyzing the logs generated by various processes, but it can also analyze any structured or semi-structured data with proper data modelling. It has built-in features to recognize data types and field separators and to optimize search processes. It also provides data visualization on the search results.

Audience

This tutorial targets IT professionals, students, and IT infrastructure management professionals who want a solid grasp of essential Splunk concepts. After completing this tutorial, you will have intermediate expertise in Splunk and can build on that knowledge to solve more challenging problems.

Prerequisites

The reader should be familiar with a query language such as SQL. General knowledge of typical computer operations, such as storing and retrieving data and reading the logs generated by computer programs, will be highly useful.

Aug 10

Splunk – Interfaces

The Splunk web interface contains all the tools you need to search, report on, and analyse the data that is ingested. The same web interface provides features for administering users and their roles. It also provides links for data ingestion and the built-in apps available in Splunk. The initial screen described here is the one shown after you log in to Splunk with the admin credentials.

Administrator Link

The Administrator dropdown gives the option to set and edit the details of the administrator. From it, we can reset the admin email ID and password.

From the administrator link, we can also navigate to the preferences option, where we can set the time zone and the home application on which the landing page opens after login. In this example, it opens on the Home page.

Settings Link

This link shows all the core features available in Splunk. For example, you can add lookup files and lookup definitions by choosing the lookup link. We will discuss the important settings under these links in the subsequent chapters.

Search and Reporting Link

The Search and Reporting link takes us to the features where we can find the data sets that are available for searching, along with the reports and alerts created for these searches.

Aug 10

Qlikview – Useful Resources

The following resources contain additional information on QlikView. Please use them to get more in-depth knowledge on this topic.

QlikView Scripting Master Projects: Beginner to Advanced. 70 lectures, 5 hours. Arthur Fong
QlikView Course: Beginner To Advanced. 75 lectures, 3.5 hours. Prathmesh Bendkhale
Data Analytics MasterClass (PostgreSQL, BI, Python). 134 lectures, 8.5 hours. Arthur Fong

Aug 10

QlikView – Mapping Tables

A mapping table is a table created to map column values between two tables. It is also called a lookup table, because it is used only to look up a related value from another table.

Input Data

Let us consider the following input data file, which represents the sales values in different regions.

    ProductID,ProductCategory,Region,SaleAmount
    1,Outdoor Recreation,Europe,4579
    2,Clothing,Europe,4125
    3,Costumes & Accessories,South Asia,6521
    4,Athletics,South Asia,4125
    5,Personal Care,Australia,5124
    6,Arts & Entertainment,North America,1245
    7,Hardware,South America,456
    8,Home & Garden,South America,241
    9,Food,South Asia,1247
    10,Home & Garden,South Asia,5462
    11,Office Supplies,Australia,577

The following data represents the countries and their regions.

    Region,Country
    Europe,Germany
    Europe,Italy
    South Asia,Singapore
    South Asia,Korea
    North America,USA
    South America,Brazil
    South America,Peru
    South Asia,China
    South Asia,Sri Lanka

Load Script

The above data is loaded into QlikView's memory by using the script editor. Open the script editor from the File menu or press Control+E. Choose the Table Files option from the Data from Files tab and browse for the file containing the above data. Click OK and press Control+R to load the data into QlikView's memory.

Create Table Box

Let us create a table box for each of the two tables above. At this point we cannot get the value of Country in the Sales region report.

Create the Mapping Table

The load script produces the mapping table, which maps the Region value from the Sales table to the Country value from the MapCountryRegion table.

Table Chart

On completing the above steps and creating a Table Box to view the data, we get the Country column along with the other columns from the Sales table.
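The lookup behaviour of a mapping table can be sketched in Python as a dictionary from Region to Country. This is an analogy, not QlikView script. The helper names are hypothetical, only a subset of the sales rows is used, and two behaviours are assumptions made for the sketch: when a region appears more than once, the first country listed wins, and a region with no match gets the placeholder "Unknown".

```python
import csv
import io

# Region-to-country pairs based on the example data above.
REGION_COUNTRY_CSV = """\
Region,Country
Europe,Germany
Europe,Italy
South Asia,Singapore
South Asia,Korea
North America,USA
South America,Brazil
South America,Peru
South Asia,China
South Asia,Sri Lanka
"""

# A subset of the sales rows, enough to show each lookup outcome.
SALES_CSV = """\
ProductID,ProductCategory,Region,SaleAmount
1,Outdoor Recreation,Europe,4579
3,Costumes & Accessories,South Asia,6521
5,Personal Care,Australia,5124
6,Arts & Entertainment,North America,1245
7,Hardware,South America,456
"""


def build_mapping(csv_text: str) -> dict[str, str]:
    """Build a Region -> Country lookup; the first row for a region wins."""
    mapping: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        mapping.setdefault(row["Region"], row["Country"])
    return mapping


def add_country(sales_csv: str, mapping: dict[str, str],
                default: str = "Unknown") -> list[dict[str, str]]:
    """Return the sales rows with a Country column looked up by Region."""
    rows = []
    for row in csv.DictReader(io.StringIO(sales_csv)):
        enriched = dict(row)
        enriched["Country"] = mapping.get(row["Region"], default)
        rows.append(enriched)
    return rows


mapping = build_mapping(REGION_COUNTRY_CSV)
sales_with_country = add_country(SALES_CSV, mapping)
for row in sales_with_country:
    print(row["ProductID"], row["Region"], row["Country"])
```

Note that Australia has no entry in the region data, so its rows fall back to the default value. A lookup returns a single country per region, which is why the duplicate rows for Europe and South Asia collapse to one value each.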

Aug 10

Power BI – Visualization Options

In this chapter, you will learn about the various visualization options in Power BI.

Creating Simple Visualizations

Visualizations present your data effectively and are the basic building blocks of any Business Intelligence tool. Power BI contains various default visualization components, ranging from simple bar charts, pie charts, and maps to more complex visuals such as waterfalls, funnels, and gauges.

In Power BI, you can create a visualization in two ways. The first is to add one from the right side pane to the Report Canvas; by default, the table type visualization is selected. The other way is to drag fields from the right side bar to the axis and value axis under Visualization. You can add multiple fields to each axis as required.

You can move a visualization on the reporting canvas by clicking and dragging it. You can also switch between different chart and visualization types from the Visualization pane. Power BI attempts to convert your selected fields to the new visual type as closely as possible.

Creating Map Visualizations

Power BI has two types of map visualization: bubble maps and shape maps. To create a bubble map, select the map option from the Visualization pane and drag the map from Visualizations to the Report Canvas. To display values, add a location object to the axis. The location field accepts values such as City and State, or you can add longitude and latitude values. To change the bubble size, add a field to the value axis.

You can also use a filled map by dragging the filled map to the Report Canvas.

Note: If you see a warning symbol on top of your map visualization, it means that you need to add more locations to your map chart.
Using Combination Charts

Data visualization often requires plotting multiple measures in a single chart. Power BI supports various combination chart types for this. Let us say you want to plot revenue and units sold in one chart. Combination charts are the most suitable option for this kind of requirement.

One of the most common combination charts in Power BI is the Line and Stacked Column chart. Let us say we have a revenue field, we have added a new data source that contains customer-wise unit quantity, and we want to plot this in our visualization. Once you add a data source, it is added to the list of fields on the right side. You can then add units to the column axis.

The other type of combination chart available in Power BI is Line and Clustered Column.

Using Tables

In Power BI, when you add a dataset to your visualization, it adds a table chart to the Report canvas. You can drag the fields that you want to add to the report. You can also select the checkbox in front of each field to add it to the Report area. For numerical values in a table, a sum of the values is shown at the bottom.

You can sort the table using the arrow at the top of a column. To sort in ascending or descending order, click the arrow mark and the values in the column are sorted.

The order of the columns in a table is determined by the order in the value bucket on the right side. To change the order, you can delete a column and add it again in the position you want.

You can also turn off summarization or apply a different aggregate function to numerical values in the table. To change the aggregation type, click the arrow in the value bucket in front of the measure to see a list of formulas that can be used.

Another table type in Power BI is the matrix table, which provides features such as auto sizing, column tables, and setting colors.
Modify Colors in Charts

In Power BI, you can also modify the colors in a chart. When you select any visualization, it has an option to change the color. The following options are available under the Format tab:

Legend
Data Colors
Detail Label
Title
Background
Lock Aspect
Border
General

To open these options, go to the Format tab. Once you click it, you can see all the available options.

When you expand the Legend field, you can choose where to display the legend. You can select:

Position
Title
Legend Name
Color
Text Size
Font Family

Similarly, you have Data Colors. If you want to change the color of any data field, you can use this option. It shows all objects and their corresponding colors in the chart.

The tool also has an Analytics feature, where you can draw lines on the visualization as required. The following line types are available:

Constant Line
Min Line
Max Line
Average Line
Median Line
Percentile Line

You can opt for a dashed, dotted, or solid line. You can select the transparency level, color, and position of the line. You can also switch the data label for the line on or off.

Adding Shapes, Images and Text box

Sometimes you need to add static text, images, or shapes to your visualization. This option can be used to add a header or footer, or any static signatures or messages, to a data visualization. You can also add URLs in the text box, and Power BI turns them into live links.

To add shapes, images and text box, navigate to the

Aug 10

Qlikview – Questions/Answers

Dear readers, these QlikView interview questions have been designed to acquaint you with the kind of questions you may encounter during an interview on the subject of QlikView. In my experience, good interviewers rarely plan to ask any particular question. Questions normally start with some basic concept of the subject and then continue based on the discussion and your answers.

What are the unique features of QlikView?
(a) Data association is maintained automatically. (b) The structure, data, and calculations of a report are all held in the memory (RAM) of the server. (c) Data is compressed to 10% of its original size. (d) Visual relationships are shown using colors.

What is Incremental Load?
Loading only the new or changed records from the source into the QlikView document is called Incremental Load.

How do you connect QlikView to a database?
QlikView can connect to a database using an ODBC connection created for that database.

What is a Dashboard?
A dashboard is a QlikView document that shows many metrics together. The values in the sheet objects change dynamically when a value is selected in one of the sheet objects.

Why do we need a Master Calendar?
The Master Calendar is required when we want to create additional date values that are not already captured in the data being analyzed, for example finding the quarter in which a given date falls.

What is the Aggr function?
The Aggr function produces a virtual table with one expression, grouped by one or more dimensions. The result of this virtual table can then be used by one or more outer aggregation functions.

What is a star schema?
A star schema is a data model in which one fact table is connected to multiple dimension tables through foreign keys.

What is the difference between Join and Keep?
With Keep, both datasets remain available in QlikView's memory, while with Join the load statements produce only one dataset from which you have to choose the columns. Also, there is no concept of an outer keep, whereas an outer join is available.

What is a synthetic key, and is it good or bad to have?
QlikView creates a synthetic key when two or more columns are the same between tables. It does not impact the data or performance, but it indicates a flaw in the data model design.

What is the difference between Join and Concatenate?
Join gives the resulting records from two tables as records containing columns from both tables. Concatenate only appends the rows of one table to another.

What are circular loops in QlikView?
A circular loop is created when the relationship between two tables can be established both directly and through a third table.

What does the MonthStart function do?
It returns a value corresponding to a timestamp with the first millisecond of the first date of the month containing the given date.

What does the Autogenerate function do?
This function auto-generates values between a given range of numbers.

What is a pivot table in QlikView?
Pivot tables are used to present sums of values across many dimensions available in the data, for example showing the total sales figure for both the months and the quarters in a year.

What are the 6 chart types available in QlikView?
Bar Chart, Pie Chart, Line Chart, Gauge Chart, Pivot Table, and Straight Table.

Can QlikView extract data from a website? How?
Yes. The QlikView script editor has an option to extract data from a web file by giving the URL as the input.

What is the use of the Promote/Demote options in a Table Box property?
They allow you to rearrange the columns in the Table Box displayed in the sheet.

What are the three options available under the Rotate table functionality for data transformation?
Rotate Left, Rotate Right, and Transpose.

What are the parameters required by the Crosstable Wizard to create a cross table?
Three fields are required to create a cross table: Qualifier Field, Attribute Field, and Data Field.

What does Partial Reload do?
It executes the current load script, including all script commands such as Drop Table, and reloads data into the active QlikView document. However, only those tables whose Load and Select statements are preceded by the Replace or Add prefix are reloaded.

How can we see the table structures of data loaded to QlikView's memory?
By using the Table Viewer option under the File menu, we can see the tables and their relationships.

What is the use of Export Sheet Layout?
When we want to preserve the layout of a sheet to be used again, we export the sheet layout, which creates an XML file without any data.

What is WebView mode?
The WebView mode uses the internal web browser in QlikView to display the document layout as an AJAX page.

What is a selection indicator in a QlikView document?
A selection indicator is used to indicate the type of association between the data present in different sheet objects. A green dot indicates selected values, a blue dot indicates locked values, and a red dot indicates de-selected values in AND mode.

What does *bi* in text search mean?
It searches for any string that contains bi.

What is Fuzzy search in QlikView?
Fuzzy search finds all the values according to their degree of resemblance to the search string. This means that results are shown even if the spelling does not match character by character.

What is a Bookmark in QlikView?
A bookmark in QlikView captures the selections in all states defined in a QlikView document. It can be saved and accessed later.

What is a user bookmark and a shared server bookmark?
The user bookmark is saved on the user's computer, while the shared server bookmark is saved on the server and is accessible to all the allowed users.

What are the different ways in which the QlikView Alerts can be triggered?
The alerts

Aug 10

Power BI – Supported Data Sources

Power BI supports a large range of data sources. Click Get data to see all the available data connections. You can connect to different flat files, SQL databases, the Azure cloud, and even web platforms such as Facebook, Google Analytics, and Salesforce objects. It also includes an ODBC connection for connecting to other ODBC data sources that are not listed.

The following data sources are available in Power BI:

Flat Files
SQL Database
OData Feed
Blank Query
Azure Cloud platform
Online Services
Other data sources such as Hadoop, Exchange, or Active Directory

To get data in Power BI desktop, click the Get data option in the main screen. It shows you the most common data sources first. Then click the More option to see a full list of available data sources.

When you click the "More.." tab, a new navigation window opens. Its left side shows the categories of all available data sources, and you can perform a search at the top.

The following data source categories are listed:

All

Under this category, you can see all the data sources available in Power BI desktop.

File

When you click File, it shows all the flat file types supported in Power BI desktop. To connect to a file type, select it from the list and click Connect. You have to provide the location of the file.

Database

When you click the Database option, it shows a list of all the database connections that you can connect to. To connect to a database, select a database type from the list and click Connect. You have to provide the server name, user name, and password to connect. You can also connect via a direct SQL query using Advanced options, and you can select the connectivity mode: Import or DirectQuery.

Note: You can't combine Import and DirectQuery mode in a single report.
Import vs DirectQuery

The DirectQuery option limits data manipulation, and the data stays in the SQL database. DirectQuery is live, so there is no need to schedule a refresh as in the Import method.

The Import method allows you to perform data transformation and manipulation. When you publish the data to the Power BI service, the limit is 1 GB. The data is consumed and pushed into the Power BI Azure backend. It can be refreshed up to 8 times a day, and a schedule can be set up for data refresh.

Advantages of Using DirectQuery

Using DirectQuery, you can build data visualizations on large datasets that are not feasible to import into Power BI desktop.
DirectQuery doesn't apply the 1 GB dataset limit.
With DirectQuery, the report always shows current data.

Limitations of Using DirectQuery

There is a limit of 1 million rows for returning data while using DirectQuery. You can aggregate over a larger number of rows, but the result must contain fewer than 1 million rows for the dataset to be returned.
In DirectQuery, all tables should come from a single database.
When a complex query is used in the Query editor, it throws an error. To run the query, you need to remove the error from it.
In DirectQuery, you can use relationship filtering only in one direction.
It doesn't support special treatment for time-related data in tables.

Azure

Using the Azure option, you can connect to a database in the Azure cloud. The Azure category lists the various options available.

Online Services

Power BI also allows you to connect to different online services such as Exchange, Salesforce, Google Analytics, and Facebook. The Online Services category lists the various options available.

Other

The Other category lists the remaining data source options.

Aug 10

QlikView – Star Schema

A star schema model is a type of data model in which multiple dimensions are linked to a single fact table. In bigger models there can be multiple fact tables linked to multiple dimensions and to other fact tables. The usefulness of this model lies in performing fast queries with minimal joins among the tables.

The fact table contains data that are measures and have numeric values. Calculations are applied on the fields in the fact table. The unique key of each dimension table is used to link it to the fact table, which has a key field that usually has the same name. The fact table therefore contains the keys from all the dimension tables, and together they form a concatenated primary key used in various queries.

Input Data

Given below is a list of tables that contain the data for different products from various suppliers and regions. The supply happens at different time intervals, which are captured in the Time dimension table.

Product Dimension

It contains the product categories and product names. The ProductID field is the unique key.

    ProductID,ProductCategory,ProductName
    1,Outdoor Recreation,Winter Sports & Activities
    2,Clothing,Uniforms
    3,Lawn & Garden,Power Equipment
    4,Athletics,Rugby
    5,Personal Care,Shaver
    6,Arts & Entertainment,Crafting Materials
    7,Hardware,Power Tool Batteries

Region Dimension

It contains the names of the regions where the suppliers are based. The RegionID field is the unique key.

    RegionID,Continent,Country
    3,North America,USA
    7,South America,Brazil
    12,Asia,China
    2,Asia,Japan
    5,Europe,Belgium

Supplier Dimension

It contains the names of the suppliers that supply the above products. The SupplierID field is the unique key.

    SupplierID,SupplierName
    3S12,Supre Suppliers
    4A15,ABC Suppliers
    4S66,Max Sports
    5F244,Nice Foods
    8A45,Artistic angle

Time Dimension

It contains the time periods when the supply of the above products occurs. The TimeID field is the unique key.

    TimeID,Year,Month
    1,2012,Feb
    2,2012,May
    3,2012,Sep
    4,2013,Aug
    5,2014,Jan
    6,2014,Nov

Supplier Quantity Fact

It contains the values for the quantities supplied and the percentage of defects in them. It joins to each of the above dimensions through keys with the same name.

    ProductID,RegionID,TimeID,SupplierID,Quantity,DefectPercentage
    1,3,3,5F244,8452,12
    2,3,1,4S66,5124,8.25
    3,7,1,8A45,5841,7.66
    4,12,2,4A15,5123,1.25
    5,5,3,4S66,7452,8.11
    6,2,5,4A15,5142,3.66
    7,2,1,4S66,452,2.06

Load Script

The above data is loaded into QlikView's memory by using the script editor. Open the script editor from the File menu or press Control+E. Choose the Table Files option from the Data from Files tab and browse for the file containing the above data. Click OK and press Control+R to load the data into QlikView's memory. Below is the script that appears after each of the above files is read.

    LOAD ProductID,
         ProductCategory,
         ProductName
    FROM [C:\Qlikview\images\StarSchema\Product_dimension.csv]
    (txt, codepage is 1252, embedded labels, delimiter is ',', msq);

    LOAD TimeID,
         Year,
         Month
    FROM [C:\Qlikview\images\StarSchema\Time.csv]
    (txt, codepage is 1252, embedded labels, delimiter is ',', msq);

    LOAD SupplierID,
         SupplierName
    FROM [C:\Qlikview\images\StarSchema\Suppliers.csv]
    (txt, codepage is 1252, embedded labels, delimiter is ',', msq);

    LOAD RegionID,
         Continent,
         Country
    FROM [C:\Qlikview\images\StarSchema\Regions.csv]
    (txt, codepage is 1252, embedded labels, delimiter is ',', msq);

    LOAD ProductID,
         RegionID,
         TimeID,
         SupplierID,
         Quantity,
         DefectPercentage
    FROM [C:\Qlikview\images\StarSchema\Supplier_quantity.csv]
    (txt, codepage is 1252, embedded labels, delimiter is ',', msq);

Star Schema Data Model

After reading the above data into QlikView's memory, we can look at the data model, which shows all the tables, fields, and relationships in the form of a star schema.
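To see why a shared key makes star schema queries simple, the following Python sketch totals the supplied quantity per continent by following RegionID from the fact table to the Region dimension. It is an illustration outside QlikView, using the Region and fact data from this chapter; the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Region dimension: RegionID is the unique key.
REGIONS_CSV = """\
RegionID,Continent,Country
3,North America,USA
7,South America,Brazil
12,Asia,China
2,Asia,Japan
5,Europe,Belgium
"""

# Fact table: one row per supply, holding dimension keys plus measures.
FACT_CSV = """\
ProductID,RegionID,TimeID,SupplierID,Quantity,DefectPercentage
1,3,3,5F244,8452,12
2,3,1,4S66,5124,8.25
3,7,1,8A45,5841,7.66
4,12,2,4A15,5123,1.25
5,5,3,4S66,7452,8.11
6,2,5,4A15,5142,3.66
7,2,1,4S66,452,2.06
"""


def read_rows(text: str) -> list[dict[str, str]]:
    """Read CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def quantity_by_continent(fact_rows, region_rows) -> dict[str, int]:
    """Sum the Quantity measure per continent via the RegionID key."""
    continent_of = {r["RegionID"]: r["Continent"] for r in region_rows}
    totals: dict[str, int] = defaultdict(int)
    for fact in fact_rows:
        totals[continent_of[fact["RegionID"]]] += int(fact["Quantity"])
    return dict(totals)


totals = quantity_by_continent(read_rows(FACT_CSV), read_rows(REGIONS_CSV))
for continent, quantity in sorted(totals.items()):
    print(continent, quantity)
```

Each dimension needs only one key lookup from the fact table, which is the "minimal joins" property described at the start of this chapter. The same pattern applies to ProductID, TimeID, and SupplierID.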