Big Data & Analytics Archives - Page 24 of 75 - Donotsad where can learn any thing work project and make money

Aug 10

QlikView – Star Schema

A star schema is a type of data model in which multiple dimension tables are linked to a single fact table. In bigger models, there can be multiple fact tables linked to multiple dimensions and to other fact tables. The usefulness of this model lies in performing fast queries with minimal joins among the tables. The fact table contains the measures, which have numeric values, and calculations are applied to the fields in the fact table. The unique key of each dimension table links it to the fact table, which holds the same key, usually under the same field name. Therefore, the fact table contains the keys from all the dimension tables, and together they form a concatenated primary key used in various queries.

Input Data

Given below is a list of tables that contain the data for different products from various suppliers and regions. The supply happens at different time intervals, which are captured in the Time dimension table.

Product Dimension

It contains the product categories and product names. The ProductID field is the unique key.

ProductID,ProductCategory,ProductName
1,Outdoor Recreation,Winter Sports & Activities
2,Clothing,Uniforms
3,Lawn & Garden,Power Equipment
4,Athletics,Rugby
5,Personal Care,Shaver
6,Arts & Entertainment,Crafting Materials
7,Hardware,Power Tool Batteries

Region Dimension

It contains the continents and countries where the suppliers are based. The RegionID field is the unique key.

RegionID,Continent,Country
3,North America,USA
7,South America,Brazil
12,Asia,China
2,Asia,Japan
5,Europe,Belgium

Supplier Dimension

It contains the names of the suppliers that supply the above products. The SupplierID field is the unique key.

SupplierID,SupplierName
3S12,Supre Suppliers
4A15,ABC Suppliers
4S66,Max Sports
5F244,Nice Foods
8A45,Artistic angle

Time Dimension

It contains the time periods in which the above products are supplied. The TimeID field is the unique key.

TimeID,Year,Month
1,2012,Feb
2,2012,May
3,2012,Sep
4,2013,Aug
5,2014,Jan
6,2014,Nov

Supplier Quantity Fact

It contains the quantities supplied and the percentage of defects in them. It joins to each of the above dimensions through keys with the same name.

ProductID,RegionID,TimeID,SupplierID,Quantity,DefectPercentage
1,3,3,5F244,8452,12
2,3,1,4S66,5124,8.25
3,7,1,8A45,5841,7.66
4,12,2,4A15,5123,1.25
5,5,3,4S66,7452,8.11
6,2,5,4A15,5142,3.66
7,2,1,4S66,452,2.06

Load Script

The above data is loaded into QlikView's memory by using the script editor. Open the script editor from the File menu or press Control+E. Choose the Table Files option from the Data from Files tab and browse for each file containing the above data. Click OK and press Control+R to load the data into QlikView's memory. Below is the script that appears after each of the above files is read.

LOAD ProductID,
     ProductCategory,
     ProductName
FROM [C:\Qlikview\images\StarSchema\Product_dimension.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

LOAD TimeID,
     Year,
     Month
FROM [C:\Qlikview\images\StarSchema\Time.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

LOAD SupplierID,
     SupplierName
FROM [C:\Qlikview\images\StarSchema\Suppliers.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

LOAD RegionID,
     Continent,
     Country
FROM [C:\Qlikview\images\StarSchema\Regions.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

LOAD ProductID,
     RegionID,
     TimeID,
     SupplierID,
     Quantity,
     DefectPercentage
FROM [C:\Qlikview\images\StarSchema\Supplier_quantity.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

Star Schema Data Model

After reading the above data into QlikView's memory, we can look at the data model, which shows all the tables, fields, and relationships in the form of a star schema.
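To make the key-based linking concrete, here is a minimal Python sketch (not QlikView script) that resolves the RegionID key of each fact row against the Region dimension and totals Quantity per continent. It only illustrates the idea of one join per dimension. The variable and function names are hypothetical, and QlikView performs this association automatically through the shared field names.

```python
import csv
import io
from collections import defaultdict

# Region dimension and fact table, copied from the input data of this chapter.
REGIONS_CSV = """RegionID,Continent,Country
3,North America,USA
7,South America,Brazil
12,Asia,China
2,Asia,Japan
5,Europe,Belgium
"""

FACT_CSV = """ProductID,RegionID,TimeID,SupplierID,Quantity,DefectPercentage
1,3,3,5F244,8452,12
2,3,1,4S66,5124,8.25
3,7,1,8A45,5841,7.66
4,12,2,4A15,5123,1.25
5,5,3,4S66,7452,8.11
6,2,5,4A15,5142,3.66
7,2,1,4S66,452,2.06
"""


def read_rows(text):
    """Parse CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


# Index the dimension by its unique key so each fact row needs one lookup.
regions = {row["RegionID"]: row for row in read_rows(REGIONS_CSV)}

# Aggregate the Quantity measure of the fact table by a dimension attribute.
quantity_by_continent = defaultdict(int)
for fact in read_rows(FACT_CSV):
    continent = regions[fact["RegionID"]]["Continent"]
    quantity_by_continent[continent] += int(fact["Quantity"])

totals = dict(quantity_by_continent)
print(totals)
```

The same pattern extends to the Product, Supplier, and Time dimensions: each is indexed by its own key, so a query never needs more than one lookup per dimension.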

Aug 10

QlikView – Master Calendar

In QlikView, we often need to create a calendar reference object that can be linked to any data set present in QlikView's memory. For example, you may have a table that captures the sales amount and sales date but does not store the weekday or quarter that corresponds to that date. In such a scenario, we create a Master Calendar, which supplies the additional date fields such as Quarter and Day as required by any data set.

Input Data

Let us consider the following CSV data file, which is used as input for further illustrations.

SalesDate,SalesVolume
3/28/2012,3152
3/30/2012,2458
3/31/2012,4105
4/8/2012,6245
4/10/2012,5816
4/11/2012,3522

Load Script

We load the above input data using the script editor, which is invoked by pressing Control+E. Choose the option Table Files and browse for the input file. Next, we load the data into QlikView's memory and create a Table Box by using the menu Layout → New Sheet Objects → Table Box, where we choose all the available fields to be displayed.

Create Master Calendar

Next, we create the Master Calendar by writing a script in the script editor. The script uses the table DailySales as a resident table, from which we capture the maximum and minimum dates. A second load statement, placed above the resident load, loads each of the dates within this range. Finally, a third load statement extracts the year, quarter, month, etc. from the SalesDate values.

Select Fields

After creating the complete load script along with the master calendar, we create a Table Box to view the data using the menu Layout → New Sheet Objects → Table Box.

Final Data

The final output is a table showing the Quarter and Month values, which are created using the sales data and the Master Calendar.
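The three steps described above (find the minimum and maximum dates, generate every date in between, derive the calendar fields) can be sketched in Python as follows. This is an illustration of the logic only, not the QlikView load script of the chapter. The field names Year, Quarter, Month, and WeekDay and the variable names are assumptions.

```python
from datetime import datetime, timedelta

# Sales data from the Input Data section (SalesDate, SalesVolume).
daily_sales = [
    ("3/28/2012", 3152),
    ("3/30/2012", 2458),
    ("3/31/2012", 4105),
    ("4/8/2012", 6245),
    ("4/10/2012", 5816),
    ("4/11/2012", 3522),
]

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Step 1: capture the minimum and maximum dates of the sales table.
sales_dates = [datetime.strptime(d, "%m/%d/%Y").date() for d, _ in daily_sales]
min_date, max_date = min(sales_dates), max(sales_dates)

# Steps 2 and 3: generate every date in the range and derive the extra fields.
master_calendar = []
day = min_date
while day <= max_date:
    master_calendar.append({
        "SalesDate": day,
        "Year": day.year,
        "Quarter": "Q" + str((day.month - 1) // 3 + 1),
        "Month": MONTHS[day.month - 1],
        "WeekDay": WEEKDAYS[day.weekday()],
    })
    day += timedelta(days=1)

for row in master_calendar:
    print(row)
```

Because the calendar contains every date in the range, including dates without sales, it can be linked to any table that carries a SalesDate field.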

Aug 10

Plotly – Heatmap

A heat map (or heatmap) is a graphical representation of data in which the individual values contained in a matrix are represented as colors. The primary purpose of heat maps is to better visualize the volume of locations or events within a dataset and to direct viewers towards the areas of a visualization that matter most.

Because they rely on color to communicate values, heat maps are most commonly used to display a generalized view of numeric values. They are versatile and efficient at drawing attention to trends, which is why they have become increasingly popular within the analytics community.

Heat maps are largely self-explanatory. Typically, the darker the shade, the greater the quantity (the higher the value, the tighter the dispersion, etc.), although this depends on the color scale that is chosen.

Plotly's graph_objects module contains the Heatmap() function. It needs the x, y and z attributes. Their values can be a list, a NumPy array or a Pandas dataframe.

In the following example, we have a 2D array that defines the data to color code (harvest by different farmers in tons/year). We also need two lists: the names of the farmers and the vegetables cultivated by them.

import numpy as np
import plotly.graph_objects as go
from plotly.offline import iplot

vegetables = [
   "cucumber",
   "tomato",
   "lettuce",
   "asparagus",
   "potato",
   "wheat",
   "barley"
]
farmers = [
   "Farmer Joe",
   "Upland Bros.",
   "Smith Gardening",
   "Agrifun",
   "Organiculture",
   "BioGoods Ltd.",
   "Cornylee Corp."
]
# harvest[i][j] is the harvest of farmers[i] for vegetables[j]
harvest = np.array(
   [
      [0.8, 2.4, 2.5, 3.9, 0.0, 4.0, 0.0],
      [2.4, 0.0, 4.0, 1.0, 2.7, 0.0, 0.0],
      [1.1, 2.4, 0.8, 4.3, 1.9, 4.4, 0.0],
      [0.6, 0.0, 0.3, 0.0, 3.1, 0.0, 0.0],
      [0.7, 1.7, 0.6, 2.6, 2.2, 6.2, 0.0],
      [1.3, 1.2, 0.0, 0.0, 0.0, 3.2, 5.1],
      [0.1, 2.0, 0.0, 1.4, 0.0, 1.9, 6.3]
   ]
)
# The trace type is implied by go.Heatmap, so no type argument is needed
trace = go.Heatmap(
   x = vegetables,
   y = farmers,
   z = harvest,
   colorscale = 'Viridis'
)
data = [trace]
fig = go.Figure(data = data)
iplot(fig)

Running the above code in a notebook renders the heatmap, with the vegetables on the x-axis and the farmers on the y-axis.

Aug 10

QlikView – Discussion

QlikView is a leading Business Discovery Platform. It is very powerful in visually analyzing the relationships between data. It does in-memory data processing and stores the data in the report that it creates. It can read data from numerous sources, including files and relational databases. Businesses use it to get deeper insight by doing advanced analytics on the data they have. It even does data integration by combining data from various sources into one QlikView analysis document. QlikView is a leading Business Intelligence and Analytics Platform in the Gartner Magic Quadrant.

Aug 10

QlikView – Cross Tables

While analyzing data, we come across situations where we want columns to become rows and vice versa. This is not just about transposing. It also involves rolling up many columns together, or repeating the values in a row many times, to achieve the desired column and row layout in the table.

Input Data

Consider the following input data, which shows the region-wise sales of a certain product for each quarter. We create a delimited file (CSV) with the data given below.

Quarter,Region1,Region2,Region 3
Q1,124,421,471
Q2,415,214,584
Q3,417,321,582
Q4,751,256,95

Loading Input Data

We load the above input data using the script editor, which is invoked by pressing Control+E. Choose the option Table Files and browse for the input file. After choosing the file options, click Next.

Crosstable Options

In the next window (File Wizard → Options), click the Crosstable button. It highlights the columns in different colors. The pink color shows the qualifier field, which is repeated across many rows, once for each value in the Attribute Field. The cell values under the attribute fields are taken as the data. Click OK.

Crosstable Transformation

In the transformed data, all the Region fields are combined into one column, with the values repeating for each quarter.

Load Script

The load script for the crosstable transformation can be viewed in the script editor.

Crosstable Data

We view the transformed data by creating a Table Box sheet object using the menu Layout → New Sheet Objects → Table Box.
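The crosstable transformation described above can be sketched in Python as a simple unpivot. This is not the QlikView load script. It only shows how one qualifier field (Quarter) is repeated for every attribute column. The output field names Region and Sales and the helper name crosstable are assumptions chosen for illustration.

```python
import csv
import io

# Input data from this chapter, with the header kept exactly as given.
CROSSTAB_CSV = """Quarter,Region1,Region2,Region 3
Q1,124,421,471
Q2,415,214,584
Q3,417,321,582
Q4,751,256,95
"""


def crosstable(rows, qualifier, attribute_name, data_name):
    """Turn every non-qualifier column into its own row.

    Each output row repeats the qualifier value, names the source column
    in attribute_name, and stores the cell value in data_name.
    """
    result = []
    for row in rows:
        for field, value in row.items():
            if field == qualifier:
                continue
            result.append({
                qualifier: row[qualifier],
                attribute_name: field,
                data_name: int(value),
            })
    return result


rows = list(csv.DictReader(io.StringIO(CROSSTAB_CSV)))
unpivoted = crosstable(rows, "Quarter", "Region", "Sales")

for record in unpivoted:
    print(record)
```

Four quarters and three region columns yield twelve rows, one for each quarter and region pair.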

Aug 10

Splunk – Custom Chart

The charts created in Splunk have many features for customizing them to the user's needs. These customizations help in displaying the data completely or in changing the interval for which the data is calculated. After initially creating the chart, we dive into the customization features.

Let us consider a search query that returns statistics of various measurements of the byte size of the files by weekday. We choose a column chart to display the result and start from the default values on the X-axis and Y-axis.

Axis Customization

We can customize the axes displayed in the chart by choosing the Format → X-axis button. Here, we edit the title shown for the axis. We also edit the Label Rotation option to choose an inclined label that fits better into the chart. After these edits, the changes are visible in the chart.

Legend Customization

The legend of the chart can also be customized by using the option Format → Legend. We set the Legend Position option to Top. We also set the Legend Truncation option to truncate the end of the legend if required. The chart then shows the legend at the top, with colors and values.

Aug 10

QlikView – RangeSum Function

The RangeSum() function in QlikView is used to do a selective sum on chosen fields, which is not easily achieved by the Sum function. It can take expressions containing other functions as its arguments and returns the sum of those expressions.

Input Data

Let us consider the monthly sales figures shown below. Save the data with the file name monthly_sales.csv.

Month,Sales Volume
March,2145
April,2458
May,1245
June,5124
July,7421
August,2584
September,5314
October,7846
November,6532
December,4625
January,8547
February,3265

Load Script

The above data is loaded into QlikView's memory by using the script editor. Open the script editor from the File menu or press Control+E. Choose the Table Files option from the Data from Files tab and browse for the file containing the above data. Edit the load script to add the following code. Click OK and press Control+R to load the data into QlikView's memory.

LOAD Month,
     [Sales Volume]
FROM [C:\Qlikview\data\monthly_sales.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

Applying RangeSum() Function

With the above data loaded into QlikView's memory, we edit the script to add a new column, which gives a rolling sum of the month-wise sales volume. For this, we also use the peek function discussed in the earlier chapter to hold the value of the previous record and add it to the sales volume of the current record. The following script achieves the result.

LOAD Month,
     [Sales Volume],
     rangesum([Sales Volume], peek('Rolling')) as Rolling
FROM [C:\Qlikview\data\monthly_sales.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

Creating Sheet Object

Let us create a Table Box sheet object to show the data generated by the above script. Go to the menu Layout → New Sheet Object → Table Box. In the window that appears, we enter the title of the table and select the required fields to be displayed.

Clicking OK displays the data from the CSV file, together with the Rolling column, in the QlikView Table Box.
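To see which values the Rolling column should hold, the logic of rangesum([Sales Volume], peek('Rolling')) can be mimicked in Python. This is a sketch, not QlikView code. The helper range_sum is a hypothetical stand-in that, like RangeSum(), treats a missing value as 0, which matters on the first record where no previous Rolling value exists.

```python
# Monthly sales from monthly_sales.csv (Month, Sales Volume).
monthly_sales = [
    ("March", 2145), ("April", 2458), ("May", 1245), ("June", 5124),
    ("July", 7421), ("August", 2584), ("September", 5314),
    ("October", 7846), ("November", 6532), ("December", 4625),
    ("January", 8547), ("February", 3265),
]


def range_sum(*values):
    """Sum the numeric arguments and ignore missing (None) values."""
    return sum(v for v in values if isinstance(v, (int, float)))


rolling = []
previous = None  # Plays the role of peek('Rolling'): empty on the first record.
for month, volume in monthly_sales:
    current = range_sum(volume, previous)
    rolling.append((month, volume, current))
    previous = current

for month, volume, total in rolling:
    print(month, volume, total)
```

With a plain addition instead of range_sum, the first record would fail on the missing previous value, which is the reason the chapter combines RangeSum() with peek rather than using the + operator.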

Aug 10

Splunk – Removing Data

Removing data from Splunk is possible by using the delete command. We first create the search condition to fetch the events we want to mark for deletion. Once the search condition is acceptable, we add the delete clause at the end of the command to remove those events from Splunk. After deletion, not even a user with admin privileges is able to view this data in Splunk.

Removal of data is irreversible. If you want the removed data back in Splunk, you need a copy of the original source data, which can be used to re-index the data. The process is similar to creating a new index.

Assigning Delete Privilege

By default, no user, including the admin user, can delete data. Only the "can_delete" role has the ability to delete events. So, we create a new user, assign this role, and then log in with the credentials of this new user to perform the delete operation. We create the new user with the "can_delete" role on the screen reached through Settings → Access Controls → Users → New User. We then log out of the Splunk interface and log back in as the newly created user.

Identifying the Data to be Removed

First, we need to identify the list of events we want to remove. This is done using a normal search query that specifies the filter condition. In this example, we look for the events from the host web_application in which the HTTP status field has the value 505. Our goal is to delete only the set of events containing these values.

Deleting the Selected Data

Next, we use the delete command to remove the selected events from the result set. It involves just adding the word delete after '|' at the end of the search query. After running this search query, the next screen shows that those events have been deleted.

You can also run the original search query again to verify that these events are no longer returned in the result set.

Aug 10

QlikView – Concatenation

The concatenation feature in QlikView is used to append the rows of one table to another. It works even when the tables have a different number of columns. It differs from both the Join and Keep commands, as it does not merge the matching rows from two tables into one row.

Input Data

Let us consider the following two CSV data files, which are used as input for further illustrations. Please note that the second data set has an additional column named Country.

SalesRegionOld.csv

ProductID,ProductCategory,Region,SaleAmount
1,Outdoor Recreation,Europe,4579
2,Clothing,Europe,4125
3,Costumes & Accessories,South Asia,6521
4,Athletics,South Asia,4125
5,Personal Care,Australia,5124
6,Arts & Entertainment,North America,1245
7,Hardware,South America,456

SalesRegionNew.csv

ProductID,ProductCategory,Region,Country,SaleAmount
6,Arts & Entertainment,North America,USA,1245
7,Hardware,South America,Brazil,456
8,Home & Garden,South America,Brazil,241
9,Food,South Asia,Singapore,1247
10,Home & Garden,South Asia,China,5462
11,Office Supplies,Australia,Australia,577

Load Script

We load the above input data using the script editor, which is invoked by pressing Control+E. Choose the option Table Files and browse for the input file. Then we edit the commands in the script to apply the concatenation between the tables.

Next, we load the above data into QlikView's memory and create a Table Box by using the menu Layout → New Sheet Objects → Table Box, where we choose all the available fields to be displayed.

Concatenated Data

On completing the above steps, the Table Box displays the concatenated data. Please note the duplicate rows for the product IDs 6 and 7. Concatenate does not eliminate duplicates.
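The behavior described above (rows appended, columns unioned, duplicates kept) can be sketched in Python. This illustrates the semantics only and is not the QlikView script used in the chapter. The helper name concatenate is hypothetical, and missing fields are represented here as None.

```python
import csv
import io

OLD_CSV = """ProductID,ProductCategory,Region,SaleAmount
1,Outdoor Recreation,Europe,4579
2,Clothing,Europe,4125
3,Costumes & Accessories,South Asia,6521
4,Athletics,South Asia,4125
5,Personal Care,Australia,5124
6,Arts & Entertainment,North America,1245
7,Hardware,South America,456
"""

NEW_CSV = """ProductID,ProductCategory,Region,Country,SaleAmount
6,Arts & Entertainment,North America,USA,1245
7,Hardware,South America,Brazil,456
8,Home & Garden,South America,Brazil,241
9,Food,South Asia,Singapore,1247
10,Home & Garden,South Asia,China,5462
11,Office Supplies,Australia,Australia,577
"""


def concatenate(first, second):
    """Append the rows of second to first without merging or deduplicating.

    The result has the union of both field lists. A field that a row
    does not have is filled with None.
    """
    fields = list(first[0].keys())
    for name in second[0].keys():
        if name not in fields:
            fields.append(name)
    return [{name: row.get(name) for name in fields} for row in first + second]


old_rows = list(csv.DictReader(io.StringIO(OLD_CSV)))
new_rows = list(csv.DictReader(io.StringIO(NEW_CSV)))
combined = concatenate(old_rows, new_rows)

for row in combined:
    print(row)
```

Product IDs 6 and 7 appear twice in the result, once without a Country value and once with it, because the rows are appended and never matched against each other.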

Aug 10

QlikView – Column Manipulation

Column Manipulation is a type of data transformation in which a new column is populated with those values from an existing column that meet certain criteria. The criteria can be an expression, which is created as part of the data transformation step.

Input Data

Let us consider the following input data, which represents the actual and forecasted sales figures.

Month,Forecast,Actual
March,2145,2247
April,2458,2125
May,1245,2320
June,5124,3652
July,7421,7514
August,2584,3110
September,5314,4251
October,7846,6354
November,6532,7451
December,4625,1424
January,8547,7852
February,3265,2916

Load Script

The above data is loaded into QlikView's memory by using the script editor. Open the script editor from the File menu or press Control+E. Choose the "Table Files" option from the "Data from Files" tab and browse for the file containing the above data. After clicking Next, we choose the Enable Transformation Step button to carry out the required data transformation.

Selecting the Data Transformation

Choose the Column tab and then choose the New button. It asks us to specify the new column and the Row Condition. We specify column 3 as the source column and, as the Row Condition, pick the values that start with two.

Transformed Data

On completing the above steps, we get the transformed data with the new column.

Load Script for Transformed Data

The load script for the transformed data can be seen using the script editor. The script shows the expression that creates the new column with the required values.

Display Transformed Data

The transformed data can be seen by creating a Table Box using the option in the menu Layout → New Sheet Object.
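The row condition described above (copy a value from column 3, Actual, only when it starts with 2) can be sketched in Python. This is an illustration, not the script that QlikView generates. The new column name NewColumn is hypothetical, and the sketch assumes that rows that fail the condition are left empty.

```python
import csv
import io

SALES_CSV = """Month,Forecast,Actual
March,2145,2247
April,2458,2125
May,1245,2320
June,5124,3652
July,7421,7514
August,2584,3110
September,5314,4251
October,7846,6354
November,6532,7451
December,4625,1424
January,8547,7852
February,3265,2916
"""

rows = list(csv.DictReader(io.StringIO(SALES_CSV)))

# Populate the new column from the source column (Actual, column 3)
# only where the row condition holds: the value starts with "2".
for row in rows:
    source = row["Actual"]
    row["NewColumn"] = source if source.startswith("2") else ""

matches = [row["Month"] for row in rows if row["NewColumn"]]
for row in rows:
    print(row["Month"], row["Forecast"], row["Actual"], row["NewColumn"])
```

Only the months whose Actual value begins with 2 receive a value in the new column. All other rows keep an empty cell there.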