Statistics – Variance

Variance is defined as the average of the squared differences from the mean. It is given by the following formula:

Formula

${\delta = \frac{\sum (M - n_i)^2}{n}}$

Where −

${M}$ = Mean of items.
${n}$ = the number of items considered.
${n_i}$ = items.

Example

Problem Statement:

Find the variance of the following data: {600, 470, 170, 430, 300}

Solution:

Step 1: Determine the mean of the given items.

${M = \frac{600 + 470 + 170 + 430 + 300}{5} \\[7pt]
= \frac{1970}{5} \\[7pt]
= 394}$

Step 2: Determine the variance.

${\delta = \frac{\sum (M - n_i)^2}{n} \\[7pt]
= \frac{(600 - 394)^2 + (470 - 394)^2 + (170 - 394)^2 + (430 - 394)^2 + (300 - 394)^2}{5} \\[7pt]
= \frac{(206)^2 + (76)^2 + (-224)^2 + (36)^2 + (-94)^2}{5} \\[7pt]
= \frac{42,436 + 5,776 + 50,176 + 1,296 + 8,836}{5} \\[7pt]
= \frac{108,520}{5} \\[7pt]
= 21,704}$

As a result, the variance is ${21,704}$.
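The same arithmetic can be cross-checked with a short script. The following is a minimal Python sketch (an illustration added here, not part of the original example); the standard library's statistics.pvariance computes exactly this population variance:

import statistics

data = [600, 470, 170, 430, 300]

# Population variance: average of squared deviations from the mean
mean = sum(data) / len(data)                      # 394.0
variance = sum((mean - x) ** 2 for x in data) / len(data)
print(variance)                                   # 21704.0

# Cross-check with the standard library
print(statistics.pvariance(data))                 # 21704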
Tableau – Save & Delete Worksheet

An existing worksheet can be both saved and deleted. This helps in organizing the contents in the Tableau desktop environment. While you can save a worksheet by clicking the save button under the main menu, you can delete a worksheet using the following steps.

Deleting the Worksheet

To delete a worksheet, right-click on the name of the worksheet and choose the option ‘Delete Sheet’. The following screenshot shows that the worksheet has been deleted.
Sqoop – Installation
As Sqoop is a sub-project of Hadoop, it can only work on the Linux operating system. Follow the steps given below to install Sqoop on your system.

Step 1: Verifying Java Installation

You need to have Java installed on your system before installing Sqoop. Let us verify the Java installation using the following command −

$ java -version

If Java is already installed on your system, you get to see the following response −

java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)

If Java is not installed on your system, then follow the steps given below.

Installing Java

Follow the simple steps given below to install Java on your system.

Step 1

Download Java (JDK <latest version> - X64.tar.gz) by visiting the following link. Then jdk-7u71-linux-x64.tar.gz will be downloaded onto your system.

Step 2

Generally, you can find the downloaded Java file in the Downloads folder. Verify it and extract the jdk-7u71-linux-x64.tar.gz file using the following commands.

$ cd Downloads/
$ ls
jdk-7u71-linux-x64.tar.gz
$ tar zxf jdk-7u71-linux-x64.tar.gz
$ ls
jdk1.7.0_71 jdk-7u71-linux-x64.tar.gz

Step 3

To make Java available to all the users, you have to move it to the location “/usr/local/”. Open root, and type the following commands.

$ su
password:
# mv jdk1.7.0_71 /usr/local/java
# exit

Step 4

For setting up the PATH and JAVA_HOME variables, add the following commands to the ~/.bashrc file.

export JAVA_HOME=/usr/local/java
export PATH=$PATH:$JAVA_HOME/bin

Now apply all the changes into the current running system.

$ source ~/.bashrc

Step 5

Use the following commands to configure Java alternatives −

# alternatives --install /usr/bin/java java /usr/local/java/bin/java 2
# alternatives --install /usr/bin/javac javac /usr/local/java/bin/javac 2
# alternatives --install /usr/bin/jar jar /usr/local/java/bin/jar 2
# alternatives --set java /usr/local/java/bin/java
# alternatives --set javac /usr/local/java/bin/javac
# alternatives --set jar /usr/local/java/bin/jar

Now verify the installation using the command java -version from the terminal as explained above.

Step 2: Verifying Hadoop Installation

Hadoop must be installed on your system before installing Sqoop. Let us verify the Hadoop installation using the following command −

$ hadoop version

If Hadoop is already installed on your system, then you will get the following response −

Hadoop 2.4.1
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4

If Hadoop is not installed on your system, then proceed with the following steps −

Downloading Hadoop

Download and extract Hadoop 2.4.1 from the Apache Software Foundation using the following commands.

$ su
password:
# cd /usr/local
# wget http://apache.claz.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz
# tar xzf hadoop-2.4.1.tar.gz
# mv hadoop-2.4.1/* hadoop/
# exit

Installing Hadoop in Pseudo Distributed Mode

Follow the steps given below to install Hadoop 2.4.1 in pseudo-distributed mode.

Step 1: Setting up Hadoop

You can set the Hadoop environment variables by appending the following commands to the ~/.bashrc file.
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Now, apply all the changes into the current running system.

$ source ~/.bashrc

Step 2: Hadoop Configuration

You can find all the Hadoop configuration files in the location “$HADOOP_HOME/etc/hadoop”. You need to make suitable changes in those configuration files according to your Hadoop infrastructure.

$ cd $HADOOP_HOME/etc/hadoop

In order to develop Hadoop programs using Java, you have to reset the Java environment variables in the hadoop-env.sh file by replacing the JAVA_HOME value with the location of Java in your system.

export JAVA_HOME=/usr/local/java

Given below is the list of files that you need to edit to configure Hadoop.

core-site.xml

The core-site.xml file contains information such as the port number used for the Hadoop instance, the memory allocated for the file system, the memory limit for storing the data, and the size of the Read/Write buffers. Open core-site.xml and add the following properties in between the <configuration> and </configuration> tags.

<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>

hdfs-site.xml

The hdfs-site.xml file contains information such as the value of replication data, the namenode path, and the datanode path of your local file systems, i.e., the place where you want to store the Hadoop infrastructure. Let us assume the following data.

dfs.replication (data replication value) = 1

(In the following path, /hadoop/ is the user name and hadoopinfra/hdfs/namenode is the directory created by the hdfs file system.)
namenode path = /home/hadoop/hadoopinfra/hdfs/namenode

(hadoopinfra/hdfs/datanode is the directory created by the hdfs file system.)
datanode path = /home/hadoop/hadoopinfra/hdfs/datanode

Open this file and add the following properties in between the <configuration>, </configuration> tags.

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.name.dir</name>
      <value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.data.dir</name>
      <value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
   </property>
</configuration>

Note − In the above file, all the property values are user-defined and you can make changes according to your Hadoop infrastructure.

yarn-site.xml

This file is used to configure YARN into Hadoop. Open the yarn-site.xml file and add the following properties in between the <configuration>, </configuration> tags.

<configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
</configuration>

mapred-site.xml

This file is used to specify which MapReduce framework we are using. By default, Hadoop contains a template of mapred-site.xml. First of all, you need to copy the file from mapred-site.xml.template to mapred-site.xml using the following command.

$ cp mapred-site.xml.template mapred-site.xml

Open the mapred-site.xml file and add the following properties in between the <configuration>, </configuration> tags.

<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>

Verifying Hadoop Installation

The following steps are used to verify the Hadoop installation.
Step 1: Name Node Setup

Set up the namenode using the command “hdfs namenode -format” as follows.

$ cd ~
$ hdfs namenode -format

The expected result is as follows.

10/24/14 21:30:55 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost/192.168.1.11
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.4.1
...
...
10/24/14 21:30:56 INFO common.Storage: Storage directory /home/hadoop/hadoopinfra/hdfs/namenode has been successfully formatted.
10/24/14 21:30:56 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
10/24/14 21:30:56 INFO util.ExitUtil: Exiting with status 0
10/24/14 21:30:56 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/192.168.1.11
************************************************************/

Step 2: Verifying Hadoop dfs

The following command is used to start dfs. Executing this command will start your Hadoop file system.

$ start-dfs.sh
Sqoop – Import
This chapter describes how to import data from a MySQL database to Hadoop HDFS. The ‘import tool’ imports individual tables from an RDBMS to HDFS. Each row in a table is treated as a record in HDFS. All records are stored as text data in text files or as binary data in Avro and Sequence files.

Syntax

The following syntax is used to import data into HDFS.

$ sqoop import (generic-args) (import-args)
$ sqoop-import (generic-args) (import-args)

Example

Let us take an example of three tables named emp, emp_add, and emp_contact, which are in a database called userdb in a MySQL database server. The three tables and their data are as follows.

emp:

id     name       deg            salary   dept
1201   gopal      manager        50,000   TP
1202   manisha    Proof reader   50,000   TP
1203   khalil     php dev        30,000   AC
1204   prasanth   php dev        30,000   AC
1205   kranthi    admin          20,000   TP

emp_add:

id     hno    street     city
1201   288A   vgiri      jublee
1202   108I   aoc        sec-bad
1203   144Z   pgutta     hyd
1204   78B    old city   sec-bad
1205   720X   hitec      sec-bad

emp_contact:

id     phno      email
1201   2356742   [email protected]
1202   1661663   [email protected]
1203   8887776   [email protected]
1204   9988774   [email protected]
1205   1231231   [email protected]

Importing a Table

The Sqoop tool ‘import’ is used to import table data from the table to the Hadoop file system as a text file or a binary file. The following command is used to import the emp table from the MySQL database server to HDFS.

$ sqoop import --connect jdbc:mysql://localhost/userdb --username root --table emp -m 1

If it is executed successfully, then you get the following output.

14/12/22 15:24:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/12/22 15:24:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/22 15:24:56 INFO tool.CodeGenTool: Beginning code generation
14/12/22 15:24:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
14/12/22 15:24:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
14/12/22 15:24:58 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
14/12/22 15:25:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/cebe706d23ebb1fd99c1f063ad51ebd7/emp.jar
-----------------------------------------------------
-----------------------------------------------------
14/12/22 15:25:40 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1419242001831_0001/
14/12/22 15:26:45 INFO mapreduce.Job: Job job_1419242001831_0001 running in uber mode : false
14/12/22 15:26:45 INFO mapreduce.Job: map 0% reduce 0%
14/12/22 15:28:08 INFO mapreduce.Job: map 100% reduce 0%
14/12/22 15:28:16 INFO mapreduce.Job: Job job_1419242001831_0001 completed successfully
-----------------------------------------------------
-----------------------------------------------------
14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Transferred 145 bytes in 177.5849 seconds (0.8165 bytes/sec)
14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Retrieved 5 records.

To verify the imported data in HDFS, use the following command.

$ $HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*

It shows you the emp table data with comma (,) separated fields.

1201, gopal, manager, 50000, TP
1202, manisha, preader, 50000, TP
1203, kalil, php dev, 30000, AC
1204, prasanth, php dev, 30000, AC
1205, kranthi, admin, 20000, TP

Importing into Target Directory

We can specify the target directory while importing table data into HDFS using the Sqoop import tool. Following is the syntax to specify the target directory as an option to the Sqoop import command.
--target-dir <new or existing directory in HDFS>

The following command is used to import emp_add table data into the ‘/queryresult’ directory.

$ sqoop import --connect jdbc:mysql://localhost/userdb --username root --table emp_add -m 1 --target-dir /queryresult

The following command is used to verify the imported data in the /queryresult directory from the emp_add table.

$ $HADOOP_HOME/bin/hadoop fs -cat /queryresult/part-m-*

It will show you the emp_add table data with comma (,) separated fields.

1201, 288A, vgiri, jublee
1202, 108I, aoc, sec-bad
1203, 144Z, pgutta, hyd
1204, 78B, oldcity, sec-bad
1205, 720C, hitech, sec-bad

Importing a Subset of Table Data

We can import a subset of a table using the ‘where’ clause in the Sqoop import tool. It executes the corresponding SQL query in the respective database server and stores the result in a target directory in HDFS. The syntax for the where clause is as follows.

--where <condition>

The following command is used to import a subset of the emp_add table data. The subset query retrieves the employee id and address of the employees who live in Secunderabad city.

$ sqoop import --connect jdbc:mysql://localhost/userdb --username root --table emp_add -m 1 --where "city = 'sec-bad'" --target-dir /wherequery

The following command is used to verify the imported data in the /wherequery directory from the emp_add table.

$ $HADOOP_HOME/bin/hadoop fs -cat /wherequery/part-m-*

It will show you the emp_add table data with comma (,) separated fields.

1202, 108I, aoc, sec-bad
1204, 78B, oldcity, sec-bad
1205, 720C, hitech, sec-bad

Incremental Import

Incremental import is a technique that imports only the newly added rows in a table. It requires adding the ‘incremental’, ‘check-column’, and ‘last-value’ options to perform the incremental import. The following syntax is used for the incremental option in the Sqoop import command.

--incremental <mode>
--check-column <column name>
--last-value <last check column value>

Let us assume the newly added data in the emp table is as follows −

1206, satish p, grp des, 20000, GR

The following command is used to perform the incremental import on the emp table.

$ sqoop import --connect jdbc:mysql://localhost/userdb --username root --table emp -m 1 --incremental append --check-column id --last-value 1205

The following command is used to verify the imported data from the emp table in the HDFS emp/ directory.

$ $HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*

It shows you the emp table data with comma (,) separated fields.

1201, gopal, manager, 50000, TP
1202, manisha, preader, 50000, TP
1203, kalil, php dev, 30000, AC
1204, prasanth, php dev, 30000, AC
1205, kranthi, admin, 20000, TP
1206, satish p, grp des, 20000, GR

The following command is used to see the modified or newly added rows of the emp table.

$ $HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*1

It shows you the newly added rows of the emp table with comma (,) separated fields.

1206, satish p, grp des, 20000, GR
Tableau – File Types
The result of data analysis in Tableau can be saved in various formats for storage and distribution. The various formats are referred to as different file types, and they are identified by different extensions. Their formats depend on how they are produced and for what purposes they are used. File types such as the workbook (.twb) and the data source (.tds) are stored as XML files, which can be opened and edited. The following list describes each file type and its usage.

Tableau Workbook (.twb) − It contains information on each sheet and dashboard that is present in a workbook. It has the details of the fields which are used in each view and the formula applied to the aggregation of the measures. It also has the formatting and styles applied. It contains the data source connection information and any metadata information created for that connection.

Tableau Packaged Workbook (.twbx) − This file format contains the details of a workbook as well as the local data that is used in the analysis. Its purpose is to share with other Tableau Desktop or Tableau Reader users, assuming it does not need data from the server.

Tableau Data Source (.tds) − The details of the connection used to create the Tableau report are stored in this file. In the connection details, it stores the source type (Excel/relational/SAP, etc.) as well as the data types of the columns.

Tableau Packaged Data Source (.tdsx) − This file is similar to the .tds file, with the addition of data along with the connection details.

Tableau Data Extract (.tde) − This file contains the data used in a .twb file in a highly compressed columnar data format. This helps in storage optimization. It also saves the aggregated calculations that are applied in the analysis. This file should be refreshed to get the updated data from the source.

Tableau Bookmark (.tbm) − These files contain a single worksheet that can easily be shared and pasted into other workbooks.

Tableau Preferences (.tps) − This file stores the color preferences used across all the workbooks. It is mainly used for a consistent look and feel across the users.
Tableau – Discussion
Tableau is a Business Intelligence tool for visually analyzing data. Users can create and distribute interactive and shareable dashboards, which depict the trends, variations, and density of the data in the form of graphs and charts. Tableau can connect to files, relational data sources, and Big Data sources to acquire and process data. The software allows data blending and real-time collaboration, which makes it very unique. It is used by businesses, academic researchers, and many government organizations for visual data analysis. It is also positioned as a leader in the Gartner Magic Quadrant for Business Intelligence and Analytics Platforms.
Tableau – Get Started
In this chapter, you will learn some basic operations in Tableau to get acquainted with its interface. There are three basic steps involved in creating any Tableau data analysis report. These three steps are −

Connect to a data source − It involves locating the data and using an appropriate type of connection to read the data.

Choose dimensions and measures − This involves selecting the required columns from the source data for analysis.

Apply a visualization technique − This involves applying required visualization methods, such as a specific chart or graph type, to the data being analyzed.

For convenience, let's use the sample data set that comes with the Tableau installation, named Sample – Superstore.xls. Locate the installation folder of Tableau and go to My Tableau Repository. Under it, you will find the above file at Datasources\9.2\en_US-US.

Connect to a Data Source

On opening Tableau, you will get the start page showing various data sources. Under the header “Connect”, you have options to choose a file or server or saved data source. Under Files, choose Excel. Then navigate to the file “Sample – Superstore.xls” as mentioned above. The Excel file has three sheets named Orders, People and Returns. Choose Orders.

Choose the Dimensions and Measures

Next, choose the data to be analyzed by deciding on the dimensions and measures. Dimensions are the descriptive data, while measures are numeric data. When put together, they help visualize the performance of the dimensional data with respect to the data which are measures. Choose Category and Region as the dimensions and Sales as the measure. Drag and drop them as shown in the following screenshot. The result shows the total sales in each category for each region.

Apply Visualization Technique

In the previous step, you can see that the data is available only as numbers. You have to read and calculate each of the values to judge the performance. However, you can see them as graphs or charts with different colors to make a quicker judgment. We drag and drop the SUM(Sales) column from the Marks tab to the Columns shelf. The table showing the numeric values of sales now turns into a bar chart automatically. You can apply a technique of adding another dimension to the existing data. This will add more colors to the existing bar chart, as shown in the following screenshot.
Statistics – Simple Random Sampling
A simple random sample is defined as one in which each element of the population has an equal and independent chance of being selected. In case of a population with N units, there are ${^NC_n}$ possible samples of size n, and the probability of any one particular sample being drawn is ${1/^NC_n}$. For example, if we have a population of five elements (A, B, C, D, E), i.e., N = 5, and we want a sample of size n = 3, then there are ${^5C_3 = 10}$ possible samples, and the probability of any one particular sample being drawn is 1/10.

Simple random sampling can be done in two different ways, i.e., ‘with replacement’ or ‘without replacement’. When the units are selected into a sample successively after replacing the selected unit before the next draw, it is a simple random sample with replacement. If the units selected are not replaced before the next draw, and successive draws are made only from the remaining units of the population, then it is termed a simple random sample without replacement. Thus in the former method a unit once selected may be repeated, whereas in the latter a unit once selected is not repeated. Due to the greater statistical efficiency associated with a simple random sample without replacement, it is the preferred method.

A simple random sample can be drawn through either of two procedures, i.e., the lottery method or random number tables.

Lottery Method − Under this method, units are selected on the basis of random draws. First, each member or element of the population is assigned a unique number. Next, these numbers are written on separate cards which are physically similar in shape, size, color, etc. They are then placed in a basket and thoroughly mixed. Finally, the slips are drawn randomly without looking at them. The number of slips drawn equals the required sample size.

The lottery method suffers from a few drawbacks. The process of writing N slips is cumbersome, and shuffling a large number of slips, where the population size is very large, is difficult. Human bias may also enter while choosing the slips. Hence the other alternative, random number tables, can be used.

Random Number Tables Method − These consist of columns of numbers which have been randomly prepared. A number of random number tables are available, e.g., the Fisher and Yates tables, Tippett's random numbers, etc. Listed below is a sequence of two-digit random numbers from the Fisher & Yates table:

61, 44, 65, 22, 01, 67, 76, 23, 57, 58, 54, 11, 33, 86, 07, 26, 75, 76, 64, 22, 19, 35, 74, 49, 86, 58, 69, 52, 27, 34, 91, 25, 34, 67, 76, 73, 27, 16, 53, 18, 19, 69, 32, 52, 38, 72, 38, 64, 81, 79 and 38.

The first step involves assigning a unique number to each member of the population; e.g., if the population comprises 20 people, all individuals are numbered from 01 to 20. If we are to select a sample of 5 units, then referring to the random number tables, 5 two-digit numbers are chosen. E.g., using the above table, the units having the following five numbers will form the sample: 01, 11, 07, 19 and 16. If the sampling is without replacement and a particular random number repeats itself, it will not be taken again, and the next number that fits our criteria will be chosen.

Thus a simple random sample can be drawn using either of the two procedures. However, in practice, it has been seen that drawing a simple random sample involves a lot of time and effort and is often impractical.
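To make the two selection schemes concrete, here is a minimal Python sketch (an illustration added here, not part of the original text) using only the standard library: random.sample draws without replacement, random.choices draws with replacement, and math.comb counts the ${^NC_n}$ possible samples.

import random
from math import comb

population = ["A", "B", "C", "D", "E"]    # N = 5
n = 3                                     # sample size

# Number of possible samples: 5C3 = 10, so each sample has probability 1/10
print(comb(len(population), n))           # 10

# Simple random sample without replacement: no unit can repeat
print(random.sample(population, n))       # e.g. ['D', 'A', 'E']

# Simple random sample with replacement: a unit may repeat
print(random.choices(population, k=n))    # e.g. ['B', 'B', 'E']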
Tableau – Data Terminology
Tableau – Data Terminology ”; Previous Next As a powerful data visualization tool, Tableau has many unique terms and definitions. You need to get acquainted with their meaning before you start using the features in Tableau. The following list of terms is comprehensive and explains the terms most frequently used. S.No Terms & Meaning 1 Alias An alternative name that you can assign to a field or to a dimension member. 2 Bin A user-defined grouping of measures in the data source. 3 Bookmark A .tbm file in the Bookmarks folder in the Tableau repository that contains a single worksheet. Much like web browser bookmarks, .tbm files are a convenient way to quickly display different analyses. 4 Calculated Field A new field that you create by using a formula to modify the existing fields in your data source. 5 Crosstab A text table view. Use text tables to display the numbers associated with dimension members. 6 Dashboard A combination of several views arranged on a single page. Use dashboards to compare and monitor a variety of data simultaneously. 7 Data Pane A pane on the left side of the workbook that displays the fields of the data sources to which Tableau is connected. The fields are divided into dimensions and measures. The data pane also displays custom fields such as calculations, binned fields, and groups. You build views of your data by dragging fields from the data pane onto the various shelves that are a part of every worksheet. 8 Data Source Page A page where you can set up your data source. The data source page generally consists of four main areas − left pane, join area, preview area, and metadata area. 9 Dimension A field of categorical data. Dimensions typically hold discrete data such as hierarchies and members that cannot be aggregated. Examples of dimensions include dates, customer names, and customer segments. 10 Extract A saved subset of a data source that you can use to improve performance and analyze offline. You can create an extract by defining filters and limits that include the data you want in the extract. 11 Filters Shelf A shelf on the left of the workbook that you can use to exclude data from a view by filtering it using measures and dimensions. 12 Format Pane A pane that contains formatting settings that control the entire worksheet, as well as individual fields in the view. When open, the Format pane appears on the left side of the workbook. 13 Level Of Detail (LOD) Expression A syntax that supports aggregation at dimensionalities other than the view level. With the level of detail expressions, you can attach one or more dimensions to any aggregate expression. 14 Marks A part of the view that visually represents one or more rows in a data source. A mark can be, for example, a bar, line, or square. You can control the type, color, and size of marks. 15 Marks Card A card to the left of the view, where you can drag fields to control mark properties such as type, color, size, shape, label, tooltip, and detail. 16 Pages Shelf A shelf to the left of the view that you can use to split a view into a sequence of pages based on the members and values in a discrete or continuous field. Adding a field to the Pages shelf is like adding a field to the Rows shelf, except that a new page is created for each new row. 17 Rows Shelf A shelf at the top of the workbook that you can use to create the rows of a data table. The shelf accepts any number of dimensions and measures. When you place a dimension on the Rows shelf, Tableau creates headers for the members of that dimension. 
When you place a measure on the Rows shelf, Tableau creates quantitative axes for that measure. 18 Shelves Named areas to the left and top of the view. You build views by placing fields onto the shelves. Some shelves are available only when you select certain mark types. For example, the Shape shelf is available only when you select the Shape mark type. 19 Workbook A file with a .twb extension that contains one or more worksheets (and possibly also dashboards and stories). 20 Worksheet A sheet where you build views of your data by dragging fields onto shelves. Print Page Previous Next Advertisements ”;
Statistics – Standard Error (SE)
The standard deviation of a sampling distribution is called the standard error. In sampling, the three most important characteristics are accuracy, bias and precision. It can be said that:

The estimate derived from any one sample is inaccurate to the extent that it differs from the population parameter. Since the population parameter can only be determined by a sample survey, it is generally unknown, and the actual difference between the sample estimate and the population parameter cannot be measured.

The estimator is unbiased if the mean of the estimates derived from all the possible samples equals the population parameter.

Even if the estimator is unbiased, an individual sample is most likely going to yield an inaccurate estimate, and as stated earlier, inaccuracy cannot be measured. However, it is possible to measure the precision, i.e., the range within which the true value of the population parameter is expected to lie, using the concept of standard error.

Formula

${SE_\bar{x} = \frac{s}{\sqrt{n}}}$

Where −

${s}$ = Standard Deviation

${n}$ = Number of observations

Example

Problem Statement:

Calculate the standard error for the following individual data:

Items: 14, 36, 45, 70, 105

Solution:

Let's first compute the arithmetic mean ${\bar{x}}$:

${\bar{x} = \frac{14 + 36 + 45 + 70 + 105}{5} \\[7pt]
= \frac{270}{5} \\[7pt]
= 54}$

Let's now compute the standard deviation ${s}$:

${s = \sqrt{\frac{1}{n-1}((x_{1}-\bar{x})^{2}+(x_{2}-\bar{x})^{2}+...+(x_{n}-\bar{x})^{2})} \\[7pt]
= \sqrt{\frac{1}{5-1}((14-54)^{2}+(36-54)^{2}+(45-54)^{2}+(70-54)^{2}+(105-54)^{2})} \\[7pt]
= \sqrt{\frac{1}{4}(1600+324+81+256+2601)} \\[7pt]
= 34.86}$

Thus the standard error ${SE_\bar{x}}$:

${SE_\bar{x} = \frac{s}{\sqrt{n}} \\[7pt]
= \frac{34.86}{\sqrt{5}} \\[7pt]
= \frac{34.86}{2.236} \\[7pt]
= 15.59}$

The standard error of the given numbers is 15.59.

When sampling without replacement from a finite population, the standard error is multiplied by the finite population correction factor ${\sqrt{\frac{N-n}{N-1}}}$. The smaller the proportion of the population that is sampled, the less the effect of this multiplier, because then the finite multiplier will be close to one and will affect the standard error negligibly. Hence, if the sample size is less than 5% of the population, the finite multiplier is ignored.
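As a quick numerical cross-check, the same calculation can be scripted in a few lines of Python (a minimal sketch, not part of the original worked example); statistics.stdev returns the sample standard deviation with the n−1 divisor used above:

import statistics
from math import sqrt

items = [14, 36, 45, 70, 105]
n = len(items)

s = statistics.stdev(items)          # sample standard deviation, approx. 34.86
se = s / sqrt(n)                     # standard error of the mean
print(round(s, 2), round(se, 2))     # 34.86 15.59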