ggplot2 – Working with Axes

When we speak about axes in a graph, we mean the x and y axes of a two-dimensional plot. In this chapter, we will work with two datasets that ship with R and are commonly used by data scientists: "PlantGrowth" and "iris".

Implementing axes in Iris dataset

We will use the following steps to work on the x and y axes with the ggplot2 package of R. The library must be loaded first so that the package's functions are available.

    # Load ggplot
    library(ggplot2)

    # Read in dataset
    data(iris)

Creating the plot points

As discussed in the previous chapter, we will create a plot with points in it, in other words a scatter plot.

    # Plot
    p <- ggplot(iris, aes(Sepal.Length, Petal.Length, colour = Species)) +
      geom_point()
    p

Now let us look at aes(), which defines the aesthetic mappings of "ggplot2". Aesthetic mappings describe how variables in the data are mapped to visual properties of each layer. Here, Sepal.Length is mapped to the x axis, Petal.Length to the y axis and Species to the point colour.

Highlight and tick marks

annotate() places markers at the given x and y coordinates. The calls below add a text label, repeat a text label at several positions, highlight a rectangular area and add a line segment:

    # add text
    p + annotate("text", x = 6, y = 5, label = "text")

    # add repeat
    p + annotate("text", x = 4:6, y = 5:7, label = "text")

    # highlight an area
    p + annotate("rect", xmin = 5, xmax = 7, ymin = 4, ymax = 6, alpha = .5)

    # segment
    p + annotate("segment", x = 5, xend = 7, y = 4, yend = 5, colour = "black")

The first call draws the label "text" at x = 6, y = 5. The second call repeats the label at several positions.
The labels are placed at x coordinates 4 to 6 and y coordinates 5 to 7, that is, at the points (4, 5), (5, 6) and (6, 7). The last two calls shade the rectangle spanning x from 5 to 7 and y from 4 to 6, and draw a black segment from (5, 4) to (7, 5).

PlantGrowth Dataset

Now let us work with another dataset called "PlantGrowth". Load the library and inspect the dataset. It contains results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions. The first 17 rows are shown below.

    > PlantGrowth
       weight group
    1    4.17  ctrl
    2    5.58  ctrl
    3    5.18  ctrl
    4    6.11  ctrl
    5    4.50  ctrl
    6    4.61  ctrl
    7    5.17  ctrl
    8    4.53  ctrl
    9    5.33  ctrl
    10   5.14  ctrl
    11   4.81  trt1
    12   4.17  trt1
    13   4.41  trt1
    14   3.59  trt1
    15   5.87  trt1
    16   3.83  trt1
    17   6.03  trt1

Adding attributes with axes

Plot the treatment group on the x axis and the weight on the y axis:

    > bp <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
    +   geom_point()
    > bp

Finally, we can swap the x and y axes by adding coord_flip() to the same plot:

    > bp <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
    +   geom_point() +
    +   coord_flip()
    > bp

In short, aesthetic mappings decide which variable goes on which axis, and further ggplot2 functions adjust how the axes are drawn.
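Beyond swapping axes, the labels, range and tick positions can also be set explicitly. The following is a minimal sketch rather than part of the original walkthrough: the label text, the 3 to 7 range and the object names bp_axes and bp_flipped are illustrative assumptions.

```r
library(ggplot2)

# Scatter plot of plant weight by treatment group
bp <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_point()

# Rename both axes, then fix the y-axis range and tick positions
# (labels and limits here are illustrative choices)
bp_axes <- bp +
  labs(x = "Treatment group", y = "Dried weight") +
  scale_y_continuous(limits = c(3, 7), breaks = seq(3, 7, by = 1))

# Swap the drawn axes without touching the aesthetic mapping
bp_flipped <- bp_axes + coord_flip()

bp_axes
bp_flipped
```

labs() only changes the axis titles, while scale_y_continuous() controls the range and breaks. Note that observations outside the limits given to a scale are dropped, so the range should cover all the data.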

ggplot2 – Time Series

A time series is a series of data points listed in time order, most commonly a sequence taken at successive, equally spaced points in time. It can therefore be treated as discrete-time data. A time series plot shows these points in order along a time axis.

The dataset we will use in this chapter is the "economics" dataset, which contains US economic time series. The data frame includes the following attributes:

    date      Month of data collection
    psavert   Personal savings rate
    pce       Personal consumption expenditures, in billions of dollars
    unemploy  Number of unemployed, in thousands
    uempmed   Median duration of unemployment, in weeks
    pop       Total population, in thousands

Load the required package and set the default theme before creating a time series.

    > library(ggplot2)
    > theme_set(theme_minimal())
    > # Demo dataset
    > head(economics)
    # A tibble: 6 x 6
      date         pce    pop psavert uempmed unemploy
      <date>     <dbl>  <dbl>   <dbl>   <dbl>    <dbl>
    1 1967-07-01  507. 198712    12.6     4.5     2944
    2 1967-08-01  510. 198911    12.6     4.7     2945
    3 1967-09-01  516. 199113    11.9     4.6     2958
    4 1967-10-01  512. 199311    12.9     4.9     3143
    5 1967-11-01  517. 199498    12.8     4.7     3066
    6 1967-12-01  525. 199657    11.8     4.8     3018

Create a basic line plot of population against date:

    > # Basic line plot
    > ggplot(data = economics, aes(x = date, y = pop)) +
    +   geom_line(color = "#00AFBB", size = 2)

We can plot a subset of the data using the following commands:

    > # Plot a subset of the data
    > ss <- subset(economics, date > as.Date("2006-1-1"))
    > ggplot(data = ss, aes(x = date, y = pop)) +
    +   geom_line(color = "#FC4E07", size = 2)

Creating Time Series

Here we will plot the variables psavert and uempmed by date. To do so, we must reshape the data with the tidyr package by collapsing the psavert and uempmed values into a single new column. The gather() function from tidyr does this and also creates a grouping variable whose levels are psavert and uempmed.
    > library(tidyr)
    > library(dplyr)

    Attaching package: 'dplyr'

    The following object is masked from 'package:ggplot2':
        vars
    The following objects are masked from 'package:stats':
        filter, lag
    The following objects are masked from 'package:base':
        intersect, setdiff, setequal, union

    > df <- economics %>%
    +   select(date, psavert, uempmed) %>%
    +   gather(key = "variable", value = "value", -date)
    > head(df, 3)
    # A tibble: 3 x 3
      date       variable value
      <date>     <chr>    <dbl>
    1 1967-07-01 psavert   12.6
    2 1967-08-01 psavert   12.6
    3 1967-09-01 psavert   11.9

Create a multiple line plot with the following command to look at the relationship between "psavert" and "uempmed":

    > ggplot(df, aes(x = date, y = value)) +
    +   geom_line(aes(color = variable), size = 1) +
    +   scale_color_manual(values = c("#00AFBB", "#E7B800")) +
    +   theme_minimal()
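The same reshaping can also be written with pivot_longer(), which tidyr provides as the newer replacement for gather(). The sketch below is an alternative under that assumption, not the chapter's own method. The names econ_subset, df_long and ts_plot and the ten-year axis breaks are illustrative choices.

```r
library(ggplot2)
library(tidyr)

# Keep the date column plus the two series to compare
econ_subset <- economics[, c("date", "psavert", "uempmed")]

# Stack psavert and uempmed into one value column,
# with a variable column recording which series each row came from
df_long <- pivot_longer(
  econ_subset,
  cols = c(psavert, uempmed),
  names_to = "variable",
  values_to = "value"
)

# One line per series, with a date axis labelled every ten years
ts_plot <- ggplot(df_long, aes(x = date, y = value, colour = variable)) +
  geom_line() +
  scale_x_date(date_breaks = "10 years", date_labels = "%Y")
ts_plot
```

Because each original row contributes one row per series, df_long has twice as many rows as economics.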

ggplot2 – Quick Guide

ggplot2 – Introduction

ggplot2 is an R package designed for data visualization and exploratory data analysis. It produces polished plots and takes care of details such as drawing legends. Plots can be built up iteratively and edited later. The package works in a layered fashion: you start with a layer showing the raw data collected during exploratory data analysis, then add layers of annotations and statistical summaries.

Even the most experienced R users need help creating elegant graphics. This library is a phenomenal tool for creating graphics in R, but even after many years of near-daily use we still need to refer to our Cheat Sheet.

The package is built on a grammar called the "Grammar of Graphics", which is made up of a set of independent components that can be combined in many ways. This grammar is the main reason ggplot2 is so powerful: the R developer is not limited to a set of pre-specified graphics, as in other packages. The grammar consists of a simple set of core rules and principles.

Wilkinson's Grammar of Graphics (2005) describes the deep features that underlie all statistical graphics. ggplot2 builds on it by focusing on the primacy of layers and adapting the grammar for use within R.

Relationship between "Grammar of Graphics" and R

The grammar tells the user or developer that a statistical graphic is a mapping from data to aesthetic attributes (such as color, shape and size) of geometric objects (such as points, lines and bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system. Faceting can be used to create the same plot for different subsets of the dataset.
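As a rough sketch of how these pieces combine in practice, the example below builds one plot from data, aesthetic mappings, a geometric layer, a statistical layer, facets and a theme. It assumes the mpg dataset that ships with ggplot2, and the choice of variables (displ, hwy, drv) is illustrative only.

```r
library(ggplot2)

# Data and aesthetic mappings: engine displacement on x, highway mileage on y
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Geometric layer: one point per car
  geom_point() +
  # Statistical layer: a fitted straight line per panel
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Faceting: one panel per drive type
  facet_wrap(~ drv) +
  # Theme: non-data appearance
  theme_minimal()
p
```

Each component is added with `+`, so any one of them can be changed or removed without rewriting the rest of the plot.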
R includes various built-in datasets. The combination of these independent components makes up a graphic. Now let us look at the components of a plot in the grammar:

Data: the data that you want to visualize, together with a set of aesthetic mappings that describe how variables in the data are mapped to aesthetic attributes.

Layers: a layer is made up of geometric elements and statistical transformations. Geometric objects, geoms for short, represent what you actually see on the plot: points, lines, polygons and so on. Statistical transformations, stats for short, summarize the data, for example by binning and counting observations to create a histogram, or by summarizing a 2D relationship with a linear model.

Scales: scales map values in the data space to values in an aesthetic space, whether color, size or shape. Scales also draw a legend or axes, which provide the inverse mapping and make it possible to read the original data values from the plot.

Coordinate System: it describes how data coordinates are mapped to the plane of the graphic. It also provides the axes and gridlines needed to read the graph. A Cartesian coordinate system is normally used, but others are available, including polar coordinates and map projections.

Faceting: it specifies how to break up the data into subsets and how to display those subsets as small multiples. This is also known as conditioning or latticing.

Theme: it controls the finer points of display, such as font size and background color. To create an attractive plot, it helps to consult references on graphic design.

The grammar also has limitations. It does not suggest which graphic should be used to answer the question a user is interested in.
It also does not describe interactivity, as it only covers static graphics. Dynamic graphics require other tools.

ggplot2 – Installation of R

R packages provide various capabilities, such as analyzing statistical information, researching geospatial data in depth or simply creating basic reports. A package can be defined as a collection of R functions, data and compiled code in a well-defined format. The folder or directory where packages are stored is called the library.

.libPaths() is the function that displays where the library is located, and library() lists the packages that are saved in the library.

R includes a number of functions that manipulate packages. We will focus on the three main tasks:

    Installing a Package
    Loading a Package
    Learning about a Package

The syntax for installing a package in R is:

    install.packages("<package-name>")

For example, to install "ggplot2", the data visualization library, use:

    install.packages("ggplot2")

To load a package, use the following syntax:

    library(<package-name>)

The same applies to ggplot2:

    library("ggplot2")

To learn what a package is for and what it can do, R provides a help function that gives the details of an installed package:

    help(ggplot2)

ggplot2 – Default Plot in R

In this chapter, we will focus on creating a simple plot with the help of ggplot2. We will use

ggplot2 – Working with Legends

Axes and legends are collectively called guides. They allow us to read observations from the plot and map them back to their original values. The legend keys and tick labels are both determined by the scale breaks. Legends and axes are produced automatically from the scales and geoms used in the plot.

The following steps show how legends work in ggplot2.

Inclusion of package and dataset in workspace

Let us create the same plot as before and focus on its legend:

    > # Load ggplot
    > library(ggplot2)
    >
    > # Read in dataset
    > data(iris)
    >
    > # Plot
    > p <- ggplot(iris, aes(Sepal.Length, Petal.Length, colour = Species)) +
    +   geom_point()
    > p

By default, the legend is drawn on the right side of the plot. Here, it lists the species in the dataset.

Changing attributes for legends

We can remove the legend by setting the theme property "legend.position" to "none":

    > # Remove Legend
    > p + theme(legend.position = "none")

We can also hide the legend title by setting "legend.title" to element_blank():

    > # Hide the legend title
    > p + theme(legend.title = element_blank())

The same "legend.position" property moves the legend to another side of the plot, such as the top or the bottom.
    > # Change the legend position
    > p + theme(legend.position = "top")
    >
    > p + theme(legend.position = "bottom")

The first call places the legend above the plot and the second places it below.

Changing font style of legends

We can change the font style of the legend title and of the legend labels as shown below:

    > # Change the legend title and text font styles
    > # legend title
    > p + theme(legend.title = element_text(colour = "blue", size = 10,
    +                                       face = "bold"))
    > # legend labels
    > p + theme(legend.text = element_text(colour = "red", size = 8,
    +                                      face = "bold"))

Upcoming chapters will focus on various types of plots, on background properties such as color and themes, and on the importance of each from a data science point of view.
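The theme settings above change where the legend sits and how its text looks. The title text and key layout can also be adjusted. The following is a minimal sketch, assuming the same iris plot. The title "Iris species", the single-row layout and the name p_legend are illustrative choices.

```r
library(ggplot2)

# Same scatter plot as in this chapter
p <- ggplot(iris, aes(Sepal.Length, Petal.Length, colour = Species)) +
  geom_point()

p_legend <- p +
  # Rename the legend title through the colour aesthetic
  labs(colour = "Iris species") +
  # Put the legend under the plot
  theme(legend.position = "bottom") +
  # Lay the keys out in one row and enlarge the key symbols only
  guides(colour = guide_legend(nrow = 1, override.aes = list(size = 3)))
p_legend
```

override.aes affects only the symbols drawn in the legend keys. The points in the plot itself keep their original size.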

ggplot2 – Multiple Plots

In this chapter, we will create several types of plots that show how a numeric variable is distributed. The plots covered are:

    Density Plot
    Box Plot
    Dot Plot
    Violin Plot

We will use the "mpg" dataset, as in previous chapters. This dataset provides fuel economy data from 1999 and 2008 for 38 popular models of cars. It is shipped with the ggplot2 package. Start by loading the package and inspecting the data.

    > # Load Modules
    > library(ggplot2)
    >
    > # Dataset
    > head(mpg)
    # A tibble: 6 x 11
      manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
      <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
    1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa~
    2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa~
    3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa~
    4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa~
    5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa~
    6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa~

Density Plot

A density plot is a graphic representation of the distribution of a numeric variable in the dataset. It uses a kernel density estimate to show the probability density function of the variable.

The "ggplot2" package includes a function called geom_density() to create a density plot. We will execute the following command to create one, with a separate density for each number of cylinders:

    > p <- ggplot(mpg, aes(cty)) +
    +   geom_density(aes(fill = factor(cyl)), alpha = 0.8)
    > p

For better clarity, we can relabel the x axis and the legend and add a title, subtitle and caption:

    > p + labs(title = "Density plot",
    +          subtitle = "City Mileage Grouped by Number of cylinders",
    +          caption = "Source: mpg",
    +          x = "City Mileage",
    +          fill = "# Cylinders")

Box Plot

A box plot, also called a box and whisker plot, represents the five-number summary of the data.
The five-number summary consists of the minimum, first quartile, median, third quartile and maximum. The line drawn inside the box marks the median. We can create a box plot using the following commands:

    > p <- ggplot(mpg, aes(class, cty)) +
    +   geom_boxplot(varwidth = T, fill = "blue")
    > p + labs(title = "A Box plot Example",
    +          subtitle = "Mileage by Class",
    +          caption = "MPG Dataset",
    +          x = "Class",
    +          y = "Mileage")
    > p

Here, we are creating a box plot of cty for each value of class.

Dot Plot

Dot plots are similar to scatter plots, but the dots are binned along one axis and stacked, so that each dot represents one observation. In this section, we will add a dot plot on top of a box plot to get a clearer picture of the data.

The box plot can be created using the following command:

    > p <- ggplot(mpg, aes(manufacturer, cty)) +
    +   geom_boxplot() +
    +   theme(axis.text.x = element_text(angle = 65, vjust = 0.6))
    > p

The dot plot is then added as shown below:

    > p + geom_dotplot(binaxis = "y",
    +                  stackdir = "center",
    +                  dotsize = .5)

Violin Plot

A violin plot is created in a similar manner, with violin shapes in place of boxes. The commands are given below:

    > p <- ggplot(mpg, aes(class, cty))
    >
    > p + geom_violin()
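To connect the box plot to the five-number summary it represents, the summary can be computed directly and a narrow box plot drawn inside each violin. This is a minimal sketch. The names five_num and combined, the fill colour and the box width are illustrative choices. Note that geom_boxplot() by default draws whiskers only as far as 1.5 times the interquartile range and plots more extreme values as separate points, so the whisker ends can differ from the minimum and maximum computed here.

```r
library(ggplot2)

# Five-number summary (min, Q1, median, Q3, max) of city mileage per class
five_num <- sapply(
  split(mpg$cty, mpg$class),
  quantile,
  probs = c(0, 0.25, 0.5, 0.75, 1)
)
print(five_num)

# Narrow box plot drawn inside a violin for each class
combined <- ggplot(mpg, aes(class, cty)) +
  geom_violin(fill = "grey90") +
  geom_boxplot(width = 0.15)
combined
```

The matrix has one column per vehicle class and one row per quantile, which makes it easy to compare against the boxes in the plot.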

ggplot2 – Bubble Plots & Count Charts

A bubble plot, or bubble chart, is a scatter plot with a third numeric variable mapped to the size of the circles. In this chapter, we will also create bar count plots and histogram count plots, which display how many observations fall into each category or bin.

The following steps are used to create bubble plots and count charts with ggplot2.

Understanding Dataset

Load the package and the dataset needed to create the bubble plots and count charts.

    > # Load ggplot
    > library(ggplot2)
    >
    > # Read in dataset
    > data(mpg)
    > head(mpg)
    # A tibble: 6 x 11
      manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
      <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
    1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa~
    2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa~
    3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa~
    4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa~
    5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa~
    6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa~

The bar count plot can be created using the following command:

    > # A bar count plot
    > p <- ggplot(mpg, aes(x = factor(cyl))) +
    +   geom_bar(stat = "count")
    > p

Analysis with Histograms

The histogram count plot can be created using the following command:

    > # A histogram count plot
    > ggplot(data = mpg, aes(x = hwy)) +
    +   geom_histogram(col = "red",
    +                  fill = "green",
    +                  alpha = .2,
    +                  binwidth = 5)

Bubble Charts

Now let us create a basic bubble plot by mapping a third numeric variable to the size of the points in a scatter plot. The mpg dataset has no pop column, so engine displacement (displ) is used for the size:

    ggplot(mpg, aes(x = cty, y = hwy, size = displ)) +
      geom_point(alpha = 0.7)

Each point is one car. Its position shows city and highway mileage, and its size shows engine displacement, as summarized in the legend.
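A count chart in the narrower sense sizes each point by the number of observations that share its position. ggplot2 provides geom_count() for this. The sketch below is an addition to the chapter's examples. The name count_plot and the max_size value are illustrative choices.

```r
library(ggplot2)

# Many cars share the same (cty, hwy) pair, so plain points overlap;
# geom_count() sizes each point by how many observations fall on it
count_plot <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_count(alpha = 0.7) +
  # Make point area, rather than radius, proportional to the count
  scale_size_area(max_size = 8)
count_plot
```

Unlike the bubble chart, no third variable is supplied here. The size is computed from the data by counting duplicates at each position.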