Seaborn – Plotting Wide Form Data

It is always preferable to use "long-form" or "tidy" datasets. But when a "wide-form" dataset is the only option, the same functions can also be applied to wide-form data in a variety of formats, including Pandas DataFrames and two-dimensional NumPy arrays. Pass these objects directly to the data parameter. The x and y variables, which long-form data requires as strings naming columns, are not needed here: each numeric column is drawn as its own box.

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("iris")
    sb.boxplot(data = df, orient = "h")
    plt.show()

Additionally, these functions accept vectors of Pandas or NumPy objects rather than the names of variables in a DataFrame.

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("iris")
    # Pass the columns themselves (vectors) instead of column names
    sb.boxplot(x = df["species"], y = df["petal_length"])
    plt.show()

A major advantage of Seaborn for many Python developers is that it accepts a Pandas DataFrame object directly as a parameter.
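The relationship between wide-form and long-form data can be sketched with Pandas alone. The small table below is made up for illustration (it is not the real "iris" data), and the column names "measurement" and "value" are assumptions chosen for this example.

```python
import pandas as pd

# Hypothetical wide-form table: one column per measured variable.
wide_df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "petal_length": [1.4, 1.4, 6.0],
})

# Reshape to long form: one row per observation, with the former
# column name stored in "measurement" and the number in "value".
long_df = wide_df.melt(var_name="measurement", value_name="value")
print(long_df)
```

With the long-form table, the variables are named as strings, for instance sb.boxplot(x = "value", y = "measurement", data = long_df), which should draw the same boxes as passing wide_df to the data parameter with orient = "h".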

Seaborn – Facet Grid

A useful approach to exploring medium-dimensional data is to draw multiple instances of the same plot on different subsets of your dataset. This technique is commonly called "lattice" or "trellis" plotting, and it is related to the idea of "small multiples". To use these features, your data has to be in a Pandas DataFrame.

Plotting Small Multiples of Data Subsets

In the previous chapter, we saw a FacetGrid example, where the FacetGrid class helps in visualizing the distribution of one variable, as well as the relationship between multiple variables, separately within subsets of your dataset using multiple panels.

A FacetGrid can be drawn with up to three dimensions: row, col, and hue. The first two have an obvious correspondence with the resulting array of axes; think of the hue variable as a third dimension along a depth axis, where different levels are plotted with different colors.

A FacetGrid object takes a DataFrame as input, along with the names of the variables that will form the row, column, or hue dimensions of the grid. The variables should be categorical, and the data at each level of the variable will be used for a facet along that axis.

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("tips")
    g = sb.FacetGrid(df, col = "time")
    plt.show()

In the above example, we have only initialized the FacetGrid object, which sets up the empty axes but does not draw anything on them.

The main approach for visualizing data on this grid is the FacetGrid.map() method. Let us look at the distribution of tips in each of these subsets, using a histogram.

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("tips")
    g = sb.FacetGrid(df, col = "time")
    g.map(plt.hist, "tip")
    plt.show()

There is more than one plot because of the col parameter: one panel is drawn for each level of "time". We discussed the col parameter in the previous chapters.
To make a relational plot, pass multiple variable names.

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("tips")
    g = sb.FacetGrid(df, col = "sex", hue = "smoker")
    g.map(plt.scatter, "total_bill", "tip")
    plt.show()

Seaborn – Useful Resources

The following resources contain additional information on Seaborn. Please use them to get more in-depth knowledge on this topic.

Useful Video Courses

- Data Visualization using MatPlotLib & Seaborn (12 lectures, 4 hours), DATAhill Solutions Srinivas Reddy
- Python Seaborn Course (12 lectures, 2.5 hours), DATAhill Solutions Srinivas Reddy
- Basics Data Science with Numpy, Pandas and Matplotlib (11 lectures, 2.5 hours), Akbar Khan
- Artificial Intelligence Projects: Project-Based Learning (139 lectures, 18.5 hours), Learnkart Technology Pvt Ltd
- Data Visualization in Python Using Seaborn Library (20 lectures, 2 hours), ADITYA
- Data Science Prerequisites – Numpy – Pandas- Seaborn (45 lectures, 4.5 hours), ADITYA

Seaborn – Multi Panel Categorical Plots

Categorical data can be visualized in two ways: you can use either an axes-level function such as pointplot(), or the higher-level function factorplot(). Note that factorplot() was renamed to catplot() in Seaborn 0.9 and is not available in recent releases; there, call catplot() and pass kind = "point" to get the same default plot.

Factorplot

Factorplot draws a categorical plot on a FacetGrid. Using the kind parameter, we can choose the plot type, such as box, violin, bar, or strip. Factorplot uses pointplot by default.

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("exercise")
    sb.factorplot(x = "time", y = "pulse", hue = "kind", data = df)
    plt.show()

We can use a different plot to visualize the same data by setting the kind parameter.

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("exercise")
    sb.factorplot(x = "time", y = "pulse", hue = "kind", kind = "violin", data = df)
    plt.show()

In factorplot, the data is plotted on a facet grid.

What is Facet Grid?

A facet grid forms a matrix of panels, defined by row and column, by dividing the data on the levels of the chosen variables. Because of the panels, a single plot looks like multiple plots. It is very helpful for analyzing all combinations of two discrete variables.

Let us illustrate the above definition with an example.

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("exercise")
    sb.factorplot(x = "time", y = "pulse", hue = "kind", kind = "violin", col = "diet", data = df)
    plt.show()

The advantage of using facets is that we can bring another variable into the plot. The above plot is divided into two plots based on a third variable called "diet", using the col parameter.
We can make many column facets and wrap them across the rows of the grid:

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("titanic")
    sb.factorplot(x = "alive", col = "deck", col_wrap = 3, data = df[df.deck.notnull()], kind = "count")
    plt.show()
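For readers on Seaborn 0.9 or later, where factorplot() has been renamed, the same multi-panel idea can be sketched with catplot(). The data below is a small hand-made table loosely modelled on the "exercise" dataset (the values are illustrative, not the real data), and the Agg backend is an assumption made so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import pandas as pd
import seaborn as sb

# Hypothetical data: two pulse readings per time point and diet.
df = pd.DataFrame({
    "time": ["1 min", "15 min", "30 min"] * 4,
    "pulse": [85, 88, 90, 90, 92, 93, 83, 83, 84, 97, 98, 100],
    "diet": ["low fat"] * 6 + ["no fat"] * 6,
})

# kind="point" reproduces the old factorplot default; col="diet"
# splits the figure into one panel per diet.
g = sb.catplot(x="time", y="pulse", col="diet", kind="point", data=df)
```

catplot() returns a FacetGrid, so the resulting object g can be adjusted further with the FacetGrid methods, for example g.set_axis_labels().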

Seaborn – Discussion

Seaborn is an open source, BSD-licensed Python library providing a high-level API for visualizing data using the Python programming language.

Seaborn – Quick Guide

Seaborn – Introduction

In the world of analytics, the best way to get insights is by visualizing the data. Data can be visualized by representing it as plots, which are easy to understand, explore and grasp. Such plots help draw attention to key elements.

To analyse a set of data using Python, we make use of Matplotlib, a widely used 2D plotting library. Likewise, Seaborn is a visualization library in Python. It is built on top of Matplotlib.

Seaborn Vs Matplotlib

It has been said that if Matplotlib "tries to make easy things easy and hard things possible", Seaborn tries to make a well-defined set of hard things easy too.

Seaborn helps resolve two major problems faced with Matplotlib:

- Default Matplotlib parameters
- Working with data frames

As Seaborn complements and extends Matplotlib, the learning curve is quite gradual. If you know Matplotlib, you are already halfway through Seaborn.

Important Features of Seaborn

Seaborn is built on top of Python's core visualization library, Matplotlib. It is meant to serve as a complement, not a replacement. However, Seaborn comes with some very important features. Let us see a few of them here:

- Built-in themes for styling Matplotlib graphics
- Visualizing univariate and bivariate data
- Fitting and visualizing linear regression models
- Plotting statistical time series data
- Working well with NumPy and Pandas data structures

In most cases, you will still use Matplotlib for simple plotting. Knowledge of Matplotlib is recommended for tweaking Seaborn's default plots.

Seaborn – Environment Setup

In this chapter, we will discuss the environment setup for Seaborn. Let us begin with the installation and understand how to get started as we move ahead.
Installing Seaborn and getting started

In this section, we will understand the steps involved in the installation of Seaborn.

Using Pip Installer

To install the latest release of Seaborn, you can use pip:

    pip install seaborn

For Windows, Linux & Mac using Anaconda

Anaconda (from https://www.anaconda.com/) is a free Python distribution for the SciPy stack. It is available for Windows, Linux and Mac.

It is also possible to install the released version using conda:

    conda install seaborn

The development version of Seaborn is available directly from GitHub: https://github.com/mwaskom/seaborn

Dependencies

Consider the following dependencies of Seaborn:

- Python 2.7 or 3.4+
- numpy
- scipy
- pandas
- matplotlib

Seaborn – Importing Datasets and Libraries

In this chapter, we will discuss how to import datasets and libraries. Let us begin by understanding how to import libraries.

Importing Libraries

Let us start by importing Pandas, which is a great library for managing relational (table-format) datasets. Seaborn comes in handy when dealing with DataFrames, the most widely used data structure for data analysis.

The following command will help you import Pandas:

    # Pandas for managing datasets
    import pandas as pd

Now, let us import the Matplotlib library, which helps us customize our plots.

    # Matplotlib for additional customization
    from matplotlib import pyplot as plt

We will import the Seaborn library with the following command:

    # Seaborn for plotting and styling
    import seaborn as sb

Importing Datasets

We have imported the required libraries. In this section, we will understand how to import the required datasets.

Seaborn gives access to a few important example datasets. They are downloaded from Seaborn's online repository the first time you load them, so an internet connection is required. You can use any of these datasets for your learning. The following function loads the required dataset:

    load_dataset()

Importing Data as Pandas DataFrame

In this section, we will import a dataset.
This dataset loads as a Pandas DataFrame by default. Any function that works on a Pandas DataFrame works on this DataFrame.

The following lines of code will help you import the dataset:

    # Seaborn for plotting and styling
    import seaborn as sb
    df = sb.load_dataset("tips")
    print(df.head())

The above lines of code will generate the following output:

       total_bill   tip     sex smoker  day    time  size
    0       16.99  1.01  Female     No  Sun  Dinner     2
    1       10.34  1.66    Male     No  Sun  Dinner     3
    2       21.01  3.50    Male     No  Sun  Dinner     3
    3       23.68  3.31    Male     No  Sun  Dinner     2
    4       24.59  3.61  Female     No  Sun  Dinner     4

To view all the available datasets in the Seaborn library, you can use the get_dataset_names() function as shown below:

    import seaborn as sb
    print(sb.get_dataset_names())

The above lines of code will return the list of available datasets, similar to the following output (the exact list depends on the Seaborn version):

    ['anscombe', 'attention', 'brain_networks', 'car_crashes', 'dots', 'exercise', 'flights', 'fmri', 'gammas', 'iris', 'planets', 'tips', 'titanic']

DataFrames store data in the form of rectangular grids, in which the data can be viewed easily. Each row of the rectangular grid contains the values of one instance, and each column of the grid is a vector that holds data for a specific variable. This means that the rows of a DataFrame do not need to contain values of the same data type; they can be numeric, character, logical, etc. DataFrames for Python come with the Pandas library, and they are defined as two-dimensional labeled data structures with potentially different types of columns.

For more details on DataFrames, visit our tutorial on pandas.

Seaborn – Figure Aesthetic

Visualizing data is one step, and making the visualized data more pleasing is another. Visualization plays a vital role in communicating quantitative insights to an audience and catching their attention.

Aesthetics means a set of principles concerned with the nature and appreciation of beauty, especially in art.
Visualization is the art of representing data in the most effective and easiest possible way.

The Matplotlib library is highly customizable, but to make use of that you need to know which settings to tweak to achieve an attractive plot that looks the way you intend. Unlike Matplotlib, Seaborn comes packed with customized themes and a high-level interface for customizing and controlling the look of Matplotlib figures.
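As a brief illustration of the built-in themes, the sketch below switches to the "whitegrid" style with set_style() and then reads the active settings back with axes_style(). The plotted numbers are arbitrary, and the Agg backend is an assumption made so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import seaborn as sb
from matplotlib import pyplot as plt

# Switch every subsequent figure to the built-in "whitegrid" theme.
sb.set_style("whitegrid")

# axes_style() with no argument returns the style settings now in effect.
style = sb.axes_style()
print(style["axes.grid"])

# Any Matplotlib plot drawn from here on picks up the theme.
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
```

Other built-in style names accepted by set_style() are "darkgrid", "dark", "white" and "ticks".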

Seaborn – Function Reference

Seaborn is a popular visualization library in Python. Unlike other visualization libraries in Python, Seaborn includes several predefined datasets.

Statistical analysis is the process of determining how the variables in a dataset relate to one another. The statistical relationship between the data points is visualized using relational graphs. Because visualization allows humans to detect trends and patterns in data, it is an important part of business.

The following are the various groups of functions available in the Seaborn library:

- Relational plots
- Distribution plots
- Categorical plots
- Regression plots
- Matrix plots
- Multi-plot grids
- Themeing
- Color palettes
- Palette widgets
- Utility functions

Seaborn – Statistical Estimation

In most situations, we deal with estimations of the whole distribution of the data. But when it comes to central tendency, we need a specific way to summarize the distribution. The mean and the median are the most often used estimates of the central tendency of a distribution.

In all the plots that we learnt about in the above section, we visualized the whole distribution. Now, let us discuss the plots with which we can estimate the central tendency of the distribution.

Bar Plot

The barplot() function shows the relation between a categorical variable and a continuous variable. The data is represented as rectangular bars, where the length of each bar represents an estimate of central tendency (the mean, by default) of the continuous variable in that category.

Let us use the "titanic" dataset to learn bar plots.

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("titanic")
    sb.barplot(x = "sex", y = "survived", hue = "class", data = df)
    plt.show()

In the above example, we can see the average survival rate of males and females in each class. From the plot we can understand that a higher proportion of females survived than males. For both males and females, the survival rate is highest in first class.

A special case of the bar plot is to show the number of observations in each category rather than computing a statistic for a second variable. For this, we use countplot().

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("titanic")
    sb.countplot(x = "class", data = df, palette = "Blues")
    plt.show()

The plot shows that the number of passengers in third class is higher than in first or second class.

Point Plots

Point plots serve the same purpose as bar plots, but in a different style.
Rather than a full bar, the value of the estimate is represented by a point at a certain height on the other axis.

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("titanic")
    sb.pointplot(x = "sex", y = "survived", hue = "class", data = df)
    plt.show()
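To see which numbers these plots summarize, the estimates can be computed by hand with Pandas. The sketch below assumes barplot() and pointplot() are left at their default estimator, the mean, and uses a tiny made-up table in place of the real "titanic" data.

```python
import pandas as pd

# Hypothetical stand-in for a few "titanic" rows (survived: 1 = yes, 0 = no).
df = pd.DataFrame({
    "sex": ["male", "male", "male", "female", "female", "female"],
    "class": ["First", "Third", "Third", "First", "First", "Third"],
    "survived": [1, 0, 0, 1, 1, 0],
})

# Height of each bar (or point): the mean of "survived" per category,
# which for a 0/1 column is the survival rate.
bar_heights = df.groupby("sex")["survived"].mean()
print(bar_heights)

# countplot() shows the number of rows per category instead.
counts = df["class"].value_counts()
print(counts)
```

Adding hue = "class" to the plot corresponds to grouping by both columns, as in df.groupby(["sex", "class"])["survived"].mean().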

Seaborn – Distribution of Observations

The categorical scatter plots which we dealt with in the previous chapter are limited in the information they can provide about the distribution of values within each category. Now, going further, let us see what can help us compare distributions within categories.

Box Plots

A box plot is a convenient way to visualize the distribution of data through its quartiles.

Box plots usually have vertical lines extending from the boxes, which are termed whiskers. These whiskers indicate variability outside the upper and lower quartiles; hence box plots are also termed box-and-whisker plots or box-and-whisker diagrams. Any outliers in the data are plotted as individual points.

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("iris")
    sb.boxplot(x = "species", y = "petal_length", data = df)
    plt.show()

The dots on the plot indicate the outliers.

Violin Plots

Violin plots are a combination of the box plot with kernel density estimates. This makes it easier to analyze and understand the distribution of the data.

Let us use the "tips" dataset to learn more about violin plots. This dataset contains information related to the tips given by the customers in a restaurant.

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("tips")
    sb.violinplot(x = "day", y = "total_bill", data = df)
    plt.show()

The quartile and whisker values from the box plot are shown inside the violin. As the violin plot uses KDE, the wider portion of the violin indicates higher density and the narrow region represents relatively lower density. The interquartile range of the box plot and the higher-density portion of the KDE fall in the same region of each category of the violin plot.

The above plot shows the distribution of total_bill on four days of the week.
But if, in addition to that, we want to see how the distribution behaves with respect to sex, let us explore it in the example below.

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("tips")
    sb.violinplot(x = "day", y = "total_bill", hue = "sex", data = df)
    plt.show()

Now we can compare the spending behavior of males and females. From the plot, the bills paid by men appear somewhat higher than those paid by women.

And, if the hue variable has only two classes, we can tidy up the plot by splitting each violin into two halves, instead of drawing two violins for a given day. Each half of the violin then refers to one class of the hue variable. This is done with the split parameter.

Example

    import pandas as pd
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset("tips")
    sb.violinplot(x = "day", y = "total_bill", hue = "sex", split = True, data = df)
    plt.show()
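To make the quartiles, whiskers and outliers of the box plot described earlier in this chapter concrete, they can be computed by hand with NumPy. The sketch assumes the common rule in which whiskers reach to the most extreme data points within 1.5 times the interquartile range of the box (the default in Matplotlib and Seaborn); the numbers are made up for illustration.

```python
import numpy as np

# Hypothetical sample with one unusually large value.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# The box spans the first to the third quartile; the line inside is the median.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme values that are still inside the fences.
inside = values[(values >= low_fence) & (values <= high_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the fences is drawn as an individual point.
outliers = values[(values < low_fence) | (values > high_fence)]
print(q1, median, q3, whisker_low, whisker_high, outliers)
```

Here the value 30 lies above the upper fence, so it would be drawn as a single dot above the whisker, which ends at 9.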

Seaborn – Home

Seaborn is an open source, BSD-licensed Python library providing a high-level API for visualizing data using the Python programming language.

Audience

This tutorial takes you through the basics and various functions of Seaborn. It is specifically useful for people working on data analysis. After completing this tutorial, you will find yourself at a moderate level of expertise, from where you can take yourself to higher levels of expertise.

Prerequisites

You should have a basic understanding of computer programming terminology. A basic understanding of Python or any other programming language is a plus.

The Seaborn library is built on top of Matplotlib. Having a basic idea of Matplotlib will help you understand this tutorial better.