Python Chart Styling

Charts created in Python can be styled further with methods from the charting library. In this lesson we look at annotations, legends and the chart presentation style. We continue to use the code from the last chapter and modify it to add these styles to the chart.

Adding Annotations

Many times we need to annotate a chart to highlight specific locations on it. In the example below we label two points on the curve by adding annotations at those points.

    import numpy as np
    from matplotlib import pyplot as plt

    x = np.arange(0, 10)
    y = x ** 2  # ** is exponentiation; ^ would be bitwise XOR

    # Labeling the axes and title
    plt.title("Graph Drawing")
    plt.xlabel("Time")
    plt.ylabel("Distance")
    plt.plot(x, y)

    # Annotate two points on the curve (text first, then the location)
    plt.annotate("Second Entry", xy=(2, 4))
    plt.annotate("Third Entry", xy=(4, 16))
    plt.show()

Running the code displays the chart with the two annotations.

Adding Legends

We sometimes need a chart with multiple lines plotted on it. A legend gives the meaning associated with each line. In the chart below we have 3 lines with appropriate legends.

    import numpy as np
    from matplotlib import pyplot as plt

    x = np.arange(0, 10)
    y = x ** 2
    z = x ** 3
    t = x ** 4

    # Labeling the axes and title
    plt.title("Graph Drawing")
    plt.xlabel("Time")
    plt.ylabel("Distance")
    plt.plot(x, y)

    # Annotate
    plt.annotate("Second Entry", xy=(2, 4))
    plt.annotate("Third Entry", xy=(4, 16))

    # Adding legends: names are matched to the lines in plotting order,
    # and loc=4 places the legend in the lower-right corner
    plt.plot(x, z)
    plt.plot(x, t)
    plt.legend(["Race1", "Race2", "Race3"], loc=4)
    plt.show()

Running the code displays the three lines with their legend.

Chart Presentation Style

We can modify the presentation style of the chart by using different methods from the style package. The style is selected before plotting so that it applies to everything drawn afterwards.

    import numpy as np
    from matplotlib import pyplot as plt

    # Style the chart
    plt.style.use("fast")

    x = np.arange(0, 10)
    y = x ** 2
    z = x ** 3
    t = x ** 4

    # Labeling the axes and title
    plt.title("Graph Drawing")
    plt.xlabel("Time")
    plt.ylabel("Distance")
    plt.plot(x, y)

    # Annotate
    plt.annotate("Second Entry", xy=(2, 4))
    plt.annotate("Third Entry", xy=(4, 16))

    # Adding legends
    plt.plot(x, z)
    plt.plot(x, t)
    plt.legend(["Race1", "Race2", "Race3"], loc=4)
    plt.show()

Running the code displays the same chart drawn with the selected style.
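The legend example above passes the names as a list that has to match the plotting order. A variation that may be easier to maintain is to attach a label to each line when it is plotted. The sketch below assumes the object-oriented matplotlib interface and a non-interactive "Agg" backend (an assumption made here so that it runs without a display); the arrow on the annotation and the text position are illustrative choices, not part of the original lesson.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10)

fig, ax = plt.subplots()
# Attach the legend text to each line as it is plotted
for power, name in [(2, "Race1"), (3, "Race2"), (4, "Race3")]:
    ax.plot(x, x ** power, label=name)

ax.set_title("Graph Drawing")
ax.set_xlabel("Time")
ax.set_ylabel("Distance")

# Annotate the point (2, 4) and draw an arrow from the text to it
ax.annotate("Second Entry", xy=(2, 4), xytext=(3, 2000),
            arrowprops={"arrowstyle": "->"})

# "lower right" is the named equivalent of loc=4
legend = ax.legend(loc="lower right")
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```

Because each label travels with its line, reordering or removing a plot call cannot silently mislabel the remaining lines.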

Python Word Tokenization

Word tokenization is the process of splitting a large sample of text into words. It is a requirement in natural language processing tasks where each word needs to be captured and analyzed further, for example classified or counted for a particular sentiment. The Natural Language Toolkit (NLTK) is a library used to achieve this. Install NLTK before proceeding with the Python program for word tokenization.

    conda install -c anaconda nltk

Next we use the word_tokenize method to split the paragraph into individual words.

    import nltk

    # word_tokenize needs the Punkt tokenizer models; download them once
    # (recent NLTK releases may ask for "punkt_tab" instead of "punkt")
    nltk.download("punkt")

    word_data = "It originated from the idea that there are readers who prefer learning new skills from the comforts of their drawing rooms"
    nltk_tokens = nltk.word_tokenize(word_data)
    print(nltk_tokens)

When we execute the above code, it produces the following result.

    ['It', 'originated', 'from', 'the', 'idea', 'that', 'there', 'are', 'readers', 'who', 'prefer', 'learning', 'new', 'skills', 'from', 'the', 'comforts', 'of', 'their', 'drawing', 'rooms']

Tokenizing Sentences

We can also tokenize the sentences in a paragraph like we tokenized the words. We use the method sent_tokenize to achieve this. Below is an example.

    import nltk

    sentence_data = "Sun rises in the east. Sun sets in the west."
    nltk_tokens = nltk.sent_tokenize(sentence_data)
    print(nltk_tokens)

When we execute the above code, it produces the following result.

    ['Sun rises in the east.', 'Sun sets in the west.']
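To get a feel for what a tokenizer does, the two steps above can be approximated with regular expressions from the standard library. This is only a rough sketch and not the algorithm NLTK uses; the function names simple_word_tokenize and simple_sent_tokenize are hypothetical, and the patterns ignore cases such as abbreviations that NLTK handles.

```python
import re


def simple_word_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+(?:'\w+)? matches a word, optionally with an apostrophe part (don't);
    # [^\w\s] matches any single punctuation character
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def simple_sent_tokenize(text):
    """Split text after '.', '!' or '?' when whitespace follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


tokens = simple_word_tokenize("Sun rises in the east.")
sentences = simple_sent_tokenize("Sun rises in the east. Sun sets in the west.")
print(tokens)
print(sentences)
```

A sentence such as "Dr. Smith arrived." would be split after "Dr." by this sketch, which is one reason to prefer a trained tokenizer for real text.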

Python Chart Properties

Python has excellent libraries for data visualization. A combination of pandas, NumPy and matplotlib can help in creating nearly all types of charts. In this chapter we get started by looking at a simple chart and its various properties.

Creating a Chart

We use the NumPy library to create the numbers to be plotted and the pyplot module in matplotlib to draw the actual chart.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.arange(0, 10)
    y = x ** 2  # ** is exponentiation; ^ would be bitwise XOR

    # Simple plot
    plt.plot(x, y)
    plt.show()

Running the code displays a simple line chart.

Labeling the Axes

We can apply labels to the axes as well as a title for the chart using appropriate methods from the library as shown below.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.arange(0, 10)
    y = x ** 2

    # Labeling the axes and title
    plt.title("Graph Drawing")
    plt.xlabel("Time")
    plt.ylabel("Distance")

    # Simple plot
    plt.plot(x, y)
    plt.show()

Running the code displays the chart with a title and axis labels.

Formatting Line Type and Colour

The style as well as the colour of the line in the chart can be specified using appropriate methods from the library as shown below.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.arange(0, 10)
    y = x ** 2

    # Labeling the axes and title
    plt.title("Graph Drawing")
    plt.xlabel("Time")
    plt.ylabel("Distance")

    # Formatting the line colour: "r" draws a red line
    plt.plot(x, y, "r")

    # Formatting the line type: ">" draws right-pointing triangle markers
    plt.plot(x, y, ">")
    plt.show()

Running the code displays a red line with a triangle marker at each data point.

Saving the Chart File

The chart can be saved in different image file formats using appropriate methods from the library as shown below.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.arange(0, 10)
    y = x ** 2

    # Labeling the axes and title
    plt.title("Graph Drawing")
    plt.xlabel("Time")
    plt.ylabel("Distance")

    # Formatting the line colour
    plt.plot(x, y, "r")

    # Formatting the line type
    plt.plot(x, y, ">")

    # Save in PDF format
    plt.savefig("timevsdist.pdf", format="pdf")

The above code creates the PDF file in the current working directory of the Python environment.
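The colour and line type shown above can also be combined into a single format string, and savefig accepts a file-like object as well as a file name. The sketch below assumes the non-interactive "Agg" backend and writes a PNG into an in-memory buffer, which is a choice made here to avoid creating files; neither is part of the original lesson.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10)
y = x ** 2

fig, ax = plt.subplots()
# One format string: ">" triangle markers, "--" dashed line, "r" red
(line,) = ax.plot(x, y, ">--r")
ax.set_title("Graph Drawing")
ax.set_xlabel("Time")
ax.set_ylabel("Distance")

# Save as PNG into memory instead of a file on disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
png_bytes = buffer.getvalue()
print(line.get_marker(), line.get_linestyle(), len(png_bytes))
```

Passing a file name such as "timevsdist.png" instead of the buffer writes the same image to disk.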

Python Heat Maps

A heatmap represents each value to be plotted as a shade of colour. In many colour schemes the darker shades represent higher values than the lighter shades. For a very different value a completely different colour can also be used.

The example below is a two-dimensional plot of values which are mapped to the index and columns of a DataFrame.

    from pandas import DataFrame
    import matplotlib.pyplot as plt

    # Each row is a list; sets ({...}) would lose the order and any duplicates
    data = [[2, 3, 4, 1], [6, 3, 5, 2], [6, 3, 5, 4], [3, 7, 5, 4], [2, 8, 1, 5]]
    Index = ["I1", "I2", "I3", "I4", "I5"]
    Cols = ["C1", "C2", "C3", "C4"]
    df = DataFrame(data, index=Index, columns=Cols)

    plt.pcolor(df)
    plt.show()

Running the code displays the heatmap.
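The plot above shows the cells but not the row and column names or a colour scale. One possible way to add them is sketched below with plain matplotlib and NumPy; the "Agg" backend and the tick placement at the cell centres are assumptions made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

values = np.array([[2, 3, 4, 1],
                   [6, 3, 5, 2],
                   [6, 3, 5, 4],
                   [3, 7, 5, 4],
                   [2, 8, 1, 5]])
rows = ["I1", "I2", "I3", "I4", "I5"]
cols = ["C1", "C2", "C3", "C4"]

fig, ax = plt.subplots()
mesh = ax.pcolor(values)

# Cell (i, j) covers x in [j, j + 1] and y in [i, i + 1],
# so ticks at +0.5 sit in the middle of each cell
ax.set_xticks(np.arange(len(cols)) + 0.5)
ax.set_xticklabels(cols)
ax.set_yticks(np.arange(len(rows)) + 0.5)
ax.set_yticklabels(rows)

# The colour bar shows which shade corresponds to which value
fig.colorbar(mesh, ax=ax)
x_labels = [tick.get_text() for tick in ax.get_xticklabels()]
print(x_labels)
```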

Python Box Plots

A box plot shows how the values in a data set are distributed. It is built on the three quartiles, which divide the data set into four equal parts. The graph represents the minimum, maximum, median, first quartile and third quartile of the data set. It is also useful for comparing the distribution of data across data sets by drawing a box plot for each of them.

Drawing a Box Plot

A box plot can be drawn by calling Series.plot.box() and DataFrame.plot.box(), or DataFrame.boxplot(), to visualize the distribution of values within each column. For instance, here is a box plot representing five trials of 10 observations of a uniform random variable on [0,1).

    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.rand(10, 5), columns=["A", "B", "C", "D", "E"])
    df.plot.box(grid=True)
    plt.show()

Running the code displays one box for each of the five columns.
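The five numbers that a box plot draws can also be computed directly. The sketch below uses np.percentile on a small made-up sample (hypothetical data, not from the lesson) and flags outliers with the 1.5 x IQR rule that matplotlib applies by default for the whiskers; the function name five_number_summary is an illustrative choice.

```python
import numpy as np


def five_number_summary(values):
    """Return min, first quartile, median, third quartile and max."""
    minimum, q1, median, q3, maximum = np.percentile(values, [0, 25, 50, 75, 100])
    return {
        "min": float(minimum),
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "max": float(maximum),
    }


# Hypothetical sample with one unusually large value
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
summary = five_number_summary(sample)

# Points further than 1.5 * IQR from the box are treated as outliers
iqr = summary["q3"] - summary["q1"]
lower_fence = summary["q1"] - 1.5 * iqr
upper_fence = summary["q3"] + 1.5 * iqr
outliers = [v for v in sample if v < lower_fence or v > upper_fence]

print(summary)
print(outliers)
```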

Python Measuring Central Tendency

Mathematically, central tendency means measuring the center, or typical location, of the values of a data set. It gives an idea of the average value of the data in the data set; how widely the values are spread is described separately by measures of dispersion. Knowing the typical value in turn helps in evaluating how well a new input fits into the existing data set. There are three main measures of central tendency, which can be calculated using the methods in the pandas Python library.

Mean: the average value of the data, which is the sum of the values divided by the number of values.

Median: the middle value in the distribution when the values are arranged in ascending or descending order.

Mode: the most commonly occurring value in a distribution.

Calculating Mean and Median

The pandas functions can be used directly to calculate these values.

    import pandas as pd

    # Create a dictionary of Series
    d = {"Name": pd.Series(["Tom", "James", "Ricky", "Vin", "Steve", "Smith", "Jack",
                            "Lee", "Chanchal", "Gasper", "Naviya", "Andres"]),
         "Age": pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         "Rating": pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    # numeric_only=True skips the text column "Name"
    print("Mean Values in the Distribution")
    print(df.mean(numeric_only=True))
    print("*******************************")
    print("Median Values in the Distribution")
    print(df.median(numeric_only=True))

Its output is as follows:

    Mean Values in the Distribution
    Age       31.833333
    Rating     3.743333
    dtype: float64
    *******************************
    Median Values in the Distribution
    Age       29.50
    Rating     3.79
    dtype: float64

Calculating Mode

A mode may or may not be available in a distribution, depending on whether the data is continuous and whether some value occurs with maximum frequency. We take a simple distribution below to find the mode. Here we have a value which has maximum frequency in the distribution.

    import pandas as pd

    # Create a dictionary of Series
    d = {"Name": pd.Series(["Tom", "James", "Ricky", "Vin", "Steve", "Smith", "Jack",
                            "Lee", "Chanchal", "Gasper", "Naviya", "Andres"]),
         "Age": pd.Series([25, 26, 25, 23, 30, 25, 23, 34, 40, 30, 25, 46])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    print(df.mode())

Its output is as follows (the column order may differ with the pandas version). Every name occurs once, so all twelve names are listed as modes, while Age has the single mode 25.

         Age      Name
    0   25.0    Andres
    1    NaN  Chanchal
    2    NaN    Gasper
    3    NaN      Jack
    4    NaN     James
    5    NaN       Lee
    6    NaN    Naviya
    7    NaN     Ricky
    8    NaN     Smith
    9    NaN     Steve
    10   NaN       Tom
    11   NaN       Vin
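As a cross-check that does not need pandas, the same three measures can be computed with the statistics module from the standard library. The sketch below reuses the Age values from the mode example; multimode (available from Python 3.8) returns every value that ties for the highest frequency, which illustrates why a data set without repeats has no single mode.

```python
import statistics

# Age values from the mode example above
ages = [25, 26, 25, 23, 30, 25, 23, 34, 40, 30, 25, 46]

mean_age = statistics.mean(ages)      # sum of the values / number of values
median_age = statistics.median(ages)  # average of the two middle values here
modes = statistics.multimode(ages)    # all values with the highest frequency

# With no repeated value, every value ties for the highest frequency
no_clear_mode = statistics.multimode([1, 2, 3])

print(mean_age, median_age, modes, no_clear_mode)
```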

Python Chi-Square Test

The Chi-Square test is a statistical method to determine whether two categorical variables have a significant association between them. Both variables should be from the same population and they should be categorical, like Yes/No, Male/Female, Red/Green etc.

For example, we can build a data set with observations on people's ice-cream buying pattern and try to relate the gender of a person to the flavour of ice-cream they prefer. If an association is found, we can plan an appropriate stock of flavours by knowing the gender mix of the people visiting.

The test statistic is compared against the chi-square distribution. The example below uses the chi2 distribution from the scipy library to plot the chi-square probability density for several degrees of freedom.

    from scipy import stats
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 10, 100)
    fig, ax = plt.subplots(1, 1)

    linestyles = [":", "--", "-.", "-"]
    deg_of_freedom = [1, 4, 7, 6]

    # One curve per degrees-of-freedom value, labeled for the legend
    for dof, ls in zip(deg_of_freedom, linestyles):
        ax.plot(x, stats.chi2.pdf(x, dof), linestyle=ls, label=f"df = {dof}")

    plt.xlim(0, 10)
    plt.ylim(0, 0.4)
    plt.xlabel("Value")
    plt.ylabel("Probability density")
    plt.title("Chi-Square Distribution")
    plt.legend()
    plt.show()

Running the code displays the four density curves.
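The plot above shows the distribution but does not run a test. As a minimal sketch of the test itself, the statistic for a table of observed counts can be computed in plain Python by comparing each count with the count expected under independence. The counts below are hypothetical (two genders by two flavours), the function name chi_square_statistic is illustrative, and 3.841 is the standard 5% critical value of the chi-square distribution with one degree of freedom. No continuity correction is applied here, so library routines that apply one by default for 2 x 2 tables can report a smaller statistic.

```python
def chi_square_statistic(observed):
    """Return the chi-square statistic and degrees of freedom of a table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are genders, columns are flavours
observed = [[20, 30],
            [30, 20]]
statistic, dof = chi_square_statistic(observed)

CRITICAL_5_PERCENT_DF1 = 3.841  # chi-square critical value, alpha = 0.05, df = 1
association_found = statistic > CRITICAL_5_PERCENT_DF1
print(statistic, dof, association_found)
```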

Python Normal Distribution

The normal distribution is a probability distribution in which most values lie around the mean, making the distribution symmetric about it.

We use functions from the NumPy library to draw random values from a normal distribution and to calculate its density. A histogram of the values is created, over which we plot the probability distribution curve.

    import matplotlib.pyplot as plt
    import numpy as np

    mu, sigma = 0.5, 0.1
    s = np.random.normal(mu, sigma, 1000)

    # Create the bins and histogram; density=True scales the bars to a density
    count, bins, ignored = plt.hist(s, 20, density=True)

    # Plot the distribution curve
    plt.plot(bins,
             1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(bins - mu) ** 2 / (2 * sigma ** 2)),
             linewidth=3, color="y")
    plt.show()

Running the code displays the histogram with the density curve drawn over it.
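The curve plotted above is the normal density formula written out by hand. As a small check, the sketch below evaluates the same formula with the math module and compares it with NormalDist from the standard library; the function name normal_pdf is an illustrative choice, and mu and sigma are the values used in the example.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


mu, sigma = 0.5, 0.1
dist = NormalDist(mu, sigma)

peak = normal_pdf(mu, mu, sigma)  # the density is highest at the mean
reference = dist.pdf(mu)          # same value from the standard library

# Share of values expected within one standard deviation of the mean
within_one_sigma = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)
print(peak, reference, within_one_sigma)
```

Roughly 68% of the values fall within one standard deviation of the mean, which is why the histogram is concentrated around 0.5.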

Python Correlation

Correlation refers to a statistical relationship involving dependence between two data sets. Simple examples of dependent phenomena include the correlation between the physical appearance of parents and their offspring, and the correlation between the price of a product and its supplied quantity.

We take the example of the iris data set available in the seaborn Python library. In it we try to establish the correlation between the length and the width of the sepals and petals of three species of iris flower. Based on the correlation found, a strong model could be created which easily distinguishes one species from another.

    import matplotlib.pyplot as plt
    import seaborn as sns

    df = sns.load_dataset("iris")

    # Scatter plots without regression lines
    sns.pairplot(df, kind="scatter")
    plt.show()

Running the code displays a grid of scatter plots, one for each pair of measurements.
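The scatter plots show correlation visually; it can also be expressed as a single number. The sketch below computes the Pearson correlation coefficient in plain Python. The measurements are made-up values in the style of petal lengths and widths (hypothetical, not taken from the iris data set), and the function name pearson_r is an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical measurements in centimetres
petal_length = [1.4, 1.3, 4.7, 4.5, 6.0, 5.1]
petal_width = [0.2, 0.2, 1.4, 1.5, 2.5, 1.9]

r = pearson_r(petal_length, petal_width)
print(round(r, 3))
```

A value close to 1 means the two measurements rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is no linear relationship.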

Python Reading HTML Pages

HTML pages can be read with a library known as Beautiful Soup. Using this library, we can search for the values of HTML tags and get specific data like the title of the page and the list of headers in the page.

Install Beautiful Soup

Use the Anaconda package manager to install the required package and its dependent packages.

    conda install beautifulsoup4

Reading the HTML File

In the example below we make a request to a URL to be loaded into the Python environment. Then we pass the html.parser argument to BeautifulSoup to parse the entire HTML file. Next, we print the first few characters of the HTML page.

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    # Fetch the html file
    response = urlopen("http://tutorialspoint.com/python/python_overview.htm")
    html_doc = response.read()

    # Parse the html file
    soup = BeautifulSoup(html_doc, "html.parser")

    # Format the parsed html file
    strhtm = soup.prettify()

    # Print the first few characters
    print(strhtm[:225])

When we execute the above code, it produces the following result.

    <!DOCTYPE html>
    <!--[if IE 8]><html class="ie ie8"> <![endif]-->
    <!--[if IE 9]><html class="ie ie9"> <![endif]-->
    <!--[if gt IE 9]><!-->
    <html>
    <!--<![endif]-->
    <head>
    <!-- Basic -->
    <meta charset="utf-8"/>
    <title>

Extracting Tag Value

We can extract the tag value from the first instance of a tag using the following code.

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    response = urlopen("http://tutorialspoint.com/python/python_overview.htm")
    html_doc = response.read()
    soup = BeautifulSoup(html_doc, "html.parser")

    print(soup.title)         # the whole <title> tag
    print(soup.title.string)  # only the text inside the tag
    print(soup.a.string)      # text of the first <a> tag
    print(soup.b.string)      # text of the first <b> tag

When we execute the above code, it produces the following result.

    <title>Python Overview</title>
    Python Overview
    None
    Python is Interpreted

Extracting All Tags

We can extract the tag value from all the instances of a tag using the following code.

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    response = urlopen("http://tutorialspoint.com/python/python_overview.htm")
    html_doc = response.read()
    soup = BeautifulSoup(html_doc, "html.parser")

    for x in soup.find_all("b"):
        print(x.string)

When we execute the above code, it produces the following result.

    Python is Interpreted
    Python is Interactive
    Python is Object-Oriented
    Python is a Beginner's Language
    Easy-to-learn
    Easy-to-read
    Easy-to-maintain
    A broad standard library
    Interactive Mode
    Portable
    Extendable
    Databases
    GUI Programming
    Scalable
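To try tag extraction without a network connection or an extra package, the same idea can be sketched with html.parser from the standard library. This is a simplified stand-in for what Beautiful Soup does, not its implementation; the class name TagTextCollector and the sample HTML string are hypothetical.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside each of the wanted tags."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {tag: [] for tag in self.wanted}
        self._open = []  # stack of wanted tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self._open.append(tag)
            self.found[tag].append("")  # start a new text entry for this tag

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        # Text is added to the innermost wanted tag that is still open
        if self._open:
            self.found[self._open[-1]][-1] += data


# Hypothetical page content, parsed from a string instead of a URL
sample_html = (
    "<html><head><title>Python Overview</title></head><body>"
    "<p><b>Python is Interpreted</b> and more text</p>"
    "<p><b>Python is Interactive</b></p>"
    "</body></html>"
)

collector = TagTextCollector(["title", "b"])
collector.feed(sample_html)
print(collector.found["title"])
print(collector.found["b"])
```

Beautiful Soup builds a full tree that can be searched repeatedly, whereas this sketch only records text while the document streams through the parser.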