Python – Bubble Charts

Bubble charts display data as a cluster of circles. To create one, the data needs x and y coordinates, a size for each bubble and, optionally, a colour for each bubble. If no colours are given, the library supplies them.

Drawing a Bubble Chart

A bubble chart can be created with matplotlib's scatter() function (pandas offers the equivalent DataFrame.plot.scatter() method). The s argument sets the marker size and c sets the colour.

    import matplotlib.pyplot as plt
    import numpy as np

    # Create 40 random points, sizes and colour values
    x = np.random.rand(40)
    y = np.random.rand(40)
    z = np.random.rand(40)
    colors = np.random.rand(40)

    # Draw the bubbles: s scales the marker area, c maps values to colours
    plt.scatter(x, y, s=z*1000, c=colors)
    plt.show()

The output is a scatter of 40 circles whose positions, sizes and colours change from run to run because the data are random.
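The same kind of chart can be sketched with the pandas method DataFrame.plot.scatter(). This is a minimal sketch, not the tutorial's own code: the column names ("x", "y", "weight"), the seed and the viridis colour map are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 40 random points plus a "weight" column used for size and colour
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.random(40),
    "y": rng.random(40),
    "weight": rng.random(40),
})

# Scale the weights to marker areas (in points squared), as s=z*1000 does above
sizes = df["weight"] * 1000

# c="weight" colours each bubble by the same column; colormap picks the palette
ax = df.plot.scatter(x="x", y="y", s=sizes, c="weight", colormap="viridis")
```

Call matplotlib.pyplot.show() afterwards to display the figure in a script. Keeping size and colour in DataFrame columns makes it easy to swap in real measurements later.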
Python – Geographical Data

Many open-source Python libraries have been created to draw geographical maps. They are highly customizable and offer a variety of maps depicting areas in different shapes and colours. One such package is Cartopy. You can download and install it in your local environment from the Cartopy website, which also has a gallery with numerous examples.

The example below shows a portion of the world map covering parts of Asia and Australia. Adjust the values passed to set_extent, given as (west longitude, east longitude, south latitude, north latitude), to show a different area of the world map.

    import matplotlib.pyplot as plt
    import cartopy.crs as ccrs

    fig = plt.figure(figsize=(15, 10))
    ax = fig.add_subplot(1, 1, 1, projection=ccrs.PlateCarree())

    # Restrict the map to longitudes 60 to 150 and latitudes -25 to 55
    ax.set_extent((60, 150, -25, 55))

    ax.stock_img()
    ax.coastlines()
    # Overlay Tissot indicatrix circles, which show the projection's distortion
    ax.tissot(facecolor="purple", alpha=0.8)

    plt.show()

The output is a map of the selected region with coastlines and purple Tissot circles drawn on top of the stock background image.
Python – P-Value

The p-value measures how strongly the data contradict a hypothesis. We state a null hypothesis based on some statistical model and use the p-value to judge it: the p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. A small p-value is evidence against the null hypothesis.

One way to get a p-value is the one-sample T-test, ttest_1samp. This is a two-sided test for the null hypothesis that the expected value (mean) of a sample of independent observations 'a' is equal to the given population mean, popmean. Let us consider the following example.

    from scipy import stats

    # Two columns of 50 draws each from a normal distribution with mean 5
    rvs = stats.norm.rvs(loc=5, scale=10, size=(50, 2))
    print(stats.ttest_1samp(rvs, 5.0))

The program generates output similar to the following. The data are random, so the values differ on every run, and the name of the result object depends on the SciPy version.

    Ttest_1sampResult(statistic = array([-1.40184894, 2.70158009]), pvalue = array([ 0.16726344, 0.00945234]))

Comparing two samples

In the following example there are two samples, which can come either from the same or from different distributions, and we want to test whether these samples have the same statistical properties.

ttest_ind − Calculates the T-test for the means of two independent samples of scores. This is a two-sided test for the null hypothesis that two independent samples have identical average (expected) values. By default, this test assumes that the populations have identical variances.

We can use this test if we observe two independent samples from the same or different populations. Let us consider the following example.

    from scipy import stats

    # Two independent samples drawn from the same normal distribution
    rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
    rvs2 = stats.norm.rvs(loc=5, scale=10, size=500)
    print(stats.ttest_ind(rvs1, rvs2))

The program generates output similar to the following.

    Ttest_indResult(statistic = -0.67406312233650278, pvalue = 0.50042727502272966)

You can repeat the test with a new array of the same length but a different mean: use a different value for loc in one of the samples and run it again.
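To make the one-sample test less of a black box, the statistic and p-value can be computed by hand and compared with ttest_1samp. The sketch below assumes the standard formula t = (mean − popmean) / (s / √n) with n − 1 degrees of freedom; the seed and variable names are illustrative choices, not part of the original example.

```python
import numpy as np
from scipy import stats

# Illustrative sample: 50 draws from a normal distribution with mean 5
rng = np.random.default_rng(42)
sample = rng.normal(loc=5, scale=10, size=50)
popmean = 5.0

# t statistic: distance of the sample mean from popmean in standard-error units
n = sample.size
standard_error = sample.std(ddof=1) / np.sqrt(n)
t_manual = (sample.mean() - popmean) / standard_error

# Two-sided p-value: probability of |T| >= |t| under Student's t with n - 1 df
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# SciPy's result for comparison
result = stats.ttest_1samp(sample, popmean)
print(t_manual, p_manual)
print(result.statistic, result.pvalue)
```

Both lines should print the same pair of numbers, which shows that the p-value is simply the two-tailed area of the t distribution beyond the observed statistic.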
Python – 3D Charts

Python is also capable of creating 3D charts. This involves adding a subplot to a figure and setting its projection parameter to "3d".

Drawing a 3D Plot

A 3D plot is drawn with the mpl_toolkits.mplot3d toolkit, which supplies the 3D axes used by the subplot.

    from mpl_toolkits.mplot3d import axes3d
    import matplotlib.pyplot as plt

    chart = plt.figure()
    chart3d = chart.add_subplot(111, projection="3d")

    # Create some test data.
    X, Y, Z = axes3d.get_test_data(0.08)

    # Plot a red wireframe, sampling every 15th row and every 10th column.
    chart3d.plot_wireframe(X, Y, Z, color="r", rstride=15, cstride=10)

    plt.show()

The output is a red wireframe surface drawn on three-dimensional axes.
Python – Chart Styling

Charts created in Python can be styled further using methods from the charting libraries. In this lesson we will see how to add annotations, legends and a chart background. We will continue to use the code from the last chapter and modify it to add these styles to the chart.

Adding Annotations

Often we need to annotate a chart by highlighting specific locations on it. In the example below we indicate the sharp changes in values in the chart by adding annotations at those points.

    import numpy as np
    from matplotlib import pyplot as plt

    x = np.arange(0, 10)
    # Note: ^ is bitwise XOR in Python, not a power (x ** 2 would square x).
    # The XOR values produce the jagged line that is annotated below.
    y = x ^ 2

    # Labeling the Axes and Title
    plt.title("Graph Drawing")
    plt.xlabel("Time")
    plt.ylabel("Distance")
    plt.plot(x, y)

    # Annotate: the label text is the first argument, xy is the position
    plt.annotate("Second Entry", xy=(2, 1))
    plt.annotate("Third Entry", xy=(4, 6))
    plt.show()

The output is the line chart with the two text labels placed at the given coordinates.

Adding Legends

We sometimes need a chart with multiple lines. A legend states the meaning associated with each line. In the chart below we have three lines with appropriate legends.

    import numpy as np
    from matplotlib import pyplot as plt

    x = np.arange(0, 10)
    # ^ is bitwise XOR, not a power (see the note in the previous example)
    y = x ^ 2
    z = x ^ 3
    t = x ^ 4

    # Labeling the Axes and Title
    plt.title("Graph Drawing")
    plt.xlabel("Time")
    plt.ylabel("Distance")
    plt.plot(x, y)

    # Annotate
    plt.annotate("Second Entry", xy=(2, 1))
    plt.annotate("Third Entry", xy=(4, 6))

    # Adding Legends: one entry per plotted line, loc=4 is the lower right corner
    plt.plot(x, z)
    plt.plot(x, t)
    plt.legend(["Race1", "Race2", "Race3"], loc=4)
    plt.show()

The output is a chart with three lines and a legend in the lower right corner.

Chart Presentation Style

We can modify the presentation style of the chart by using different methods from the style package.
    import numpy as np
    from matplotlib import pyplot as plt

    # Style the background; select the style before plotting so it applies to the chart
    plt.style.use("fast")

    x = np.arange(0, 10)
    # ^ is bitwise XOR in Python, not a power (x ** 2 would square x)
    y = x ^ 2
    z = x ^ 3
    t = x ^ 4

    # Labeling the Axes and Title
    plt.title("Graph Drawing")
    plt.xlabel("Time")
    plt.ylabel("Distance")
    plt.plot(x, y)

    # Annotate
    plt.annotate("Second Entry", xy=(2, 1))
    plt.annotate("Third Entry", xy=(4, 6))

    # Adding Legends
    plt.plot(x, z)
    plt.plot(x, t)
    plt.legend(["Race1", "Race2", "Race3"], loc=4)
    plt.show()

The output is the same three-line chart drawn with the selected style.
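To see which style names can be passed to plt.style.use, and to apply a style to a single chart only, something like the following can be used. It is a small sketch: the "ggplot" style and the plotted data are arbitrary choices for illustration.

```python
import numpy as np
from matplotlib import pyplot as plt

# Names accepted by plt.style.use on this installation
print(plt.style.available)

x = np.arange(0, 10)

# plt.style.context applies the style only inside the with block,
# leaving the global settings untouched for later charts
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot(x, x ** 2, label="x squared")
    ax.legend(loc="lower right")
```

Using the context manager avoids the ordering pitfall of plt.style.use, which only affects artists created after it is called.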
Python – Word Tokenization

Word tokenization is the process of splitting a large sample of text into words. It is a requirement in natural language processing tasks where each word needs to be captured and analysed further, for example by classifying and counting words for a particular sentiment. The Natural Language Toolkit (NLTK) is a library used to achieve this. Install NLTK before proceeding with the Python program for word tokenization.

    conda install -c anaconda nltk

The tokenizers also rely on NLTK's Punkt data, which may need a one-time download with nltk.download("punkt") (newer NLTK releases name the resource "punkt_tab").

Next we use the word_tokenize method to split the paragraph into individual words.

    import nltk

    word_data = "It originated from the idea that there are readers who prefer learning new skills from the comforts of their drawing rooms"
    nltk_tokens = nltk.word_tokenize(word_data)
    print(nltk_tokens)

When we execute the above code, it produces the following result.

    ['It', 'originated', 'from', 'the', 'idea', 'that', 'there', 'are', 'readers', 'who', 'prefer', 'learning', 'new', 'skills', 'from', 'the', 'comforts', 'of', 'their', 'drawing', 'rooms']

Tokenizing Sentences

We can also tokenize the sentences in a paragraph, just as we tokenized the words. We use the sent_tokenize method to achieve this. Below is an example.

    import nltk

    sentence_data = "Sun rises in the east. Sun sets in the west."
    nltk_tokens = nltk.sent_tokenize(sentence_data)
    print(nltk_tokens)

When we execute the above code, it produces the following result.

    ['Sun rises in the east.', 'Sun sets in the west.']
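Once text is split into tokens, counting them is a typical next step. The sketch below counts word frequencies with collections.Counter. To stay free of external data downloads it uses a simple regular expression as a stand-in tokenizer, which is an assumption for illustration and not NLTK's algorithm; with NLTK installed, the list returned by nltk.word_tokenize could be counted the same way.

```python
import re
from collections import Counter

word_data = ("It originated from the idea that there are readers who prefer "
             "learning new skills from the comforts of their drawing rooms")

# Stand-in tokenizer (assumption): lowercase the text and keep runs of letters
tokens = re.findall(r"[a-z']+", word_data.lower())

# Count how often each token occurs
counts = Counter(tokens)
print(counts.most_common(3))
```

Lowercasing before counting treats "It" and "it" as the same word, which is usually what a frequency analysis wants.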
Python – Chart Properties

Python has excellent libraries for data visualization. A combination of pandas, numpy and matplotlib can create nearly all types of charts. In this chapter we get started by looking at a simple chart and its various properties.

Creating a Chart

We use the numpy library to create the numbers to be plotted and the pyplot module in matplotlib to draw the actual chart.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.arange(0, 10)
    # Note: ^ is bitwise XOR in Python, not a power (x ** 2 would square x),
    # so this line is jagged rather than a smooth curve
    y = x ^ 2

    # Simple Plot
    plt.plot(x, y)
    plt.show()

The output is a simple line chart of y against x.

Labelling the Axes

We can apply labels to the axes as well as a title for the chart using appropriate methods from the library, as shown below.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.arange(0, 10)
    y = x ^ 2  # bitwise XOR, see the note above

    # Labeling the Axes and Title
    plt.title("Graph Drawing")
    plt.xlabel("Time")
    plt.ylabel("Distance")

    # Simple Plot
    plt.plot(x, y)
    plt.show()

The output is the same chart with a title and axis labels.

Formatting Line Type and Colour

The style and colour of the line in the chart can be specified with a format string passed to plot, as shown below.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.arange(0, 10)
    y = x ^ 2  # bitwise XOR, see the note above

    # Labeling the Axes and Title
    plt.title("Graph Drawing")
    plt.xlabel("Time")
    plt.ylabel("Distance")

    # Formatting the line colour: "r" draws a red line
    plt.plot(x, y, "r")

    # Formatting the point markers: ">" draws right-pointing triangles
    plt.plot(x, y, ">")
    plt.show()

The output is a red line with a triangle marker at each data point.

Saving the Chart File

The chart can be saved in different image file formats using appropriate methods from the library, as shown below.
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.arange(0, 10)
    # ^ is bitwise XOR in Python, not a power (x ** 2 would square x)
    y = x ^ 2

    # Labeling the Axes and Title
    plt.title("Graph Drawing")
    plt.xlabel("Time")
    plt.ylabel("Distance")

    # Formatting the line colour: "r" draws a red line
    plt.plot(x, y, "r")

    # Formatting the point markers: ">" draws right-pointing triangles
    plt.plot(x, y, ">")

    # Save in PDF format
    plt.savefig("timevsdist.pdf", format="pdf")

The above code creates the PDF file in the current working directory.
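The examples in this chapter compute y with x ^ 2. In Python, ^ is the bitwise XOR operator, while ** raises to a power, so the two give very different values. The short comparison below makes the difference visible; the variable names are chosen for illustration.

```python
import numpy as np

x = np.arange(0, 10)

# Bitwise XOR of each element with 2: flips the second-lowest bit
xor_values = x ^ 2

# Element-wise square of each element
squared_values = x ** 2

print(xor_values)      # [ 2  3  0  1  6  7  4  5 10 11]
print(squared_values)  # [ 0  1  4  9 16 25 36 49 64 81]
```

The XOR values jump up and down, which is why the plotted line looks jagged. Replace ^ with ** in the examples if a smooth power curve is wanted.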
Python – Heat Maps

A heatmap represents each value to be plotted as a shade of colour. Usually the darker shades of the chart represent higher values than the lighter shades, although this depends on the colour map. For a very different value a completely different colour can also be used.

The example below is a two-dimensional plot of values which are mapped to the index and columns of a DataFrame. Each row of data is a list, so the order of the values within a row is preserved.

    from pandas import DataFrame
    import matplotlib.pyplot as plt

    # Five rows of four values each
    data = [[2, 3, 4, 1], [6, 3, 5, 2], [6, 3, 5, 4], [3, 7, 5, 4], [2, 8, 1, 5]]
    Index = ["I1", "I2", "I3", "I4", "I5"]
    Cols = ["C1", "C2", "C3", "C4"]
    df = DataFrame(data, index=Index, columns=Cols)

    # Draw one coloured cell per value
    plt.pcolor(df)
    plt.show()

The output is a grid of five rows and four columns of coloured cells.
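By default pcolor labels the axes with cell positions rather than with the DataFrame's index and column names. The following sketch shows one way to put those names on the axes and add a colour bar. It assumes the rows are given as lists, and the tick placement at cell centres (position + 0.5) is a choice made here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame

data = [[2, 3, 4, 1], [6, 3, 5, 2], [6, 3, 5, 4], [3, 7, 5, 4], [2, 8, 1, 5]]
index = ["I1", "I2", "I3", "I4", "I5"]
cols = ["C1", "C2", "C3", "C4"]
df = DataFrame(data, index=index, columns=cols)

fig, ax = plt.subplots()
mesh = ax.pcolor(df.values)

# Place one tick at the centre of each cell and label it with the name
ax.set_xticks(np.arange(len(cols)) + 0.5)
ax.set_xticklabels(cols)
ax.set_yticks(np.arange(len(index)) + 0.5)
ax.set_yticklabels(index)

# The colour bar shows which colour corresponds to which value
fig.colorbar(mesh, ax=ax)
```

Call plt.show() to display the figure. The colour bar makes the mapping from shade to value explicit, which the bare heatmap leaves to guesswork.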
Python – Box Plots

Boxplots show how the values in a data set are distributed. A boxplot divides the data set into quartiles and represents the minimum, maximum, median, first quartile and third quartile of the data set. It is also useful for comparing the distribution of data across data sets by drawing a boxplot for each of them.

Drawing a Box Plot

A boxplot can be drawn by calling Series.plot.box() and DataFrame.plot.box(), or DataFrame.boxplot(), to visualize the distribution of values within each column.

For instance, here is a boxplot representing five trials of 10 observations of a uniform random variable on [0,1).

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    # 10 observations (rows) for each of five trials (columns)
    df = pd.DataFrame(np.random.rand(10, 5), columns=["A", "B", "C", "D", "E"])
    df.plot.box(grid=True)
    plt.show()

The output is a chart with one box per column, drawn over a grid.
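The five values a boxplot summarises can also be computed directly. The sketch below uses numpy's percentile function with its default linear interpolation; the helper name five_number_summary and the sample data are hypothetical, chosen for illustration.

```python
import numpy as np

def five_number_summary(values):
    """Return the minimum, quartiles and maximum of a sequence of numbers."""
    minimum, q1, median, q3, maximum = np.percentile(values, [0, 25, 50, 75, 100])
    return {"min": minimum, "q1": q1, "median": median, "q3": q3, "max": maximum}

summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

Note that matplotlib, which pandas uses for plotting, by default draws the whiskers only as far as the most extreme points within 1.5 times the interquartile range of the box and shows anything beyond that as separate outlier points, so the whisker ends can differ from the true minimum and maximum.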
Python – Measuring Central Tendency

Mathematically, central tendency means measuring the centre, or typical location, of the values of a data set. It gives an idea of the average value of the data. On its own it does not say how widely the values are spread; that is described by measures of dispersion. Knowing the centre helps in judging how well a new input fits into the existing data set. There are three main measures of central tendency, which can be calculated using the methods in the pandas Python library.

Mean – The average value of the data: the sum of the values divided by the number of values.

Median – The middle value in the distribution when the values are arranged in ascending or descending order.

Mode – The most commonly occurring value in the distribution.

Calculating Mean and Median

The pandas functions can be used directly to calculate these values. The numeric_only=True argument restricts the calculation to the numeric columns, because the Name column holds text.

    import pandas as pd

    # Create a dictionary of Series
    d = {"Name": pd.Series(["Tom", "James", "Ricky", "Vin", "Steve", "Smith", "Jack",
                            "Lee", "Chanchal", "Gasper", "Naviya", "Andres"]),
         "Age": pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         "Rating": pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)
    print("Mean Values in the Distribution")
    print(df.mean(numeric_only=True))
    print("*******************************")
    print("Median Values in the Distribution")
    print(df.median(numeric_only=True))

Its output is as follows −

    Mean Values in the Distribution
    Age       31.833333
    Rating     3.743333
    dtype: float64
    *******************************
    Median Values in the Distribution
    Age       29.50
    Rating     3.79
    dtype: float64

Calculating Mode

A mode may or may not be informative for a distribution, depending on whether the data is continuous and whether some value has the maximum frequency. We take a simple distribution below to find the mode. Here one age value has the maximum frequency in the distribution.
    import pandas as pd

    # Create a dictionary of Series
    d = {"Name": pd.Series(["Tom", "James", "Ricky", "Vin", "Steve", "Smith", "Jack",
                            "Lee", "Chanchal", "Gasper", "Naviya", "Andres"]),
         "Age": pd.Series([25, 26, 25, 23, 30, 25, 23, 34, 40, 30, 25, 46])}

    # Create a DataFrame
    df = pd.DataFrame(d)
    print(df.mode())

Its output is as follows (the column order may differ depending on the pandas version) −

         Age      Name
    0   25.0    Andres
    1    NaN  Chanchal
    2    NaN    Gasper
    3    NaN      Jack
    4    NaN     James
    5    NaN       Lee
    6    NaN    Naviya
    7    NaN     Ricky
    8    NaN     Smith
    9    NaN     Steve
    10   NaN       Tom
    11   NaN       Vin

The Age column has a single mode, 25. Every name occurs exactly once, so all twelve names are listed as modes, and the Age column is padded with NaN to match that length.
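Calling mode() on a single column avoids the NaN padding seen in the DataFrame output. The sketch below applies Series.mode() to the ages from the example and to a second, made-up series (named bimodal here for illustration) in which two values tie for the highest frequency.

```python
import pandas as pd

# Ages from the mode example above
ages = pd.Series([25, 26, 25, 23, 30, 25, 23, 34, 40, 30, 25, 46])
age_modes = ages.mode().tolist()
print(age_modes)

# Hypothetical data with two equally frequent values
bimodal = pd.Series([1, 1, 2, 2, 3])
bimodal_modes = bimodal.mode().tolist()
print(bimodal_modes)
```

Series.mode() always returns a Series because a distribution can have more than one mode; converting it to a list makes that explicit.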