Python Data Science Archives - Donotsad

Aug 09

Python Bernoulli Distribution

The Bernoulli distribution is a special case of the binomial distribution in which a single experiment is conducted, so the number of observations is 1. It therefore describes events that have exactly two outcomes.

We use the scipy.stats module to draw random samples from a Bernoulli distribution, and seaborn to plot a histogram of the samples with a density curve over it.

    from scipy.stats import bernoulli
    import seaborn as sb

    # Draw 1000 samples with success probability p = 0.6
    data_bern = bernoulli.rvs(size=1000, p=0.6)

    # histplot replaces distplot, which is deprecated in recent seaborn releases
    ax = sb.histplot(data_bern, kde=True, color="crimson")
    ax.set(xlabel="Bernoulli", ylabel="Frequency")

Its output is as follows: [figure: histogram of the sampled values]
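As a rough check on the sampling idea above, the sketch below draws Bernoulli samples using only the standard library and compares the sample mean and variance with the theoretical values p and p(1 - p). The helper name bernoulli_samples and the fixed seed are illustrative assumptions, not part of SciPy.

```python
import random

def bernoulli_samples(p, size, seed=0):
    """Hypothetical helper: return `size` values, each 1 with probability p, else 0."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same samples
    return [1 if rng.random() < p else 0 for _ in range(size)]

p = 0.6
samples = bernoulli_samples(p, size=10_000)

# For a Bernoulli(p) variable the mean is p and the variance is p * (1 - p).
mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)

print(f"sample mean     = {mean:.3f} (theory: {p})")
print(f"sample variance = {variance:.3f} (theory: {p * (1 - p):.3f})")
```

With 10,000 samples the estimates should land close to 0.6 and 0.24; the exact digits depend on the seed.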

Aug 09

Python Graph Data

CSGraph stands for Compressed Sparse Graph. The scipy.sparse.csgraph submodule provides fast graph algorithms based on sparse matrix representations.

Graph Representations

To begin with, let us understand what a sparse graph is and how it helps in graph representations.

What exactly is a sparse graph?

A graph is just a collection of nodes that have links between them. Graphs can represent nearly anything: social network connections, where each node is a person and is connected to acquaintances; images, where each node is a pixel and is connected to neighbouring pixels; points in a high-dimensional distribution, where each node is connected to its nearest neighbours; and practically anything else you can imagine.

One very efficient way to represent graph data is in a sparse matrix; let us call it G. The matrix G is of size N x N, and G[i, j] gives the value of the connection between node 'i' and node 'j'. The matrix of a sparse graph contains mostly zeros, that is, most nodes have only a few connections. This property turns out to be true in most cases of interest.

The creation of the sparse graph submodule was motivated by several algorithms used in scikit-learn, including the following:

Isomap: a manifold learning algorithm, which requires finding the shortest paths in a graph.
Hierarchical clustering: a clustering algorithm based on a minimum spanning tree.
Spectral decomposition: a projection algorithm based on sparse graph Laplacians.

As a concrete example, imagine that we would like to represent an undirected graph with three nodes, where nodes 0 and 1 are connected by an edge of weight 2, and nodes 0 and 2 are connected by an edge of weight 1. We can construct the dense, masked and sparse representations as shown in the following example, keeping in mind that an undirected graph is represented by a symmetric matrix.

    import numpy as np
    from scipy.sparse import csr_matrix

    G_dense = np.array([[0, 2, 1],
                        [2, 0, 0],
                        [1, 0, 0]])
    G_masked = np.ma.masked_values(G_dense, 0)
    G_sparse = csr_matrix(G_dense)
    print(G_sparse.data)

The above program will generate the following output.

    [2 1 2 1]

Now consider a slightly modified graph: it is identical to the previous one, except that nodes 0 and 2 are connected by an edge of zero weight. In this case, the dense representation above leads to ambiguities: how can non-edges be represented if zero is a meaningful value? Either a masked or a sparse representation must be used to eliminate the ambiguity. Let us consider the following example, where np.inf marks the non-edges.

    import numpy as np
    from scipy.sparse.csgraph import csgraph_from_dense

    G2_data = np.array([[np.inf, 2, 0],
                        [2, np.inf, np.inf],
                        [0, np.inf, np.inf]])
    G2_sparse = csgraph_from_dense(G2_data, null_value=np.inf)
    print(G2_sparse.data)

The above program will generate the following output.

    [2. 0. 2. 0.]
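To make the role of the null value concrete, here is a minimal pure-Python sketch that stores only the real edges of a dense matrix in a dictionary keyed by (i, j). The helper edges_from_dense is a hypothetical illustration of the idea, not SciPy's implementation of csgraph_from_dense.

```python
import math

def edges_from_dense(matrix, null_value=0):
    """Hypothetical helper: keep every entry that is not the null value as an edge."""
    edges = {}
    for i, row in enumerate(matrix):
        for j, weight in enumerate(row):
            if weight != null_value:
                edges[(i, j)] = weight
    return edges

# First graph: zero means "no edge".
G_dense = [[0, 2, 1],
           [2, 0, 0],
           [1, 0, 0]]
print(edges_from_dense(G_dense, null_value=0))

# Second graph: the 0-2 edge has weight zero, so infinity marks the non-edges.
inf = math.inf
G2_dense = [[inf, 2, 0],
            [2, inf, inf],
            [0, inf, inf]]
print(edges_from_dense(G2_dense, null_value=inf))
```

Both graphs keep four stored entries (two per undirected edge), but only the second call preserves the zero-weight edge between nodes 0 and 2, because zero is no longer the marker for a missing connection.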

Aug 09

Python Data Science – Home

Python for Data Science Tutorial

Data is the new oil. This statement reflects how every modern IT system is driven by capturing, storing and analysing data for various needs, whether that is making business decisions, forecasting the weather, studying protein structures in biology or designing a marketing campaign. All of these scenarios involve a multidisciplinary approach that uses mathematical models, statistics, graphs, databases and, of course, the business or scientific logic behind the data analysis. So we need a programming language that can cater to all these diverse needs of data science. Python shines as one such language, as it has numerous libraries and built-in features that make it easy to tackle the needs of data science. In this tutorial we will cover the various techniques used in data science using the Python programming language.

Audience

This tutorial is designed for Computer Science graduates as well as software professionals who want to learn data science in simple and easy steps using Python as a programming language.

Prerequisites

Before proceeding with this tutorial, you should have a basic knowledge of writing code in the Python programming language, using any Python IDE and executing Python programs. If you are completely new to Python, please refer to our Python tutorial to get a sound understanding of the language.

Execute Python Programs

For most of the examples given in this tutorial you will find a Try it option, so just make use of it and enjoy your learning. Try the following example using the Try it option available at the top right corner of the sample code box below.

    #!/usr/bin/python
    print("Hello, Python!")

Aug 09

Python Scatter Plots

Scatter plots show many points plotted in the Cartesian plane. Each point represents the values of two variables. One variable is plotted on the horizontal axis and the other on the vertical axis.

Drawing a Scatter Plot

A scatter plot can be created using the DataFrame.plot.scatter() method.

    import pandas as pd
    import numpy as np

    # 50 rows of random values in four columns
    df = pd.DataFrame(np.random.rand(50, 4), columns=["a", "b", "c", "d"])
    df.plot.scatter(x="a", y="b")

Its output is as follows: [figure: scatter plot of column b against column a]

Aug 09

Python Linear Regression

In linear regression, two variables are related through an equation in which the exponent (power) of both variables is 1. Mathematically, a linear relationship represents a straight line when plotted as a graph. A non-linear relationship, where the exponent of any variable is not equal to 1, creates a curve.

The Seaborn function for plotting a linear regression fit is regplot. The example below shows its use.

    import seaborn as sb
    from matplotlib import pyplot as plt

    df = sb.load_dataset("tips")
    sb.regplot(x="total_bill", y="tip", data=df)
    plt.show()

Its output is as follows: [figure: scatter plot of tip against total_bill with a fitted regression line]
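To show what a straight-line fit computes numerically, the following sketch estimates the slope and intercept by ordinary least squares using only the standard library. The data values are made up for illustration (they are not taken from the tips dataset), and fit_line is a hypothetical helper rather than a Seaborn function.

```python
def fit_line(xs, ys):
    """Hypothetical helper: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up example data: bill amounts and the tips left on them.
total_bill = [10, 20, 30, 40, 50]
tip = [2, 3, 5, 6, 8]

slope, intercept = fit_line(total_bill, tip)
print(f"tip = {slope:.2f} * total_bill + {intercept:.2f}")
```

Both variables appear with exponent 1 in the fitted equation, which is what makes the relationship linear.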

Aug 09

Python Binomial Distribution

The binomial distribution models the number of successes in a series of independent experiments, each of which has only two possible outcomes. For example, tossing a coin always gives a head or a tail. The probability of getting exactly 3 heads when a coin is tossed 10 times is given by the binomial distribution.

We use the seaborn Python library, which has built-in functions to create such probability distribution graphs. The scipy package helps in creating the binomial samples.

    from scipy.stats import binom
    import seaborn as sb

    # Draw 10 values from a binomial distribution with n = 20, p = 0.8 (result not used below)
    binom.rvs(size=10, n=20, p=0.8)

    # Draw 1000 values to plot
    data_binom = binom.rvs(n=20, p=0.8, loc=0, size=1000)

    # histplot replaces distplot, which is deprecated in recent seaborn releases
    ax = sb.histplot(data_binom, kde=True, color="blue")
    ax.set(xlabel="Binomial", ylabel="Frequency")

Its output is as follows: [figure: histogram of the sampled values]
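The coin example above can be worked out directly from the binomial probability formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below uses only the standard library and assumes a fair coin (p = 0.5); binomial_pmf is an illustrative helper name.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 3 heads in 10 tosses of a fair coin.
prob_3_heads = binomial_pmf(3, n=10, p=0.5)
print(f"P(3 heads in 10 tosses) = {prob_3_heads:.4f}")  # prints 0.1172

# The probabilities over all possible head counts add up to 1.
total = sum(binomial_pmf(k, n=10, p=0.5) for k in range(11))
print(f"sum over k = 0..10: {total:.4f}")
```

Setting n = 1 in the same function gives the Bernoulli case: binomial_pmf(1, 1, p) is p and binomial_pmf(0, 1, p) is 1 - p.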

Aug 09

Python Bubble Charts

Bubble charts display data as a cluster of circles. The data required to create a bubble chart consists of the x and y coordinates, the size of each bubble and the colour of each bubble. The colours can be supplied by the library itself.

Drawing a Bubble Chart

A bubble chart can be created using matplotlib's scatter() function, where the s argument sets the bubble sizes and the c argument sets the colours.

    import matplotlib.pyplot as plt
    import numpy as np

    # create data
    x = np.random.rand(40)
    y = np.random.rand(40)
    z = np.random.rand(40)
    colors = np.random.rand(40)

    # use the scatter function; z is scaled up to give visible bubble sizes
    plt.scatter(x, y, s=z*1000, c=colors)
    plt.show()

Its output is as follows: [figure: bubble chart of 40 random points]

Aug 09

Python Geographical Data

Many open source Python libraries have been created to draw geographical maps. They are highly customizable and offer a variety of maps depicting areas in different shapes and colours. One such package is Cartopy. You can install this package in your local environment by following the instructions on the Cartopy website, which also has a gallery with numerous examples.

In the example below we show a portion of the world map covering parts of Asia and Australia. You can adjust the values passed to the set_extent method to show different areas of the world map.

    import matplotlib.pyplot as plt
    import cartopy.crs as ccrs

    fig = plt.figure(figsize=(15, 10))
    ax = fig.add_subplot(1, 1, 1, projection=ccrs.PlateCarree())

    # restrict the map to longitudes 60 to 150 and latitudes -25 to 55
    ax.set_extent((60, 150, -25, 55))

    ax.stock_img()
    ax.coastlines()
    ax.tissot(facecolor="purple", alpha=0.8)

    plt.show()

Its output is as follows: [figure: map of parts of Asia and Australia]

Aug 09

Python P-Value

The p-value measures the strength of the evidence against a null hypothesis. We build a hypothesis based on some statistical model and assess the model's validity using the p-value. One way to get the p-value is by using a T-test.

The one-sample T-test is a two-sided test for the null hypothesis that the expected value (mean) of a sample of independent observations 'a' is equal to the given population mean, popmean. Let us consider the following example.

    from scipy import stats

    # Two columns of 50 samples each, drawn from a normal distribution with mean 5
    rvs = stats.norm.rvs(loc=5, scale=10, size=(50, 2))
    print(stats.ttest_1samp(rvs, 5.0))

The above program will generate output similar to the following. The exact values vary from run to run because the samples are random, and the formatting of the result may differ between SciPy versions.

    Ttest_1sampResult(statistic = array([-1.40184894, 2.70158009]), pvalue = array([ 0.16726344, 0.00945234]))

Comparing two samples

In the following example there are two samples, which can come either from the same or from different distributions, and we want to test whether these samples have the same statistical properties.

ttest_ind calculates the T-test for the means of two independent samples of scores. This is a two-sided test for the null hypothesis that two independent samples have identical average (expected) values. By default, this test assumes that the populations have identical variances.

We can use this test if we observe two independent samples from the same or different populations. Let us consider the following example.

    from scipy import stats

    rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
    rvs2 = stats.norm.rvs(loc=5, scale=10, size=500)
    print(stats.ttest_ind(rvs1, rvs2))

The above program will generate output similar to the following.

    Ttest_indResult(statistic = -0.67406312233650278, pvalue = 0.50042727502272966)

You can repeat the test with a new array of the same length but a different mean: use a different value for loc and run the test again.
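For intuition about what the one-sample T-test computes, the sketch below calculates the t statistic by hand as t = (sample mean - popmean) / (s / sqrt(n)), where s is the sample standard deviation. The data are made up and one_sample_t is a hypothetical helper. Turning t into a p-value needs the Student's t distribution with n - 1 degrees of freedom, which is the part SciPy handles and which this sketch leaves out.

```python
import math
import statistics

def one_sample_t(sample, popmean):
    """Hypothetical helper: t statistic and degrees of freedom for a one-sample T-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    std = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    t_stat = (mean - popmean) / (std / math.sqrt(n))
    return t_stat, n - 1

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
t_stat, dof = one_sample_t(sample, popmean=4)
print(f"t = {t_stat:.4f}, degrees of freedom = {dof}")

# The two-sided p-value is the probability that a Student's t variable with
# `dof` degrees of freedom is at least as far from zero as t_stat.
```

A t statistic near zero means the sample mean is close to popmean relative to its standard error, which corresponds to a large p-value.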

Aug 09

Python 3D Charts

Python is also capable of creating 3D charts. This involves adding a subplot to a figure and setting its projection parameter to "3d".

Drawing a 3D Plot

A 3D plot is drawn using the mpl_toolkits.mplot3d toolkit, by adding a subplot with a 3D projection to a figure.

    from mpl_toolkits.mplot3d import axes3d
    import matplotlib.pyplot as plt

    chart = plt.figure()
    chart3d = chart.add_subplot(111, projection="3d")

    # Create some test data.
    X, Y, Z = axes3d.get_test_data(0.08)

    # Plot a wireframe.
    chart3d.plot_wireframe(X, Y, Z, color="r", rstride=15, cstride=10)

    plt.show()

Its output is as follows: [figure: red 3D wireframe plot of the test data]