Python Processing Unstructured Data

Data that is already present in a row and column format, or that can easily be converted into rows and columns so that it fits nicely into a database, is known as structured data. Examples are CSV, TXT and XLS files. These files have a delimiter and either fixed or variable width, and missing values are represented as blanks between the delimiters. But sometimes we get data whose lines are not of fixed width, or which arrives as HTML, image or PDF files. Such data is known as unstructured data. While an HTML file can be handled by processing its tags, a feed from Twitter or a plain text document from a news feed has neither a delimiter nor tags to work with. In such scenarios we use built-in functions from various Python libraries to process the file.

Reading Data

In the example below we take a text file and read it, separating each of the lines in it. We can then divide the output further into lines and words. The original file is a text file containing some paragraphs describing the Python language.

filename = "path/input.txt"

with open(filename) as fn:
   # Read each line
   ln = fn.readline()
   # Keep count of lines
   lncnt = 1
   while ln:
      print("Line {}: {}".format(lncnt, ln.strip()))
      ln = fn.readline()
      lncnt += 1

When we execute the above code, it produces the following result.

Line 1: Python is an interpreted high-level programming language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales.
Line 2: Python features a dynamic type system and automatic memory management. It supports multiple programming paradigms, including object-oriented, imperative, functional and procedural, and has a large and comprehensive standard library.
Line 3: Python interpreters are available for many operating systems. CPython, the reference implementation of Python, is open source software and has a community-based development model, as do nearly all of its variant implementations. CPython is managed by the non-profit Python Software Foundation.

Counting Word Frequency

We can count the frequency of the words in the file using the Counter class from the collections module as follows.

from collections import Counter

with open("path/input2.txt") as f:
   p = Counter(f.read().split())

print(p)

When we execute the above code, it produces the following result.

Counter({'and': 3, 'Python': 3, 'that': 2, 'a': 2, 'programming': 2, 'code': 1, '1991,': 1, 'is': 1, 'programming.': 1, 'dynamic': 1, 'an': 1, 'design': 1, 'in': 1, 'high-level': 1, 'management.': 1, 'features': 1, 'readability,': 1, 'van': 1, 'both': 1, 'for': 1, 'Rossum': 1, 'system': 1, 'provides': 1, 'memory': 1, 'has': 1, 'type': 1, 'enable': 1, 'Created': 1, 'philosophy': 1, 'constructs': 1, 'emphasizes': 1, 'general-purpose': 1, 'notably': 1, 'released': 1, 'significant': 1, 'Guido': 1, 'using': 1, 'interpreted': 1, 'by': 1, 'on': 1, 'language': 1, 'whitespace.': 1, 'clear': 1, 'It': 1, 'large': 1, 'small': 1, 'automatic': 1, 'scales.': 1, 'first': 1})
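Note that a plain split() treats punctuation as part of a word, so "programming" and "programming." are counted separately in the output above. Below is a minimal sketch of one way to normalize the text before counting; the file path is the same placeholder used in the examples above.

import re
from collections import Counter

# A minimal sketch: lower-case the text and keep only word-like tokens,
# so that "programming" and "programming." are counted as the same word.
# The file path is a placeholder, as in the examples above.
with open("path/input.txt") as f:
   text = f.read().lower()

words = re.findall(r"[a-z0-9'-]+", text)
word_counts = Counter(words)

# Show the ten most frequent words
print(word_counts.most_common(10))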

Python Date and Time

Often in data science we need analysis based on temporal values. Python can handle the various formats of date and time gracefully. The datetime library provides the necessary methods and functions to handle the following scenarios.

Date Time Representation
Date Time Arithmetic
Date Time Comparison

We will study them one by one.

Date Time Representation

A date and its various parts are represented by using different datetime functions. In addition, there are format specifiers which play a role in displaying the alphabetical parts of a date, like the name of the month or the week day. The following code shows today's date and various parts of the date.

import datetime

print "The Date Today is :", datetime.datetime.today()

date_today = datetime.date.today()
print date_today
print "This Year :", date_today.year
print "This Month :", date_today.month
print "Month Name:", date_today.strftime("%B")
print "This Day of the Month :", date_today.day
print "Week Day Name:", date_today.strftime("%A")

When we execute the above code, it produces the following result.

The Date Today is : 2018-04-22 15:38:35.835000
2018-04-22
This Year : 2018
This Month : 4
Month Name: April
This Day of the Month : 22
Week Day Name: Sunday

Date Time Arithmetic

For calculations involving dates we store the various dates in variables and apply the relevant mathematical operators to these variables.

import datetime

# Capture the First Date
day1 = datetime.date(2018, 2, 12)
print "day1:", day1.ctime()

# Capture the Second Date
day2 = datetime.date(2017, 8, 18)
print "day2:", day2.ctime()

# Find the difference between the dates
print "Number of Days:", day1 - day2

date_today = datetime.date.today()

# Create a delta of Four Days
no_of_days = datetime.timedelta(days=4)

# Use Delta for Past Date
before_four_days = date_today - no_of_days
print "Before Four Days:", before_four_days

# Use Delta for Future Date
after_four_days = date_today + no_of_days
print "After Four Days:", after_four_days

When we execute the above code, it produces the following result.

day1: Mon Feb 12 00:00:00 2018
day2: Fri Aug 18 00:00:00 2017
Number of Days: 178 days, 0:00:00
Before Four Days: 2018-04-18
After Four Days: 2018-04-26

Date Time Comparison

Dates and times are compared using logical operators, but we must be careful to compare the right parts of the dates with each other. In the examples below we take future and past dates and compare them using the Python if clause along with logical operators.

import datetime

date_today = datetime.date.today()
print "Today is: ", date_today

# Create a delta of Four Days
no_of_days = datetime.timedelta(days=4)

# Use Delta for Past Date
before_four_days = date_today - no_of_days
print "Before Four Days:", before_four_days

after_four_days = date_today + no_of_days

date1 = datetime.date(2018, 4, 4)
print "date1:", date1

if date1 == before_four_days:
   print "Same Dates"
if date_today > date1:
   print "Past Date"
if date1 < after_four_days:
   print "Future Date"

When we execute the above code, it produces the following result.

Today is: 2018-04-22
Before Four Days: 2018-04-18
date1: 2018-04-04
Past Date
Future Date
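The format specifiers used with strftime also work in the opposite direction: strptime parses a string into a datetime object. Below is a minimal sketch; the date string is just an example value.

import datetime

# A minimal sketch: strptime parses a string into a datetime object using
# the same format specifiers that strftime uses for display.
date_string = "2018-04-22 15:30:00"
parsed = datetime.datetime.strptime(date_string, "%Y-%m-%d %H:%M:%S")

print(parsed.strftime("%A, %d %B %Y"))   # Sunday, 22 April 2018
print(parsed.isoformat())                # 2018-04-22T15:30:00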

Python Measuring Variance

In statistics, variance is a measure of how far the values in a data set lie from the mean value. In other words, it indicates how dispersed the values are. Dispersion is most commonly reported as the standard deviation. Another commonly used measure, which describes the shape of the distribution, is skewness. Both are calculated by using functions available in the pandas library.

Measuring Standard Deviation

Standard deviation is the square root of the variance, where the variance is the average of the squared differences of the values in a data set from the mean value. In Python we calculate this value by using the std() function from the pandas library.

import pandas as pd

# Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','Chanchal','Gasper','Naviya','Andres']),
   'Age':pd.Series([25,26,25,23,30,25,23,34,40,30,25,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}

# Create a DataFrame
df = pd.DataFrame(d)

# Calculate the standard deviation
print df.std()

Its output is as follows −

Age       7.265527
Rating    0.661628
dtype: float64

Measuring Skewness

Skewness is used to determine whether the data is symmetric or skewed. If the index lies between -1 and 1, the distribution is roughly symmetric. If it is less than -1, the distribution is skewed to the left, and if it is greater than 1, it is skewed to the right.

import pandas as pd

# Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','Chanchal','Gasper','Naviya','Andres']),
   'Age':pd.Series([25,26,25,23,30,25,23,34,40,30,25,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}

# Create a DataFrame
df = pd.DataFrame(d)

print df.skew()

Its output is as follows −

Age       1.443490
Rating   -0.153629
dtype: float64

So the distribution of Rating is roughly symmetric, while the distribution of Age is skewed to the right.
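Because the standard deviation is the square root of the variance, the variance itself can be read directly with var(). Below is a minimal sketch using the Rating values from the example above.

import pandas as pd

# A minimal sketch: var() gives the variance, and std() is its square root.
ratings = pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])

print(ratings.var())          # sample variance
print(ratings.std())          # standard deviation
print(ratings.var() ** 0.5)   # same value as the standard deviation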

Python Data Science – NumPy

What is NumPy?

NumPy is a Python package whose name stands for "Numerical Python". It is a library consisting of multidimensional array objects and a collection of routines for processing those arrays.

Operations using NumPy

Using NumPy, a developer can perform the following operations −

Mathematical and logical operations on arrays.
Fourier transforms and routines for shape manipulation.
Operations related to linear algebra. NumPy has in-built functions for linear algebra and random number generation.

NumPy – A Replacement for MATLAB

NumPy is often used along with packages like SciPy (Scientific Python) and Matplotlib (a plotting library). This combination is widely used as a replacement for MATLAB, a popular platform for technical computing. However, the Python alternative to MATLAB is now seen as a more modern and complete programming language. It is open source, which is an added advantage of NumPy.

ndarray Object

The most important object defined in NumPy is an N-dimensional array type called ndarray. It describes a collection of items of the same type. Items in the collection can be accessed using a zero-based index. Every item in an ndarray takes the same size of block in memory. Each element in an ndarray is an object of a data-type object (called dtype). Any item extracted from an ndarray object (by slicing) is represented by a Python object of one of the array scalar types.

We will see many examples of using the NumPy library of Python in data science work in the next chapters.
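As a first taste, here is a minimal sketch of the ndarray object described above; the array values are just illustrative.

import numpy as np

# A minimal sketch of the ndarray object: same-typed items, zero-based indexing.
a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.shape)     # (2, 3) - two rows, three columns
print(a.dtype)     # the common data type of all items
print(a[0, 2])     # item in the first row, third column (zero-based index)
print(a * 2)       # element-wise arithmetic on the whole array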

Python Data Science – Getting Started

What is Data Science?

Data science is the process of deriving knowledge and insights from a huge and diverse set of data by organizing, processing and analysing the data. It involves many different disciplines, like mathematical and statistical modelling, extracting data from its source and applying data visualization techniques. It often also involves handling big data technologies to gather both structured and unstructured data. Below we will see some example scenarios where data science is used.

Recommendation Systems

As online shopping becomes more prevalent, e-commerce platforms are able to capture users' shopping preferences as well as the performance of various products in the market. This leads to the creation of recommendation systems, which build models predicting the shopper's needs and show the products the shopper is most likely to buy.

Financial Risk Management

The financial risk involving loans and credit is better analysed by using the customer's past spending habits, past defaults, other financial commitments and many socio-economic indicators. This data is gathered from various sources in different formats. Organising it and deriving insight into the customer's profile needs the help of data science. The outcome is minimizing loss for the financial organization by avoiding bad debt.

Improvement in Health Care Services

The health care industry deals with a variety of data which can be classified into technical data, financial data, patient information, drug information and legal rules. All this data needs to be analysed in a coordinated manner to produce insights that save cost both for the health care provider and the care receiver while remaining legally compliant.

Computer Vision

The advancement in recognizing an image by a computer involves processing large sets of image data from multiple objects of the same category, for example face recognition. These data sets are modelled, and algorithms are created to apply the model to newer images to get a satisfactory result. Processing these huge data sets and creating the models needs various tools used in data science.

Efficient Management of Energy

As the demand for energy consumption soars, energy-producing companies need to manage the various phases of energy production and distribution more efficiently. This involves optimizing the production methods, the storage and distribution mechanisms, as well as studying the customers' consumption patterns. Linking the data from all these sources and deriving insight seems a daunting task. This is made easier by using the tools of data science.

Python in Data Science

The programming requirements of data science demand a very versatile yet flexible language, one that is simple to write code in but can handle highly complex mathematical processing. Python is most suited to such requirements, as it has already established itself both as a language for general computing and for scientific computing. Moreover, it is continuously upgraded in the form of new additions to its plethora of libraries aimed at different programming requirements. Below we discuss the features of Python which make it the preferred language for data science.

It is a simple and easy-to-learn language which achieves results in fewer lines of code than other similar languages like R. Its simplicity also makes it robust in handling complex scenarios with minimal code and much less confusion about the general flow of the program.

It is cross-platform, so the same code works in multiple environments without needing any change. That makes it a perfect fit for a multi-environment setup.

It executes faster than other similar languages used for data analysis, like R and MATLAB.

Its excellent memory management capability, especially garbage collection, makes it versatile in gracefully managing very large volumes of data during transformation, slicing, dicing and visualization.

Most importantly, Python has a very large collection of libraries which serve as special-purpose analysis tools. For example, the NumPy package deals with scientific computing, and its arrays need much less memory than a conventional Python list for managing numeric data (a rough check of this claim is sketched at the end of this chapter). And the number of such packages is continuously growing.

Python has packages which can directly use code from other languages like Java or C. This helps in optimizing code performance by reusing existing code from other languages whenever it gives a better result.

In the subsequent chapters we will see how we can leverage these features of Python to accomplish all the tasks needed in the different areas of data science.
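As promised above, here is a rough sketch of the memory claim about NumPy arrays versus plain lists; the exact numbers depend on the platform and Python version.

import sys
import numpy as np

# A rough sketch of the memory comparison; exact numbers vary by platform.
values = list(range(1000))
array = np.arange(1000)

# Size of the list object plus the integer objects it references
list_size = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
array_size = array.nbytes

print("Python list : about %d bytes" % list_size)
print("NumPy array : about %d bytes" % array_size)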

Python Data Science – Matplotlib

What is Matplotlib?

Matplotlib is a Python library used to create 2D graphs and plots from Python scripts. It has a module named pyplot which makes plotting easy by providing features to control line styles, font properties, axis formatting, etc. It supports a very wide variety of graphs and plots, namely histograms, bar charts, power spectra, error charts, etc. It is used along with NumPy to provide an environment that is an effective open source alternative to MATLAB. It can also be used with graphics toolkits like PyQt and wxPython.

Conventionally, the package is imported into a Python script by adding the following statement −

from matplotlib import pyplot as plt

Matplotlib Example

The following script produces a sine wave plot using Matplotlib.

Example

import numpy as np
import matplotlib.pyplot as plt

# Compute the x and y coordinates for points on a sine curve
x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)

plt.title("sine wave form")

# Plot the points using matplotlib
plt.plot(x, y)
plt.show()

Its output is the sine wave plot (the figure is not reproduced here).

We will see many examples of using the Matplotlib library of Python in data science work in the next chapters.
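As another small example, pyplot's bar function draws the bar charts mentioned above. Below is a minimal sketch with made-up category data; recent Matplotlib versions accept the category labels directly on the x axis.

import matplotlib.pyplot as plt

# A minimal sketch of a bar chart; the categories and values are made up.
languages = ["Python", "R", "Java", "Scala"]
projects = [45, 30, 15, 10]

plt.bar(languages, projects)
plt.title("Projects per language (hypothetical data)")
plt.xlabel("Language")
plt.ylabel("Number of projects")
plt.show()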

Python Data Science – Environment Setup

To successfully create and run the example code in this tutorial we need an environment that has both general-purpose Python and the special packages required for data science. We will first look at installing general-purpose Python, which can be Python 2 or Python 3. We will prefer Python 2 for this tutorial, mainly because of its maturity and wider support of external packages.

Getting Python

The most up-to-date source code, binaries, documentation, news, etc. are available on the official website of Python: https://www.python.org/

You can download the Python documentation from https://www.python.org/doc/. The documentation is available in HTML, PDF, and PostScript formats.

Installing Python

Python distributions are available for a wide variety of platforms. You need to download only the binary code applicable to your platform and install Python. If the binary code for your platform is not available, you need a C compiler to compile the source code manually. Compiling the source code offers more flexibility in the choice of features that you require in your installation. Here is a quick overview of installing Python on various platforms.

Unix and Linux Installation

Here are the simple steps to install Python on a Unix/Linux machine.

Open a web browser and go to https://www.python.org/downloads/.
Follow the link to download the zipped source code available for Unix/Linux.
Download and extract the files.
Edit the Modules/Setup file if you want to customize some options.
Run the ./configure script.
Run make.
Run make install.

This installs Python at the standard location /usr/local/bin and its libraries at /usr/local/lib/pythonXX, where XX is the version of Python.

Windows Installation

Here are the steps to install Python on a Windows machine.

Open a web browser and go to https://www.python.org/downloads/.
Follow the link for the Windows installer python-XYZ.msi file, where XYZ is the version you need to install.
To use this installer python-XYZ.msi, the Windows system must support Microsoft Installer 2.0. Save the installer file to your local machine and then run it to find out if your machine supports MSI.
Run the downloaded file. This brings up the Python install wizard, which is really easy to use. Just accept the default settings, wait until the install is finished, and you are done.

Macintosh Installation

Recent Macs come with Python installed, but it may be several years out of date. See http://www.python.org/download/mac/ for instructions on getting the current version along with extra tools to support development on the Mac. For older Mac OS versions before Mac OS X 10.3 (released in 2003), MacPython is available. It is maintained by Jack Jansen, and you can find the full documentation and complete Mac OS installation details at his website: http://www.cwi.nl/~jack/macpython.html.

Setting up PATH

Programs and other executable files can live in many directories, so operating systems provide a search path that lists the directories the OS searches for executables. The path is stored in an environment variable, which is a named string maintained by the operating system. This variable contains information available to the command shell and other programs. The path variable is named PATH in Unix or Path in Windows (Unix is case sensitive; Windows is not). In Mac OS, the installer handles the path details. To invoke the Python interpreter from any particular directory, you must add the Python directory to your path.

Setting the Path on Unix/Linux

To add the Python directory to the path for a particular session in Unix −

In the csh shell − type setenv PATH "$PATH:/usr/local/bin/python" and press Enter.
In the bash shell (Linux) − type export PATH="$PATH:/usr/local/bin/python" and press Enter.
In the sh or ksh shell − type PATH="$PATH:/usr/local/bin/python" and press Enter.

Note − /usr/local/bin/python is the path of the Python directory.

Setting the Path on Windows

To add the Python directory to the path for a particular session in Windows −

At the command prompt − type path %path%;C:\Python and press Enter.

Note − C:\Python is the path of the Python directory.

Python Environment Variables

Here are important environment variables recognized by Python −

PYTHONPATH − It has a role similar to PATH. This variable tells the Python interpreter where to locate the module files imported into a program. It should include the Python source library directory and the directories containing Python source code. PYTHONPATH is sometimes preset by the Python installer.

PYTHONSTARTUP − It contains the path of an initialization file containing Python source code. It is executed every time you start the interpreter. It is named .pythonrc.py in Unix and it contains commands that load utilities or modify PYTHONPATH.

PYTHONCASEOK − It is used on Windows to instruct Python to find the first case-insensitive match in an import statement. Set this variable to any value to activate it.

PYTHONHOME − It is an alternative module search path. It is usually embedded in the PYTHONSTARTUP or PYTHONPATH directories to make switching module libraries easy.

Running Python

There are three different ways to start Python.

Interactive Interpreter

You can start Python from Unix, DOS, or any other system that provides you a command-line interpreter or shell window. Enter python at the command line and start coding right away in the interactive interpreter.

$python             # Unix/Linux
python%             # Unix/Linux
C:> python          # Windows/DOS

Here is a list of the available command line options −

-d       Provides debug output.
-O       Generates optimized bytecode (resulting in .pyo files).
-S       Do not run import site to look for Python paths on startup.
-v       Verbose output (detailed trace on import statements).
-X       Disable class-based built-in exceptions (just use strings); obsolete starting with version 1.6.
-c cmd   Run the Python code passed in as the cmd string.
file     Run the Python script from the given file.

Script from the Command Line

A Python script can be executed at the command line by invoking the interpreter on your application, as in

$python script.py

where script.py is the file containing your program.
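Once the interpreter is running, a small sketch like the one below can confirm which Python is being used and where it looks for modules after the PATH and PYTHONPATH setup described above.

import sys

# A minimal sketch to confirm the environment set up above.
print(sys.version)        # interpreter version
print(sys.executable)     # full path of the running interpreter
print(sys.path[:3])       # first few entries of the module search path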

Python Processing CSV Data

Reading data from CSV (comma separated values) files is a fundamental necessity in data science. Often, we get data from various sources which can be exported to CSV format so that it can be used by other systems. The Pandas library provides features with which we can read a CSV file in full as well as in parts, for only a selected group of columns and rows.

Input as CSV File

A csv file is a text file in which the values in the columns are separated by a comma. Let's consider the following data present in the file named input.csv. You can create this file using Windows Notepad by copying and pasting this data. Save the file as input.csv using the Save As option with the file type set to All Files (*.*).

id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Tusar,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Rasmi,578,2013-05-21,IT
7,Pranab,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance

Reading a CSV File

The read_csv function of the pandas library is used to read the content of a CSV file into the Python environment as a pandas DataFrame. The function can read the file from the OS by using a proper path to the file.

import pandas as pd

data = pd.read_csv("path/input.csv")
print(data)

When we execute the above code, it produces the following result. Please note how an additional column starting with zero has been created by the function as an index.

   id    name  salary  start_date        dept
0   1    Rick  623.30  2012-01-01          IT
1   2     Dan  515.20  2013-09-23  Operations
2   3   Tusar  611.00  2014-11-15          IT
3   4    Ryan  729.00  2014-05-11          HR
4   5    Gary  843.25  2015-03-27     Finance
5   6   Rasmi  578.00  2013-05-21          IT
6   7  Pranab  632.80  2013-07-30  Operations
7   8    Guru  722.50  2014-06-17     Finance

Reading Specific Rows

The DataFrame returned by read_csv can be sliced to keep only specific rows of a given column. In the code below we slice out the first 5 rows of the column named salary.

import pandas as pd

data = pd.read_csv("path/input.csv")

# Slice the result for the first 5 rows
print(data[0:5]["salary"])

When we execute the above code, it produces the following result.

0    623.30
1    515.20
2    611.00
3    729.00
4    843.25
Name: salary, dtype: float64

Reading Specific Columns

The result can also be restricted to specific columns. We use the multi-axes indexer .loc[] for this purpose. Here we choose to display the salary and name columns for all rows.

import pandas as pd

data = pd.read_csv("path/input.csv")

# Use the multi-axes indexing function
print(data.loc[:, ["salary", "name"]])

When we execute the above code, it produces the following result.

   salary    name
0  623.30    Rick
1  515.20     Dan
2  611.00   Tusar
3  729.00    Ryan
4  843.25    Gary
5  578.00   Rasmi
6  632.80  Pranab
7  722.50    Guru

Reading Specific Columns and Rows

We can also pick specific rows along with specific columns, again using the multi-axes indexer .loc[]. Here we choose to display the salary and name columns for some of the rows.

import pandas as pd

data = pd.read_csv("path/input.csv")

# Use the multi-axes indexing function
print(data.loc[[1, 3, 5], ["salary", "name"]])

When we execute the above code, it produces the following result.

   salary   name
1   515.2    Dan
3   729.0   Ryan
5   578.0  Rasmi

Reading Specific Columns for a Range of Rows

Finally, we can select specific columns for a range of rows with the same .loc[] indexer. Here we choose to display the salary and name columns for rows 2 to 6.

import pandas as pd

data = pd.read_csv("path/input.csv")

# Use the multi-axes indexing function
print(data.loc[2:6, ["salary", "name"]])

When we execute the above code, it produces the following result.

   salary    name
2  611.00   Tusar
3  729.00    Ryan
4  843.25    Gary
5  578.00   Rasmi
6  632.80  Pranab
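The filtering above happens after the whole file is loaded. read_csv can also restrict what is read while loading, so unwanted columns and rows never enter memory. Below is a minimal sketch using its usecols and nrows parameters; the file path is the same placeholder used above.

import pandas as pd

# A minimal sketch: load only two columns and the first five data rows.
data = pd.read_csv("path/input.csv", usecols=["name", "salary"], nrows=5)
print(data)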

Python Data Science – Pandas

What is Pandas?

Pandas is an open-source Python library used for high-performance data manipulation and data analysis built around its powerful data structures. Python with pandas is in use in a variety of academic and commercial domains, including finance, economics, statistics, advertising, web analytics, and more. Using Pandas, we can accomplish the five typical steps in the processing and analysis of data, regardless of the origin of the data — load, organize, manipulate, model, and analyse it. Below are some of the important features of Pandas that are used specifically for data processing and data analysis work.

Key Features of Pandas

Fast and efficient DataFrame object with default and customized indexing.
Tools for loading data into in-memory data objects from different file formats.
Data alignment and integrated handling of missing data.
Reshaping and pivoting of data sets.
Label-based slicing, indexing and subsetting of large data sets.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
High performance merging and joining of data.
Time series functionality.

Pandas provides the following two data structures −

Series
DataFrame

These data structures are built on top of NumPy arrays, making them fast and efficient.

Dimension and Description

The best way to think of these data structures is that the higher-dimensional data structure is a container of its lower-dimensional data structure. For example, a DataFrame is a container of Series.

Data Structure    Dimensions    Description
Series            1             1D labeled homogeneous array, size-immutable.
DataFrame         2             General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.

DataFrame is the most widely used and most important of these data structures.

Series

A Series is a one-dimensional array-like structure with homogeneous data. For example, the following series is a collection of integers 10, 23, 56, …

10 23 56 17 52 61 73 90 26 72

Key Points of Series

Homogeneous data
Size immutable
Values of data mutable

DataFrame

A DataFrame is a two-dimensional structure with heterogeneous data. For example,

Name     Age    Gender    Rating
Steve    32     Male      3.45
Lia      28     Female    4.6
Vin      45     Male      3.9
Katie    38     Female    2.78

The table represents the data of the sales team of an organization with their overall performance rating. The data is represented in rows and columns. Each column represents an attribute and each row represents a person.

Data Type of Columns

The data types of the four columns are as follows −

Column    Type
Name      String
Age       Integer
Gender    String
Rating    Float

Key Points of DataFrame

Heterogeneous data
Size mutable
Data mutable

We will see many examples of using the pandas library of Python in data science work in the next chapters.
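As a first taste, here is a minimal sketch that builds the two core data structures directly from the example values above.

import pandas as pd

# A minimal sketch: constructing a Series and a DataFrame by hand.
s = pd.Series([10, 23, 56, 17, 52], name="numbers")

df = pd.DataFrame({
   "Name": ["Steve", "Lia", "Vin", "Katie"],
   "Age": [32, 28, 45, 38],
   "Rating": [3.45, 4.6, 3.9, 2.78]
})

print(s)
print(df)
print(df.dtypes)   # the per-column data types discussed above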

Python Data Science – SciPy

What is SciPy?

The SciPy library of Python is built to work with NumPy arrays and provides many user-friendly and efficient numerical routines, such as routines for numerical integration and optimization. Together, they run on all popular operating systems, are quick to install and are free of charge. NumPy and SciPy are easy to use, but powerful enough to be relied upon by some of the world's leading scientists and engineers.

SciPy Sub-packages

SciPy is organized into sub-packages covering different scientific computing domains. These are summarized in the following table −

scipy.constants      Physical and mathematical constants
scipy.fftpack        Fourier transforms
scipy.integrate      Integration routines
scipy.interpolate    Interpolation
scipy.io             Data input and output
scipy.linalg         Linear algebra routines
scipy.optimize       Optimization
scipy.signal         Signal processing
scipy.sparse         Sparse matrices
scipy.spatial        Spatial data structures and algorithms
scipy.special        Special mathematical functions
scipy.stats          Statistics

Data Structure

The basic data structure used by SciPy is the multidimensional array provided by the NumPy module. NumPy provides some functions for linear algebra, Fourier transforms and random number generation, but not with the generality of the equivalent functions in SciPy.

We will see many examples of using the SciPy library of Python in data science work in the next chapters.
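As a first taste, here is a minimal sketch touching two of the sub-packages listed above: numerical integration and linear algebra.

import numpy as np
from scipy import integrate, linalg

# Numerical integration: integrate sin(x) from 0 to pi (the exact answer is 2)
result, error = integrate.quad(np.sin, 0, np.pi)
print(result)

# Linear algebra: solve the system A x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
print(linalg.solve(A, b))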