Python Pandas – IO Tools

The Pandas I/O API is a set of top-level reader functions, accessed like pd.read_csv(), that generally return a Pandas object. The two workhorse functions for reading text files (flat files) are read_csv() and read_table(). They both use the same parsing code to intelligently convert tabular data into a DataFrame object:

pandas.read_csv(filepath_or_buffer, sep=",", delimiter=None, header="infer", names=None, index_col=None, usecols=None)
pandas.read_table(filepath_or_buffer, sep="\t", delimiter=None, header="infer", names=None, index_col=None, usecols=None)

Here is how the csv file data looks:

S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900

Save this data as temp.csv and conduct operations on it.

read_csv

read_csv reads data from csv files and creates a DataFrame object.

import pandas as pd
df = pd.read_csv("temp.csv")
print(df)

Its output is as follows −

   S.No    Name  Age       City  Salary
0     1     Tom   28    Toronto   20000
1     2     Lee   32   HongKong    3000
2     3  Steven   43   Bay Area    8300
3     4     Ram   38  Hyderabad    3900

custom index

This specifies a column in the csv file to use as the index, via index_col.

import pandas as pd
df = pd.read_csv("temp.csv", index_col=["S.No"])
print(df)

Its output is as follows −

        Name  Age       City  Salary
S.No
1        Tom   28    Toronto   20000
2        Lee   32   HongKong    3000
3     Steven   43   Bay Area    8300
4        Ram   38  Hyderabad    3900

Converters

The dtype of the columns can be passed as a dict.

import pandas as pd
import numpy as np
df = pd.read_csv("temp.csv", dtype={"Salary": np.float64})
print(df.dtypes)

Its output is as follows −

S.No        int64
Name       object
Age         int64
City       object
Salary    float64
dtype: object

By default, the dtype of the Salary column is int, but the result shows it as float because we have explicitly cast the type.
Thus, the data looks like float:

   S.No    Name  Age       City   Salary
0     1     Tom   28    Toronto  20000.0
1     2     Lee   32   HongKong   3000.0
2     3  Steven   43   Bay Area   8300.0
3     4     Ram   38  Hyderabad   3900.0

header_names

Specify the names of the header using the names argument.

import pandas as pd
df = pd.read_csv("temp.csv", names=["a", "b", "c", "d", "e"])
print(df)

Its output is as follows −

      a       b    c          d       e
0  S.No    Name  Age       City  Salary
1     1     Tom   28    Toronto   20000
2     2     Lee   32   HongKong    3000
3     3  Steven   43   Bay Area    8300
4     4     Ram   38  Hyderabad    3900

Observe, the custom names are used as the header, but the header row in the file has not been eliminated; it appears as a data row. Now, we use the header argument to remove that. If the header is in a row other than the first, pass the row number to header. This will skip the preceding rows.

import pandas as pd
df = pd.read_csv("temp.csv", names=["a", "b", "c", "d", "e"], header=0)
print(df)

Its output is as follows −

   a       b   c          d      e
0  1     Tom  28    Toronto  20000
1  2     Lee  32   HongKong   3000
2  3  Steven  43   Bay Area   8300
3  4     Ram  38  Hyderabad   3900

skiprows

skiprows skips the number of rows specified.

import pandas as pd
df = pd.read_csv("temp.csv", skiprows=2)
print(df)

Its output is as follows −

   2     Lee  32   HongKong  3000
0  3  Steven  43   Bay Area  8300
1  4     Ram  38  Hyderabad  3900
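The options above can be combined in one call. As a self-contained sketch (it feeds the same temp.csv content through io.StringIO so no file on disk is needed), index_col and dtype work together like this:

```python
import io
import pandas as pd

# Same data as temp.csv, kept in memory for the example
csv_data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900"""

# Use S.No as the index and force Salary to float in a single call
df = pd.read_csv(io.StringIO(csv_data), index_col="S.No",
                 dtype={"Salary": float})
print(df.loc[1, "Name"])    # look up the row with S.No == 1
print(df["Salary"].dtype)
```

Any file-like object can be passed as filepath_or_buffer, which is also handy for testing parsers without touching the filesystem.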
Python Pandas – Concatenation

Pandas provides various facilities for easily combining Series, DataFrame, and Panel objects.

pd.concat(objs, axis=0, join="outer", join_axes=None, ignore_index=False)

objs − This is a sequence or mapping of Series, DataFrame, or Panel objects.
axis − {0, 1, ...}, default 0. This is the axis to concatenate along.
join − {'inner', 'outer'}, default 'outer'. How to handle indexes on the other axis(es). Outer for union and inner for intersection.
ignore_index − boolean, default False. If True, do not use the index values on the concatenation axis. The resulting axis will be labeled 0, ..., n − 1.
join_axes − This is the list of Index objects. Specific indexes to use for the other (n−1) axes instead of performing inner/outer set logic.

Concatenating Objects

The concat function does all of the heavy lifting of performing concatenation operations along an axis. Let us create different objects and do concatenation.

import pandas as pd

one = pd.DataFrame({
   "Name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"],
   "subject_id": ["sub1", "sub2", "sub4", "sub6", "sub5"],
   "Marks_scored": [98, 90, 87, 69, 78]},
   index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
   "Name": ["Billy", "Brian", "Bran", "Bryce", "Betty"],
   "subject_id": ["sub2", "sub4", "sub3", "sub6", "sub5"],
   "Marks_scored": [89, 80, 79, 97, 88]},
   index=[1, 2, 3, 4, 5])

print(pd.concat([one, two]))

Its output is as follows −

   Marks_scored    Name subject_id
1            98    Alex       sub1
2            90     Amy       sub2
3            87   Allen       sub4
4            69   Alice       sub6
5            78  Ayoung       sub5
1            89   Billy       sub2
2            80   Brian       sub4
3            79    Bran       sub3
4            97   Bryce       sub6
5            88   Betty       sub5

Suppose we wanted to associate specific keys with each of the pieces of the chopped-up DataFrame.
We can do this by using the keys argument −

import pandas as pd

one = pd.DataFrame({
   "Name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"],
   "subject_id": ["sub1", "sub2", "sub4", "sub6", "sub5"],
   "Marks_scored": [98, 90, 87, 69, 78]},
   index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
   "Name": ["Billy", "Brian", "Bran", "Bryce", "Betty"],
   "subject_id": ["sub2", "sub4", "sub3", "sub6", "sub5"],
   "Marks_scored": [89, 80, 79, 97, 88]},
   index=[1, 2, 3, 4, 5])

print(pd.concat([one, two], keys=["x", "y"]))

Its output is as follows −

     Marks_scored    Name subject_id
x 1            98    Alex       sub1
  2            90     Amy       sub2
  3            87   Allen       sub4
  4            69   Alice       sub6
  5            78  Ayoung       sub5
y 1            89   Billy       sub2
  2            80   Brian       sub4
  3            79    Bran       sub3
  4            97   Bryce       sub6
  5            88   Betty       sub5

The index of the result is duplicated; each index is repeated.

If the resulting object has to follow its own indexing, set ignore_index to True.

import pandas as pd

one = pd.DataFrame({
   "Name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"],
   "subject_id": ["sub1", "sub2", "sub4", "sub6", "sub5"],
   "Marks_scored": [98, 90, 87, 69, 78]},
   index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
   "Name": ["Billy", "Brian", "Bran", "Bryce", "Betty"],
   "subject_id": ["sub2", "sub4", "sub3", "sub6", "sub5"],
   "Marks_scored": [89, 80, 79, 97, 88]},
   index=[1, 2, 3, 4, 5])

print(pd.concat([one, two], keys=["x", "y"], ignore_index=True))

Its output is as follows −

   Marks_scored    Name subject_id
0            98    Alex       sub1
1            90     Amy       sub2
2            87   Allen       sub4
3            69   Alice       sub6
4            78  Ayoung       sub5
5            89   Billy       sub2
6            80   Brian       sub4
7            79    Bran       sub3
8            97   Bryce       sub6
9            88   Betty       sub5

Observe, the index changes completely and the keys are also overridden.

If two objects need to be added along axis=1, then the new columns will be appended.
import pandas as pd

one = pd.DataFrame({
   "Name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"],
   "subject_id": ["sub1", "sub2", "sub4", "sub6", "sub5"],
   "Marks_scored": [98, 90, 87, 69, 78]},
   index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
   "Name": ["Billy", "Brian", "Bran", "Bryce", "Betty"],
   "subject_id": ["sub2", "sub4", "sub3", "sub6", "sub5"],
   "Marks_scored": [89, 80, 79, 97, 88]},
   index=[1, 2, 3, 4, 5])

print(pd.concat([one, two], axis=1))

Its output is as follows −

   Marks_scored    Name subject_id  Marks_scored   Name subject_id
1            98    Alex       sub1            89  Billy       sub2
2            90     Amy       sub2            80  Brian       sub4
3            87   Allen       sub4            79   Bran       sub3
4            69   Alice       sub6            97  Bryce       sub6
5            78  Ayoung       sub5            88  Betty       sub5

Concatenating Using append

A useful shortcut to concat are the append instance methods on Series and DataFrame. These methods actually predate concat. They concatenate along axis=0, namely the index −

import pandas as pd

one = pd.DataFrame({
   "Name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"],
   "subject_id": ["sub1", "sub2", "sub4", "sub6", "sub5"],
   "Marks_scored": [98, 90, 87, 69, 78]},
   index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
   "Name": ["Billy", "Brian", "Bran", "Bryce", "Betty"],
   "subject_id": ["sub2", "sub4", "sub3", "sub6", "sub5"],
   "Marks_scored": [89, 80, 79, 97, 88]},
   index=[1, 2, 3, 4, 5])

print(one.append(two))

Its output is as follows −

   Marks_scored    Name subject_id
1            98    Alex       sub1
2            90     Amy       sub2
3            87   Allen       sub4
4            69   Alice       sub6
5            78  Ayoung       sub5
1            89   Billy       sub2
2            80   Brian       sub4
3            79    Bran       sub3
4            97   Bryce       sub6
5            88   Betty       sub5

The append function can take multiple objects as well −

import pandas as pd

one = pd.DataFrame({
   "Name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"],
   "subject_id": ["sub1", "sub2", "sub4", "sub6", "sub5"],
   "Marks_scored": [98, 90, 87, 69, 78]},
   index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
   "Name": ["Billy", "Brian", "Bran", "Bryce", "Betty"],
   "subject_id": ["sub2", "sub4", "sub3", "sub6", "sub5"],
   "Marks_scored": [89, 80, 79, 97, 88]},
   index=[1, 2, 3, 4, 5])

print(one.append([two, one, two]))

Its output is as follows −

   Marks_scored    Name subject_id
1            98    Alex       sub1
2            90     Amy       sub2
3            87   Allen       sub4
4            69   Alice       sub6
5            78  Ayoung       sub5
1            89   Billy       sub2
2            80   Brian       sub4
3            79    Bran       sub3
4            97   Bryce       sub6
5            88   Betty       sub5
1            98    Alex       sub1
2            90     Amy       sub2
3            87   Allen       sub4
4            69   Alice       sub6
5            78  Ayoung       sub5
1            89   Billy       sub2
2            80   Brian       sub4
3            79    Bran       sub3
4            97   Bryce       sub6
5            88   Betty       sub5

Time Series

Pandas provides a robust tool for working with time series data, especially in the financial sector. While working with time series data, we frequently come across the following −

Generating sequences of time
Converting the time series to different frequencies

Pandas provides a relatively compact and self-contained set of tools for performing the above tasks.

Get Current Time

Timestamp.now() gives you the current date and time.

import pandas as pd
print(pd.Timestamp.now())

Its output is as follows −

2017-05-11 06:10:13.393147

Create a Timestamp

Timestamped data is the most basic type of time series data, associating values with points in time. Let's take an example −

import pandas as pd
print(pd.Timestamp("2017-03-01"))

Its output is as follows −

2017-03-01 00:00:00

It is also possible to convert integer or float epoch times. The default unit for these is nanoseconds (since this is how Timestamps are stored). However, often epochs are stored in another unit, which can be specified. Let's take another example −

import pandas as pd
print(pd.Timestamp(1587687255, unit="s"))

Its output is as follows −

2020-04-24 00:14:15

Create a Range of Time

import pandas as pd
print(pd.date_range("11:00",
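The date_range example above is cut off in this text. As a hedged sketch (the start date, period count, and frequency here are illustrative choices, not taken from the original), a complete call looks like this:

```python
import pandas as pd

# Five timestamps, one per hour, starting from a fixed point in time;
# freq="h" selects an hourly frequency.
rng = pd.date_range("2020-01-01 11:00", periods=5, freq="h")
print(rng[0])
print(len(rng))
```

date_range also accepts start/end pairs instead of periods, and other frequency aliases such as "D" (day) or "min" (minute).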
Python Pandas – Aggregations

Once the rolling, expanding and ewm objects are created, several methods are available to perform aggregations on data.

Applying Aggregations on a DataFrame

Let us create a DataFrame and apply aggregations on it.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
   index=pd.date_range("1/1/2000", periods=10),
   columns=["A", "B", "C", "D"])
print(df)
r = df.rolling(window=3, min_periods=1)
print(r)

Its output is as follows −

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  0.790670 -0.387854 -0.668132  0.267283
2000-01-03 -0.575523 -0.965025  0.060427 -2.179780
2000-01-04  1.669653  1.211759 -0.254695  1.429166
2000-01-05  0.100568 -0.236184  0.491646 -0.466081
2000-01-06  0.155172  0.992975 -1.205134  0.320958
2000-01-07  0.309468 -0.724053 -1.412446  0.627919
2000-01-08  0.099489 -1.028040  0.163206 -1.274331
2000-01-09  1.639500 -0.068443  0.714008 -0.565969
2000-01-10  0.326761  1.479841  0.664282 -1.361169

Rolling [window=3,min_periods=1,center=False,axis=0]

We can aggregate by passing a function to the entire DataFrame, or select a column via the standard get-item method.
Apply Aggregation on a Whole DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
   index=pd.date_range("1/1/2000", periods=10),
   columns=["A", "B", "C", "D"])
print(df)
r = df.rolling(window=3, min_periods=1)
print(r.aggregate(np.sum))

Its output is as follows − the raw frame prints first, followed by the rolling sums:

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  0.790670 -0.387854 -0.668132  0.267283
2000-01-03 -0.575523 -0.965025  0.060427 -2.179780
2000-01-04  1.669653  1.211759 -0.254695  1.429166
2000-01-05  0.100568 -0.236184  0.491646 -0.466081
2000-01-06  0.155172  0.992975 -1.205134  0.320958
2000-01-07  0.309468 -0.724053 -1.412446  0.627919
2000-01-08  0.099489 -1.028040  0.163206 -1.274331
2000-01-09  1.639500 -0.068443  0.714008 -0.565969
2000-01-10  0.326761  1.479841  0.664282 -1.361169

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  1.879182 -1.038796 -3.215581 -0.299575
2000-01-03  1.303660 -2.003821 -3.155154 -2.479355
2000-01-04  1.884801 -0.141119 -0.862400 -0.483331
2000-01-05  1.194699  0.010551  0.297378 -1.216695
2000-01-06  1.925393  1.968551 -0.968183  1.284044
2000-01-07  0.565208  0.032738 -2.125934  0.482797
2000-01-08  0.564129 -0.759118 -2.454374 -0.325454
2000-01-09  2.048458 -1.820537 -0.535232 -1.212381
2000-01-10  2.065750  0.383357  1.541496 -3.201469

Apply Aggregation on a Single Column of a DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
   index=pd.date_range("1/1/2000", periods=10),
   columns=["A", "B", "C", "D"])
print(df)
r = df.rolling(window=3, min_periods=1)
print(r["A"].aggregate(np.sum))

Its output is as follows (the raw frame prints first, as above):

2000-01-01    1.088512
2000-01-02    1.879182
2000-01-03    1.303660
2000-01-04    1.884801
2000-01-05    1.194699
2000-01-06    1.925393
2000-01-07    0.565208
2000-01-08    0.564129
2000-01-09    2.048458
2000-01-10    2.065750
Freq: D, Name: A, dtype: float64

Apply Aggregation on Multiple Columns of a DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
   index=pd.date_range("1/1/2000", periods=10),
   columns=["A", "B", "C", "D"])
print(df)
r = df.rolling(window=3, min_periods=1)
print(r[["A", "B"]].aggregate(np.sum))

Its output is as follows (the raw frame prints first, as above):

                   A         B
2000-01-01  1.088512 -0.650942
2000-01-02  1.879182 -1.038796
2000-01-03  1.303660 -2.003821
2000-01-04  1.884801 -0.141119
2000-01-05  1.194699  0.010551
2000-01-06  1.925393  1.968551
2000-01-07  0.565208  0.032738
2000-01-08  0.564129 -0.759118
2000-01-09  2.048458 -1.820537
2000-01-10  2.065750  0.383357

Apply Multiple Functions on a Single Column of a DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
   index=pd.date_range("1/1/2000", periods=10),
   columns=["A", "B", "C", "D"])
print(df)
r = df.rolling(window=3, min_periods=1)
print(r["A"].aggregate([np.sum, np.mean]))

Its output is as follows (the raw frame prints first, as above):

                 sum      mean
2000-01-01  1.088512  1.088512
2000-01-02  1.879182  0.939591
2000-01-03  1.303660  0.434553
2000-01-04  1.884801  0.628267
2000-01-05  1.194699  0.398233
2000-01-06  1.925393  0.641798
2000-01-07  0.565208  0.188403
2000-01-08  0.564129  0.188043
2000-01-09  2.048458  0.682819
2000-01-10  2.065750  0.688583

Apply Multiple Functions on Multiple Columns of a DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
   index=pd.date_range("1/1/2000", periods=10),
   columns=["A", "B", "C", "D"])
print(df)
r = df.rolling(window=3, min_periods=1)
print(r[["A", "B"]].aggregate([np.sum, np.mean]))

Its output is as follows (the raw frame prints first, as above):

                   A                   B
                 sum      mean       sum      mean
2000-01-01  1.088512  1.088512 -0.650942 -0.650942
2000-01-02  1.879182  0.939591 -1.038796 -0.519398
2000-01-03  1.303660  0.434553 -2.003821 -0.667940
2000-01-04  1.884801  0.628267 -0.141119 -0.047040
2000-01-05  1.194699  0.398233  0.010551  0.003517
2000-01-06  1.925393  0.641798  1.968551  0.656184
2000-01-07  0.565208  0.188403  0.032738  0.010913
2000-01-08  0.564129  0.188043 -0.759118 -0.253039
2000-01-09  2.048458  0.682819 -1.820537 -0.606846
2000-01-10  2.065750  0.688583  0.383357  0.127786

Apply Different Functions to Different Columns of a DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(3, 4),
   index=pd.date_range("1/1/2000", periods=3),
   columns=["A", "B", "C", "D"])
print(df)
r = df.rolling(window=3, min_periods=1)
print(r.aggregate({"A": np.sum, "B": np.mean}))

Its output is as follows −

                   A         B         C         D
2000-01-01 -1.575749 -1.018105  0.317797  0.545081
2000-01-02 -0.164917 -1.361068  0.258240  1.113091
2000-01-03  1.258111  1.037941 -0.047487  0.867371

                   A         B
2000-01-01 -1.575749 -1.018105
2000-01-02 -1.740666 -1.189587
2000-01-03 -0.482555 -0.447078
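Because the frames above come from np.random.randn, the numbers change on every run. A small deterministic sketch (values chosen purely for illustration) makes the window arithmetic easy to check by hand; string names are used for the functions, which pandas accepts in aggregate():

```python
import pandas as pd

# Fixed values instead of random data, so results are reproducible
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0],
                   "B": [10.0, 20.0, 30.0, 40.0]})
r = df.rolling(window=3, min_periods=1)

# Different functions per column, as in the last example above
out = r.aggregate({"A": "sum", "B": "mean"})
print(out)
# A (rolling sums):  1, 1+2, 1+2+3, 2+3+4  ->  1, 3, 6, 9
# B (rolling means): 10, 15, 20, 30
```

With min_periods=1, the first windows are shorter than 3, which is why the early rows are partial sums and means rather than NaN.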
Python Pandas – Categorical Data

Real-world data often includes text columns that are repetitive. Features like gender, country, and codes are always repetitive. These are examples of categorical data.

Categorical variables can take on only a limited, and usually fixed, number of possible values. Besides the fixed length, categorical data might have an order but cannot perform numerical operations. Categorical is a Pandas data type.

The categorical data type is useful in the following cases −

A string variable consisting of only a few different values. Converting such a string variable to a categorical variable will save some memory.
The lexical order of a variable is not the same as the logical order ("one", "two", "three"). By converting to a categorical and specifying an order on the categories, sorting and min/max will use the logical order instead of the lexical order.
As a signal to other Python libraries that this column should be treated as a categorical variable (e.g. to use suitable statistical methods or plot types).

Object Creation

A categorical object can be created in multiple ways. The different ways have been described below −

category

By specifying the dtype as "category" in pandas object creation.

import pandas as pd
s = pd.Series(["a", "b", "c", "a"], dtype="category")
print(s)

Its output is as follows −

0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a, b, c]

The number of elements passed to the series object is four, but there are only three categories. Observe the same in the output Categories.

pd.Categorical

Using the standard pandas Categorical constructor, we can create a category object.
pandas.Categorical(values, categories, ordered)

Let's take an example −

import pandas as pd
cat = pd.Categorical(["a", "b", "c", "a", "b", "c"])
print(cat)

Its output is as follows −

[a, b, c, a, b, c]
Categories (3, object): [a, b, c]

Let's have another example −

import pandas as pd
cat = pd.Categorical(["a", "b", "c", "a", "b", "c", "d"], ["c", "b", "a"])
print(cat)

Its output is as follows −

[a, b, c, a, b, c, NaN]
Categories (3, object): [c, b, a]

Here, the second argument signifies the categories. Thus, any value which is not present in the categories will be treated as NaN.

Now, take a look at the following example −

import pandas as pd
cat = pd.Categorical(["a", "b", "c", "a", "b", "c", "d"], ["c", "b", "a"], ordered=True)
print(cat)

Its output is as follows −

[a, b, c, a, b, c, NaN]
Categories (3, object): [c < b < a]

Logically, the order means that a is greater than b, and b is greater than c.

Description

Using the .describe() command on the categorical data, we get output similar to a Series or DataFrame of type string.

import pandas as pd
import numpy as np

cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
df = pd.DataFrame({"cat": cat, "s": ["a", "c", "c", np.nan]})
print(df.describe())
print(df["cat"].describe())

Its output is as follows −

       cat  s
count    3  3
unique   2  2
top      c  c
freq     2  2

count     3
unique    2
top       c
freq      2
Name: cat, dtype: object

Get the Properties of the Category

The categories attribute is used to get the categories of the object.

import pandas as pd
import numpy as np
s = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
print(s.categories)

Its output is as follows −

Index(['b', 'a', 'c'], dtype='object')

The ordered attribute is used to get the order of the object.
import pandas as pd
import numpy as np
cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
print(cat.ordered)

Its output is as follows −

False

It returned False because we haven't specified any order.

Renaming Categories

Renaming categories is done with the rename_categories() method.

import pandas as pd
s = pd.Series(["a", "b", "c", "a"], dtype="category")
s = s.cat.rename_categories(["Group %s" % g for g in s.cat.categories])
print(s.cat.categories)

Its output is as follows −

Index(['Group a', 'Group b', 'Group c'], dtype='object')

The initial categories [a, b, c] are replaced by the renamed ones.

Appending New Categories

Using the add_categories() method, new categories can be appended.

import pandas as pd
s = pd.Series(["a", "b", "c", "a"], dtype="category")
s = s.cat.add_categories([4])
print(s.cat.categories)

Its output is as follows −

Index(['a', 'b', 'c', 4], dtype='object')

Removing Categories

Using the remove_categories() method, unwanted categories can be removed.

import pandas as pd
s = pd.Series(["a", "b", "c", "a"], dtype="category")
print("Original object:")
print(s)
print("After removal:")
print(s.cat.remove_categories("a"))

Its output is as follows −

Original object:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a, b, c]

After removal:
0    NaN
1      b
2      c
3    NaN
dtype: category
Categories (2, object): [b, c]

Comparison of Categorical Data

Comparing categorical data with other objects is possible in three cases −

comparing equality (== and !=) to a list-like object (list, Series, array, ...) of the same length as the categorical data.
all comparisons (==, !=, >, >=, <, and <=) of categorical data to another categorical Series, when ordered==True and the categories are the same.
all comparisons of categorical data to a scalar.
Take a look at the following example −

import pandas as pd
from pandas.api.types import CategoricalDtype

dtype = CategoricalDtype(categories=[1, 2, 3], ordered=True)
cat = pd.Series([1, 2, 3]).astype(dtype)
cat1 = pd.Series([2, 2, 2]).astype(dtype)
print(cat > cat1)

Its output is as follows −

0    False
1    False
2     True
dtype: bool
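One of the motivations listed at the start of this chapter is memory savings for repetitive string columns. A minimal sketch (the column values and sizes here are illustrative; exact byte counts vary by machine and pandas version, so only the direction of the comparison matters):

```python
import pandas as pd

# A highly repetitive text column: only two distinct values
s_obj = pd.Series(["male", "female"] * 5000)

# Converting to category stores each distinct value once,
# plus a small integer code per row
s_cat = s_obj.astype("category")

print(s_obj.memory_usage(deep=True))
print(s_cat.memory_usage(deep=True))
```

The categorical version is dramatically smaller, because the strings "male" and "female" are stored only once each instead of 10,000 times.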
Python Pandas – Quick Guide

Python Pandas – Introduction

Pandas is an open-source Python library providing high-performance data manipulation and analysis tools through its powerful data structures. The name Pandas is derived from "Panel Data", an econometrics term for multidimensional data. In 2008, developer Wes McKinney started developing pandas when in need of a high-performance, flexible tool for data analysis.

Prior to Pandas, Python was mostly used for data munging and preparation; it contributed very little towards data analysis. Pandas solved this problem. Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data: load, prepare, manipulate, model, and analyze.

Python with Pandas is used in a wide range of fields, including academic and commercial domains such as finance, economics, statistics, and analytics.

Key Features of Pandas

Fast and efficient DataFrame object with default and customized indexing.
Tools for loading data into in-memory data objects from different file formats.
Data alignment and integrated handling of missing data.
Reshaping and pivoting of data sets.
Label-based slicing, indexing and subsetting of large data sets.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
High-performance merging and joining of data.
Time Series functionality.

Python Pandas – Environment Setup

The standard Python distribution doesn't come bundled with the Pandas module. A lightweight alternative is to install Pandas using the popular Python package installer, pip.

pip install pandas

If you install the Anaconda Python package, Pandas will be installed by default. The following distributions bundle it −

Windows

Anaconda (from https://www.continuum.io) is a free Python distribution for the SciPy stack. It is also available for Linux and Mac.
Canopy (https://www.enthought.com/products/canopy/) is available as a free as well as commercial distribution with the full SciPy stack for Windows, Linux and Mac.

Python (x,y) is a free Python distribution with the SciPy stack and Spyder IDE for Windows OS. (Downloadable from http://python-xy.github.io/)

Linux

Package managers of the respective Linux distributions are used to install one or more packages in the SciPy stack.

For Ubuntu users:

sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose

For Fedora users:

sudo yum install numpy scipy python-matplotlib ipython python-pandas sympy python-nose atlas-devel

Introduction to Data Structures

Pandas deals with the following three data structures −

Series
DataFrame
Panel

These data structures are built on top of the NumPy array, which means they are fast.

Dimension & Description

The best way to think of these data structures is that the higher-dimensional data structure is a container of its lower-dimensional data structure. For example, DataFrame is a container of Series, and Panel is a container of DataFrame.

Data Structure | Dimensions | Description
Series        | 1          | 1D labeled homogeneous array, size-immutable.
Data Frames   | 2          | General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
Panel         | 3          | General 3D labeled, size-mutable array.

Building and handling two or more dimensional arrays is a tedious task; the burden is placed on the user to consider the orientation of the data set when writing functions. Using Pandas data structures, this mental effort is reduced. For example, with tabular data (DataFrame) it is more semantically helpful to think of the index (the rows) and the columns rather than axis 0 and axis 1.

Mutability

All Pandas data structures are value mutable (can be changed) and, except Series, all are size mutable. Series is size-immutable.

Note − DataFrame is widely used and is one of the most important data structures.
Panel is used much less.

Series

Series is a one-dimensional array-like structure with homogeneous data. For example, the following series is a collection of integers:

10 23 56 17 52 61 73 90 26 72

Key Points
Homogeneous data
Size immutable
Values of data mutable

DataFrame

DataFrame is a two-dimensional array with heterogeneous data. For example,

Name   Age  Gender  Rating
Steve  32   Male    3.45
Lia    28   Female  4.6
Vin    45   Male    3.9
Katie  38   Female  2.78

The table represents the data of a sales team of an organization with their overall performance rating. The data is represented in rows and columns. Each column represents an attribute and each row represents a person.

Data Type of Columns

The data types of the four columns are as follows −

Column  Type
Name    String
Age     Integer
Gender  String
Rating  Float

Key Points
Heterogeneous data
Size mutable
Data mutable

Panel

Panel is a three-dimensional data structure with heterogeneous data. It is hard to represent a panel graphically, but it can be illustrated as a container of DataFrames.

Key Points
Heterogeneous data
Size mutable
Data mutable

Python Pandas – Series

Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.

pandas.Series

A pandas Series can be created using the following constructor −

pandas.Series(data, index, dtype, copy)

The parameters of the constructor are as follows −

Sr.No | Parameter & Description
1     | data − takes various forms like ndarray, list, constants
2     | index − values must be unique and hashable, same length as data. Default np.arange(n) if no index is passed.
3     | dtype − the data type. If None, the data type will be inferred.
4     | copy − copy data. Default False.

A series can be created using various inputs like −

Array
Dict
Scalar value or constant

Create an Empty Series

A basic series that can be created is an empty series.
Example

#import the pandas library and alias it as pd
import pandas as pd
s = pd.Series()
print(s)

Its output is as follows −

Series([], dtype: float64)

Create a Series from ndarray

If data is an ndarray, then the index passed must be of the same length. If no index is passed, then by default the index will be range(n), where n is the array length, i.e., [0, 1, 2, ..., len(array)-1].

Example 1

#import the pandas library and alias it as pd
import pandas as pd
import numpy as np
data = np.array(["a", "b", "c", "d"])
s
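The ndarray example above is truncated in this text. As a sketch of the remaining creation forms described earlier (the index values below are illustrative choices): a Series from an ndarray with a custom index, and a Series from a dict, whose keys become the index.

```python
import pandas as pd
import numpy as np

# Series from an ndarray with an explicit index
data = np.array(["a", "b", "c", "d"])
s1 = pd.Series(data, index=[100, 101, 102, 103])
print(s1[102])   # label-based lookup

# Series from a dict: keys become the index labels
s2 = pd.Series({"x": 0.0, "y": 1.0, "z": 2.0})
print(s2["y"])
```

If no index is passed for the ndarray case, the default range(n) index is used instead.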
Python Pandas – Options and Customization

Pandas provides an API to customize some aspects of its behavior; display settings are the most commonly used. The API is composed of five relevant functions. They are −

get_option()
set_option()
reset_option()
describe_option()
option_context()

Let us now understand how the functions operate.

get_option(param)

get_option takes a single parameter and returns its value, as shown below −

display.max_rows

Displays the default number of rows. The interpreter reads this value and uses it as the upper limit on the number of rows to display.

import pandas as pd
print(pd.get_option("display.max_rows"))

Its output is as follows −

60

display.max_columns

Displays the default number of columns. The interpreter reads this value and uses it as the upper limit on the number of columns to display.

import pandas as pd
print(pd.get_option("display.max_columns"))

Its output is as follows −

20

Here, 60 and 20 are the default configuration parameter values.

set_option(param, value)

set_option takes two arguments and sets the value of the parameter, as shown below −

display.max_rows

Using set_option(), we can change the default number of rows to be displayed.

import pandas as pd
pd.set_option("display.max_rows", 80)
print(pd.get_option("display.max_rows"))

Its output is as follows −

80

display.max_columns

Using set_option(), we can change the default number of columns to be displayed.

import pandas as pd
pd.set_option("display.max_columns", 30)
print(pd.get_option("display.max_columns"))

Its output is as follows −

30

reset_option(param)

reset_option takes an argument and sets the value back to the default value.

display.max_rows

Using reset_option(), we can change the value back to the default number of rows to be displayed.
import pandas as pd
pd.reset_option("display.max_rows")
print(pd.get_option("display.max_rows"))

Its output is as follows −

60

describe_option(param)

describe_option prints the description of the argument.

display.max_rows

Using describe_option(), we can print the description of the parameter.

import pandas as pd
pd.describe_option("display.max_rows")

Its output is as follows −

display.max_rows : int
   If max_rows is exceeded, switch to truncate view. Depending on
   'large_repr', objects are either centrally truncated or printed as
   a summary view. 'None' value means unlimited.
   In case python/IPython is running in a terminal and `large_repr`
   equals 'truncate' this can be set to 0 and pandas will auto-detect
   the height of the terminal and print a truncated object which fits
   the screen height. The IPython notebook, IPython qtconsole, or IDLE
   do not run in a terminal and hence it is not possible to do
   correct auto-detection.
   [default: 60] [currently: 60]

option_context()

The option_context context manager is used to set an option temporarily within a with statement. The option value is restored automatically when you exit the with block −

display.max_rows

Using option_context(), we can set the value temporarily.

import pandas as pd
with pd.option_context("display.max_rows", 10):
   print(pd.get_option("display.max_rows"))
print(pd.get_option("display.max_rows"))

Its output is as follows −

10
60

See the difference between the first and the second print statements. The first statement prints the value set by option_context(), which is temporary and valid only within the with block. After the with block ends, the option reverts, so the second print statement prints the default value.
Frequently used Parameters

Sr.No  Parameter & Description
1  display.max_rows
   Sets the maximum number of rows to display.
2  display.max_columns
   Sets the maximum number of columns to display.
3  display.expand_frame_repr
   Stretches wide DataFrame representations across multiple lines.
4  display.max_colwidth
   Sets the maximum column width to display.
5  display.precision
   Sets the display precision for decimal numbers.
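The five functions above can be combined in a single round trip. The following sketch (not part of the original tutorial) captures the default with get_option(), overrides it with set_option(), restores it with reset_option(), and shows that option_context() restores the previous value automatically:

```python
import pandas as pd

# Capture the current (default) value of display.max_rows.
default = pd.get_option("display.max_rows")

# Override it globally, then restore the default.
pd.set_option("display.max_rows", 80)
print(pd.get_option("display.max_rows"))   # 80
pd.reset_option("display.max_rows")
print(pd.get_option("display.max_rows") == default)   # True

# option_context() overrides only inside the with block.
with pd.option_context("display.max_rows", 10):
    print(pd.get_option("display.max_rows"))   # 10
print(pd.get_option("display.max_rows") == default)   # True again
```

Note that reset_option() undoes any earlier set_option() call, so the order of operations matters when both are used in the same session.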
Python Pandas – Iteration
Python Pandas – Iteration

The behavior of basic iteration over Pandas objects depends on the type. When iterating over a Series, it is regarded as array-like, and basic iteration produces the values. Other data structures, like DataFrame and Panel, follow the dict-like convention of iterating over the keys of the objects.

In short, basic iteration (for i in object) produces −

Series − values
DataFrame − column labels
Panel − item labels

Iterating a DataFrame

Iterating a DataFrame gives column names. Let us consider the following example to understand the same.

import pandas as pd
import numpy as np

N = 20
df = pd.DataFrame({
   "A": pd.date_range(start="2016-01-01", periods=N, freq="D"),
   "x": np.linspace(0, stop=N-1, num=N),
   "y": np.random.rand(N),
   "C": np.random.choice(["Low", "Medium", "High"], N).tolist(),
   "D": np.random.normal(100, 10, size=(N)).tolist()
})

for col in df:
   print(col)

Its output is as follows −

A
C
D
x
y

To iterate over the rows of the DataFrame, we can use the following functions −

iteritems() − iterate over the (key, value) pairs
iterrows() − iterate over the rows as (index, series) pairs
itertuples() − iterate over the rows as namedtuples

iteritems()

Iterates over each column as a key-value pair, with the label as key and the column values as a Series object.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3), columns=["col1","col2","col3"])
for key, value in df.iteritems():
   print(key, value)

Its output is as follows −

col1 0    0.802390
1    0.324060
2    0.256811
3    0.839186
Name: col1, dtype: float64
col2 0    1.624313
1   -1.033582
2    1.796663
3    1.856277
Name: col2, dtype: float64
col3 0   -0.022142
1   -0.230820
2    1.160691
3   -0.830279
Name: col3, dtype: float64

Observe, each column is iterated separately as a key-value pair, with the values as a Series.

iterrows()

iterrows() returns an iterator yielding each index value along with a Series containing the data of that row.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3), columns=["col1","col2","col3"])
for row_index, row in df.iterrows():
   print(row_index, row)

Its output is as follows −

0 col1    1.529759
col2    0.762811
col3   -0.634691
Name: 0, dtype: float64
1 col1   -0.944087
col2    1.420919
col3   -0.507895
Name: 1, dtype: float64
2 col1   -0.077287
col2   -0.858556
col3   -0.663385
Name: 2, dtype: float64
3 col1   -1.638578
col2    0.059866
col3    0.493482
Name: 3, dtype: float64

Note − Because iterrows() returns each row as a Series, it does not preserve the data types across the row. 0, 1, 2, 3 are the row indices and col1, col2, col3 are the column labels.

itertuples()

The itertuples() method returns an iterator yielding a named tuple for each row in the DataFrame. The first element of the tuple is the row's corresponding index value, while the remaining values are the row values.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3), columns=["col1","col2","col3"])
for row in df.itertuples():
   print(row)

Its output is as follows −

Pandas(Index=0, col1=1.5297586201375899, col2=0.76281127433814944, col3=-0.6346908238310438)
Pandas(Index=1, col1=-0.94408735763808649, col2=1.4209186418359423, col3=-0.50789517967096232)
Pandas(Index=2, col1=-0.07728664756791935, col2=-0.85855574139699076, col3=-0.6633852507207626)
Pandas(Index=3, col1=0.65734942534106289, col2=-0.95057710432604969, col3=0.80344487462316527)

Note − Do not try to modify any object while iterating. Iteration is meant for reading, and the iterator returns a copy of the original object, so changes made to it will not be reflected in the original object.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3), columns=["col1","col2","col3"])
for index, row in df.iterrows():
   row["a"] = 10
print(df)

Its output is as follows −

       col1      col2      col3
0 -1.739815  0.735595 -0.295589
1  0.635485  0.106803  1.527922
2 -0.939064  0.547095  0.038585
3 -1.016509 -0.116580 -0.523158

Observe, no changes are reflected.
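Because the rows yielded by iterrows() are copies, updates must be written back through the DataFrame itself. The following sketch (not from the original tutorial) shows the standard approach using df.loc with the row index:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.zeros((3, 2)), columns=["col1", "col2"])

# Assigning to the row copy yielded by iterrows() would be lost;
# writing through df.loc with the row's index label updates df itself.
for idx, row in df.iterrows():
    df.loc[idx, "col1"] = row["col1"] + 10

print(df["col1"].tolist())   # [10.0, 10.0, 10.0]
```

For bulk updates, vectorized operations such as df["col1"] + 10 are both simpler and much faster than any row loop; the loop form is shown only to contrast with the failed in-place modification above.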
Working with Text Data
Python Pandas – Working with Text Data

In this chapter, we will discuss string operations on our basic Series/Index. In the subsequent chapters, we will learn how to apply these string functions to a DataFrame.

Pandas provides a set of string functions that make it easy to operate on string data. Most importantly, these functions ignore (or exclude) missing/NaN values. Almost all of these methods mirror Python's built-in string methods (refer: https://docs.python.org/3/library/stdtypes.html#string-methods) and are accessed through the str attribute of the Series/Index.

Let us now see how each operation performs.

Sr.No  Function & Description
1  lower()
   Converts strings in the Series/Index to lower case.
2  upper()
   Converts strings in the Series/Index to upper case.
3  len()
   Computes the length of each string.
4  strip()
   Strips whitespace (including newlines) from both sides of each string in the Series/Index.
5  split(' ')
   Splits each string with the given pattern.
6  cat(sep=' ')
   Concatenates the Series/Index elements with the given separator.
7  get_dummies()
   Returns a DataFrame with one-hot encoded values.
8  contains(pattern)
   Returns True for each element if the substring is contained in the element, else False.
9  replace(a,b)
   Replaces the value a with the value b.
10 repeat(value)
   Repeats each element the specified number of times.
11 count(pattern)
   Returns the count of occurrences of the pattern in each element.
12 startswith(pattern)
   Returns True if the element in the Series/Index starts with the pattern.
13 endswith(pattern)
   Returns True if the element in the Series/Index ends with the pattern.
14 find(pattern)
   Returns the position of the first occurrence of the pattern.
15 findall(pattern)
   Returns a list of all occurrences of the pattern.
16 swapcase()
   Swaps the case lower/upper.
17 islower()
   Checks whether all characters in each string in the Series/Index are in lower case. Returns Boolean.
18 isupper()
   Checks whether all characters in each string in the Series/Index are in upper case. Returns Boolean.
19 isnumeric()
   Checks whether all characters in each string in the Series/Index are numeric. Returns Boolean.

Let us now create a Series and see how all the above functions work.

import pandas as pd
import numpy as np

s = pd.Series(["Tom", "William Rick", "John", "Alber@t", np.nan, "1234", "Steve Smith"])
print(s)

Its output is as follows −

0             Tom
1    William Rick
2            John
3         Alber@t
4             NaN
5            1234
6     Steve Smith
dtype: object

lower()

import pandas as pd
import numpy as np

s = pd.Series(["Tom", "William Rick", "John", "Alber@t", np.nan, "1234", "Steve Smith"])
print(s.str.lower())

Its output is as follows −

0             tom
1    william rick
2            john
3         alber@t
4             NaN
5            1234
6     steve smith
dtype: object

upper()

import pandas as pd
import numpy as np

s = pd.Series(["Tom", "William Rick", "John", "Alber@t", np.nan, "1234", "Steve Smith"])
print(s.str.upper())

Its output is as follows −

0             TOM
1    WILLIAM RICK
2            JOHN
3         ALBER@T
4             NaN
5            1234
6     STEVE SMITH
dtype: object

len()

import pandas as pd
import numpy as np

s = pd.Series(["Tom", "William Rick", "John", "Alber@t", np.nan, "1234", "Steve Smith"])
print(s.str.len())

Its output is as follows −

0     3.0
1    12.0
2     4.0
3     7.0
4     NaN
5     4.0
6    11.0
dtype: float64

strip()

import pandas as pd
import numpy as np

s = pd.Series(["Tom ", " William Rick", "John", "Alber@t"])
print(s)
print("After Stripping:")
print(s.str.strip())

Its output is as follows −

0             Tom
1    William Rick
2            John
3         Alber@t
dtype: object
After Stripping:
0             Tom
1    William Rick
2            John
3         Alber@t
dtype: object

split(pattern)

import pandas as pd
import numpy as np

s = pd.Series(["Tom ", " William Rick", "John", "Alber@t"])
print(s)
print("Split Pattern:")
print(s.str.split(" "))

Its output is as follows −

0             Tom
1    William Rick
2            John
3         Alber@t
dtype: object
Split Pattern:
0              [Tom, ]
1    [, William, Rick]
2               [John]
3            [Alber@t]
dtype: object

cat(sep=pattern)

import pandas as pd
import numpy as np

s = pd.Series(["Tom ", " William Rick", "John", "Alber@t"])
print(s.str.cat(sep="_"))

Its output is as follows −

Tom _ William Rick_John_Alber@t

get_dummies()

import pandas as pd
import numpy as np

s = pd.Series(["Tom ", " William Rick", "John", "Alber@t"])
print(s.str.get_dummies())

Its output is as follows −

    William Rick  Alber@t  John  Tom
0              0        0     0    1
1              1        0     0    0
2              0        0     1    0
3              0        1     0    0

contains(pattern)

import pandas as pd

s = pd.Series(["Tom ", " William Rick", "John", "Alber@t"])
print(s.str.contains(" "))

Its output is as follows −

0     True
1     True
2    False
3    False
dtype: bool

replace(a,b)

import pandas as pd

s = pd.Series(["Tom ", " William Rick", "John", "Alber@t"])
print(s)
print("After replacing @ with $:")
print(s.str.replace("@", "$"))

Its output is as follows −

0             Tom
1    William Rick
2            John
3         Alber@t
dtype: object
After replacing @ with $:
0             Tom
1    William Rick
2            John
3         Alber$t
dtype: object

repeat(value)

import pandas as pd

s = pd.Series(["Tom ", " William Rick", "John", "Alber@t"])
print(s.str.repeat(2))

Its output is as follows −

0                      Tom Tom
1     William Rick William Rick
2                     JohnJohn
3               Alber@tAlber@t
dtype: object

count(pattern)

import pandas as pd

s = pd.Series(["Tom ", " William Rick", "John", "Alber@t"])
print("The number of 'm's in each string:")
print(s.str.count("m"))

Its output is as follows −

The number of 'm's in each string:
0    1
1    1
2    0
3    0
dtype: int64

startswith(pattern)

import pandas as pd

s = pd.Series(["Tom ", " William Rick", "John", "Alber@t"])
print("Strings that start with 'T':")
print(s.str.startswith("T"))

Its output is as follows −

Strings that start with 'T':
0     True
1    False
2    False
3    False
dtype: bool

endswith(pattern)

import pandas as pd

s = pd.Series(["Tom ", " William Rick", "John", "Alber@t"])
print("Strings that end with 't':")
print(s.str.endswith("t"))

Its output is as follows −

Strings that end with 't':
0    False
1    False
2    False
3     True
dtype: bool
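The remaining functions from the table, find(), findall() and swapcase(), follow the same pattern. A short sketch on the same sample Series (find() returns -1 when the pattern is absent; findall() takes a regular expression and returns every match per element):

```python
import pandas as pd

s = pd.Series(["Tom ", " William Rick", "John", "Alber@t"])

# Position of the first 'e' in each string (-1 when not found).
print(s.str.find("e").tolist())      # [-1, -1, -1, 3]

# Every occurrence of 'i' per element, as a list of matches.
print(s.str.findall("i").tolist())   # [[], ['i', 'i', 'i'], [], []]

# Swap upper and lower case in each string.
print(s.str.swapcase().tolist())     # ['tOM ', ' wILLIAM rICK', 'jOHN', 'aLBER@T']
```

Like the other str methods, all three operate element-wise and propagate NaN for missing values.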
Python Pandas – Basic Functionality

So far, we have learnt about the three Pandas data structures and how to create them. We will focus mainly on the DataFrame object because of its importance in real-time data processing, and will also discuss a few of the other data structures.

Series Basic Functionality

Sr.No.  Attribute or Method & Description
1  axes
   Returns a list of the row axis labels.
2  dtype
   Returns the dtype of the object.
3  empty
   Returns True if the series is empty.
4  ndim
   Returns the number of dimensions of the underlying data, by definition 1.
5  size
   Returns the number of elements in the underlying data.
6  values
   Returns the Series as an ndarray.
7  head()
   Returns the first n rows.
8  tail()
   Returns the last n rows.

Let us now create a Series and see how all the above tabulated attributes operate.

Example

import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)

Its output is as follows −

0    0.967853
1   -0.148368
2   -1.395906
3   -1.758394
dtype: float64

axes

Returns the list of the labels of the series.

import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print("The axes are:")
print(s.axes)

Its output is as follows −

The axes are:
[RangeIndex(start=0, stop=4, step=1)]

The above result is a compact format of the list of index values from 0 to 3, i.e., [0, 1, 2, 3].

empty

Returns a Boolean value saying whether the object is empty or not. True indicates that the object is empty.

import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print("Is the Object empty?")
print(s.empty)

Its output is as follows −

Is the Object empty?
False

ndim

Returns the number of dimensions of the object.
By definition, a Series is a 1D data structure, so it returns 1.

import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)
print("The dimensions of the object:")
print(s.ndim)

Its output is as follows −

0    0.175898
1    0.166197
2   -0.609712
3   -1.377000
dtype: float64
The dimensions of the object:
1

size

Returns the size (length) of the series.

import pandas as pd
import numpy as np

#Create a series with 2 random numbers
s = pd.Series(np.random.randn(2))
print(s)
print("The size of the object:")
print(s.size)

Its output is as follows −

0    3.078058
1   -1.207803
dtype: float64
The size of the object:
2

values

Returns the actual data in the series as an array.

import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)
print("The actual data series is:")
print(s.values)

Its output is as follows −

0    1.787373
1   -0.605159
2    0.180477
3   -0.140922
dtype: float64
The actual data series is:
[ 1.78737302 -0.60515881  0.18047664 -0.1409218 ]

Head & Tail

To view a small sample of a Series or DataFrame object, use the head() and tail() methods.

head() returns the first n rows (observe the index values). The default number of elements to display is five, but you may pass a custom number.

import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print("The original series is:")
print(s)
print("The first two rows of the data series:")
print(s.head(2))

Its output is as follows −

The original series is:
0    0.720876
1   -0.765898
2    0.479221
3   -0.139547
dtype: float64
The first two rows of the data series:
0    0.720876
1   -0.765898
dtype: float64

tail() returns the last n rows (observe the index values). The default number of elements to display is five, but you may pass a custom number.
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print("The original series is:")
print(s)
print("The last two rows of the data series:")
print(s.tail(2))

Its output is as follows −

The original series is:
0   -0.655091
1   -0.881407
2   -0.608592
3   -2.341413
dtype: float64
The last two rows of the data series:
2   -0.608592
3   -2.341413
dtype: float64

DataFrame Basic Functionality

Let us now understand what DataFrame basic functionality is. The following table lists the important attributes and methods that make up DataFrame basic functionality.

Sr.No.  Attribute or Method & Description
1  T
   Transposes rows and columns.
2  axes
   Returns a list with the row axis labels and column axis labels as the only members.
3  dtypes
   Returns the dtypes in this object.
4  empty
   Returns True if the NDFrame is entirely empty (no items), i.e., if any of the axes are of length 0.
5  ndim
   Number of axes / array dimensions.
6  shape
   Returns a tuple representing the dimensionality of the DataFrame.
7  size
   Number of elements in the NDFrame.
8  values
   NumPy representation of the NDFrame.
9  head()
   Returns the first n rows.
10 tail()
   Returns the last n rows.

Let us now create a DataFrame and see how all the above mentioned attributes operate.

Example

import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {"Name":pd.Series(["Tom","James","Ricky","Vin","Steve","Smith","Jack"]),
   "Age":pd.Series([25,26,25,23,30,29,23]),
   "Rating":pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print("Our data series is:")
print(df)

Its output is as follows −

Our data series is:
   Age   Name  Rating
0   25    Tom    4.23
1   26  James    3.24
2   25  Ricky    3.98
3   23    Vin    2.56
4   30  Steve    3.20
5   29  Smith    4.60
6   23   Jack    3.80

T (Transpose)

Returns the transpose of the DataFrame. The rows and columns will interchange.
import pandas as pd
import numpy as np

# Create a Dictionary of series
d = {"Name":pd.Series(["Tom","James","Ricky","Vin","Steve","Smith","Jack"]),
   "Age":pd.Series([25,26,25,23,30,29,23]),
   "Rating":pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print("The transpose of the data series is:")
print(df.T)

Its output is as follows −

The transpose of the data series is:
           0      1      2     3      4      5     6
Age       25     26     25    23     30     29    23
Name     Tom  James  Ricky   Vin  Steve  Smith  Jack
Rating  4.23   3.24   3.98  2.56    3.2    4.6   3.8
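The shape-related attributes from the table above can be checked together on a small DataFrame. A minimal sketch (not from the original tutorial; the data is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky"],
    "Age": [25, 26, 25],
})

# shape is (rows, columns); ndim is always 2 for a DataFrame;
# size is the total number of elements (rows * columns).
print(df.shape)    # (3, 2)
print(df.ndim)     # 2
print(df.size)     # 6

# Transposing swaps the two axes, so shape is reversed.
print(df.T.shape)  # (2, 3)
```

These attributes are cheap to evaluate (no data is copied except by T), which makes them handy for quick sanity checks after loading or reshaping data.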