Python Pandas – Reindexing

Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to match a given set of labels along a particular axis.

Reindexing can accomplish several operations:

- Reorder the existing data to match a new set of labels.
- Insert missing value (NA) markers in label locations where no data for the label existed.

Example

    import pandas as pd
    import numpy as np

    N = 20

    df = pd.DataFrame({
        "A": pd.date_range(start="2016-01-01", periods=N, freq="D"),
        "x": np.linspace(0, stop=N-1, num=N),
        "y": np.random.rand(N),
        "C": np.random.choice(["Low", "Medium", "High"], N).tolist(),
        "D": np.random.normal(100, 10, size=N).tolist()
    })

    # Reindex the DataFrame: keep rows 0, 2, 5 and request columns A, C, B
    df_reindexed = df.reindex(index=[0, 2, 5], columns=["A", "C", "B"])

    print(df_reindexed)

Its output is as follows (column C is drawn at random, so its values vary between runs):

               A     C   B
    0 2016-01-01   Low NaN
    2 2016-01-03  High NaN
    5 2016-01-06   Low NaN

Column B does not exist in the original DataFrame, so it is filled with NaN.

Reindex to Align with Other Objects

You may wish to take an object and reindex its axes to be labeled the same as another object. Consider the following example.

Example

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.random.randn(10, 3), columns=["col1", "col2", "col3"])
    df2 = pd.DataFrame(np.random.randn(7, 3), columns=["col1", "col2", "col3"])

    # Conform df1 to the row and column labels of df2
    df1 = df1.reindex_like(df2)
    print(df1)

Its output is as follows:

           col1      col2      col3
    0 -2.467652 -1.211687 -0.391761
    1 -0.287396  0.522350  0.562512
    2 -0.255409 -0.483250  1.866258
    3 -1.150467 -0.646493 -0.222462
    4  0.152768 -2.056643  1.877233
    5 -1.155997  1.528719 -1.343719
    6 -1.015606 -1.245936 -0.295275

Note: Here, df1 is reindexed like df2, so only the 7 rows whose labels appear in df2 are kept. The column names must match; otherwise, NaN is added for the entire column.
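As a small deterministic illustration of the note above, the following sketch reindexes one frame like another whose labels only partly overlap. The frames df and other and the column name colX are made up for this illustration and do not come from the tutorial.

```python
import pandas as pd

# Hypothetical frames: `other` shares col1 with `df` but has an unmatched column colX
df = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": [4.0, 5.0, 6.0]})
other = pd.DataFrame({"col1": [0.0, 0.0], "colX": [0.0, 0.0]}, index=[2, 0])

# Conform df to the row labels (2, 0) and column labels (col1, colX) of `other`
aligned = df.reindex_like(other)

print(aligned)
```

The result keeps rows 2 and 0 in that order, carries over the col1 values from df, and holds NaN throughout colX because df has no column with that name.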
Filling while Reindexing

reindex() takes an optional parameter, method, which selects a filling method with the following values:

- pad/ffill: Fill values forward
- bfill/backfill: Fill values backward
- nearest: Fill from the nearest index values

Example

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.random.randn(6, 3), columns=["col1", "col2", "col3"])
    df2 = pd.DataFrame(np.random.randn(2, 3), columns=["col1", "col2", "col3"])

    # Rows missing from df2 are padded with NaN
    print(df2.reindex_like(df1))

    # Now fill the NaNs with the preceding values
    print("Data Frame with Forward Fill:")
    print(df2.reindex_like(df1, method="ffill"))

Its output is as follows:

           col1      col2      col3
    0  1.311620 -0.707176  0.599863
    1 -0.423455 -0.700265  1.133371
    2       NaN       NaN       NaN
    3       NaN       NaN       NaN
    4       NaN       NaN       NaN
    5       NaN       NaN       NaN

    Data Frame with Forward Fill:
           col1      col2      col3
    0  1.311620 -0.707176  0.599863
    1 -0.423455 -0.700265  1.133371
    2 -0.423455 -0.700265  1.133371
    3 -0.423455 -0.700265  1.133371
    4 -0.423455 -0.700265  1.133371
    5 -0.423455 -0.700265  1.133371

Note: The last four rows are padded.

Limits on Filling while Reindexing

The limit argument provides additional control over filling while reindexing. It specifies the maximum number of consecutive missing rows to fill.
Let us consider the following example to understand the same:

Example

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.random.randn(6, 3), columns=["col1", "col2", "col3"])
    df2 = pd.DataFrame(np.random.randn(2, 3), columns=["col1", "col2", "col3"])

    # Rows missing from df2 are padded with NaN
    print(df2.reindex_like(df1))

    # Now fill the NaNs with the preceding values, at most one row
    print("Data Frame with Forward Fill limiting to 1:")
    print(df2.reindex_like(df1, method="ffill", limit=1))

Its output is as follows:

           col1      col2      col3
    0  0.247784  2.128727  0.702576
    1 -0.055713 -0.021732 -0.174577
    2       NaN       NaN       NaN
    3       NaN       NaN       NaN
    4       NaN       NaN       NaN
    5       NaN       NaN       NaN

    Data Frame with Forward Fill limiting to 1:
           col1      col2      col3
    0  0.247784  2.128727  0.702576
    1 -0.055713 -0.021732 -0.174577
    2 -0.055713 -0.021732 -0.174577
    3       NaN       NaN       NaN
    4       NaN       NaN       NaN
    5       NaN       NaN       NaN

Note: Observe that only the row at index 2 is filled from the preceding row (index 1). The remaining rows are left as NaN.

Renaming

The rename() method allows you to relabel an axis based on some mapping (a dict or Series) or an arbitrary function.

Let us consider the following example to understand this:

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.random.randn(6, 3), columns=["col1", "col2", "col3"])
    print(df1)

    print("After renaming the rows and columns:")
    print(df1.rename(columns={"col1": "c1", "col2": "c2"},
                     index={0: "apple", 1: "banana", 2: "durian"}))

Its output is as follows:

           col1      col2      col3
    0  0.486791  0.105759  1.540122
    1 -0.990237  1.007885 -0.217896
    2 -0.483855 -1.645027 -1.194113
    3 -0.122316  0.566277 -0.366028
    4 -0.231524 -0.721172 -0.112007
    5  0.438810  0.000225  0.435479

    After renaming the rows and columns:
                  c1        c2      col3
    apple   0.486791  0.105759  1.540122
    banana -0.990237  1.007885 -0.217896
    durian -0.483855 -1.645027 -1.194113
    3      -0.122316  0.566277 -0.366028
    4      -0.231524 -0.721172 -0.112007
    5       0.438810  0.000225  0.435479

The rename() method provides an inplace named parameter, which by default is False and copies the underlying data.
Pass inplace=True to rename the data in place.
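The two remaining points about rename(), relabeling with a function and renaming in place, can be sketched as follows. This is a minimal illustration; the frame df and the variable names upper and result are chosen here and are not part of the tutorial.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# A function is applied to every column label; a new DataFrame is returned
# and df itself is left unchanged.
upper = df.rename(columns=str.upper)

# With inplace=True the original object is modified and the call returns None.
result = df.rename(index={0: "first"}, inplace=True)

print(list(upper.columns))
print(list(df.columns))
print(list(df.index))
print(result)
```

Because the in-place call returns None, assigning its result back to df would discard the frame, which is one reason to prefer the default copying behavior when chaining operations.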

Python Pandas – Series

A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.

pandas.Series

A pandas Series can be created using the following constructor:

    pandas.Series(data, index, dtype, copy)

The parameters of the constructor are as follows:

    Sr.No  Parameter  Description
    1      data       Takes various forms such as ndarray, list, or constants.
    2      index      Index values must be hashable and the same length as data;
                      duplicate labels are allowed. Defaults to np.arange(n) if
                      no index is passed.
    3      dtype      The data type. If None, the data type is inferred.
    4      copy       Copy data. Default False.

A Series can be created using various inputs, such as:

- Array
- Dict
- Scalar value or constant

Create an Empty Series

The most basic Series that can be created is an empty Series.

Example

    # import the pandas library, aliased as pd
    import pandas as pd

    # An explicit dtype avoids relying on the version-dependent default
    s = pd.Series(dtype="float64")
    print(s)

Its output is as follows:

    Series([], dtype: float64)

Create a Series from ndarray

If data is an ndarray, then the index passed must be of the same length. If no index is passed, then by default the index will be range(n), where n is the array length, i.e., [0, 1, 2, ..., len(array)-1].

Example 1

    # import the pandas library, aliased as pd
    import pandas as pd
    import numpy as np

    data = np.array(["a", "b", "c", "d"])
    s = pd.Series(data)
    print(s)

Its output is as follows:

    0    a
    1    b
    2    c
    3    d
    dtype: object

We did not pass any index, so by default the Series is assigned an index ranging from 0 to len(data)-1, i.e., 0 to 3.

Example 2

    # import the pandas library, aliased as pd
    import pandas as pd
    import numpy as np

    data = np.array(["a", "b", "c", "d"])
    s = pd.Series(data, index=[100, 101, 102, 103])
    print(s)

Its output is as follows:

    100    a
    101    b
    102    c
    103    d
    dtype: object

We passed the index values here, and the output shows the customized index.
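The requirement that the index match the length of an ndarray, together with the dtype parameter, can be illustrated with a short sketch. The labels x, y, z and the variable mismatch_error are made up for this example.

```python
import numpy as np
import pandas as pd

data = np.array([10, 20, 30])

# Same-length index, with the integers converted to floats via dtype
s = pd.Series(data, index=["x", "y", "z"], dtype="float64")
print(s)

# An index of a different length is rejected with a ValueError
try:
    pd.Series(data, index=["x", "y"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)
```

Catching the exception here is only for demonstration; in practice the mismatch usually signals a bug in how the labels were built.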
Create a Series from dict

A dict can be passed as input. If no index is specified, the dictionary keys are taken in insertion order to construct the index. If an index is passed, the values in data corresponding to the labels in the index are pulled out.

Example 1

    # import the pandas library, aliased as pd
    import pandas as pd

    data = {"a": 0., "b": 1., "c": 2.}
    s = pd.Series(data)
    print(s)

Its output is as follows:

    a    0.0
    b    1.0
    c    2.0
    dtype: float64

Observe: Dictionary keys are used to construct the index.

Example 2

    # import the pandas library, aliased as pd
    import pandas as pd

    data = {"a": 0., "b": 1., "c": 2.}
    s = pd.Series(data, index=["b", "c", "d", "a"])
    print(s)

Its output is as follows:

    b    1.0
    c    2.0
    d    NaN
    a    0.0
    dtype: float64

Observe: The index order is preserved, and the missing element is filled with NaN (Not a Number).

Create a Series from Scalar

If data is a scalar value, an index must be provided. The value is repeated to match the length of the index.

    # import the pandas library, aliased as pd
    import pandas as pd

    s = pd.Series(5, index=[0, 1, 2, 3])
    print(s)

Its output is as follows:

    0    5
    1    5
    2    5
    3    5
    dtype: int64

Accessing Data from Series with Position

Data in a Series can be accessed by position, similar to an ndarray. The iloc accessor makes positional access explicit, which matters when the index holds labels rather than integers.

Example 1

Retrieve the first element. Counting starts from zero, which means the first element is stored at position zero, and so on.

    import pandas as pd

    s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

    # retrieve the first element
    print(s.iloc[0])

Its output is as follows:

    1

Example 2

Retrieve the first three elements in the Series. If a : precedes the position (as in :3), all items up to, but not including, that position are extracted.
If two positions are given (with : between them), the items between the two positions, not including the stop position, are extracted.

    import pandas as pd

    s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

    # retrieve the first three elements
    print(s.iloc[:3])

Its output is as follows:

    a    1
    b    2
    c    3
    dtype: int64

Example 3

Retrieve the last three elements.

    import pandas as pd

    s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

    # retrieve the last three elements
    print(s.iloc[-3:])

Its output is as follows:

    c    3
    d    4
    e    5
    dtype: int64

Retrieve Data Using Label (Index)

A Series is like a fixed-size dict in that you can get and set values by index label.

Example 1

Retrieve a single element using an index label.

    import pandas as pd

    s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

    # retrieve a single element
    print(s["a"])

Its output is as follows:

    1

Example 2

Retrieve multiple elements using a list of index labels.

    import pandas as pd

    s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

    # retrieve multiple elements
    print(s[["a", "c", "d"]])

Its output is as follows:

    a    1
    c    3
    d    4
    dtype: int64

Example 3

If a label is not contained in the index, an exception is raised.

    import pandas as pd

    s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

    # look up a label that does not exist
    print(s["f"])

Its output is as follows:

    …
    KeyError: 'f'
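Where a missing label should not stop the program, the dict-like behavior described above offers two alternatives to catching KeyError: a membership test and the get() method. The sketch below reuses the Series from the examples; the variable names are chosen for illustration only.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Membership tests check the index labels, not the values
has_f = "f" in s

# get() returns a default (None unless specified) instead of raising KeyError
missing = s.get("f")
fallback = s.get("f", 0)

# loc makes label-based access explicit, mirroring iloc for positions
first_by_label = s.loc["a"]

print(has_f, missing, fallback, first_by_label)
```

Using loc and iloc consistently keeps label-based and position-based lookups visibly distinct, which helps when an index happens to contain integers.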

Python Pandas – DataFrame

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

Features of DataFrame

- Columns can be of different types
- Size is mutable
- Labeled axes (rows and columns)
- Arithmetic operations can be performed on rows and columns

Structure

Let us assume that we are creating a data frame with students' data. You can think of it as an SQL table or a spreadsheet data representation.

pandas.DataFrame

A pandas DataFrame can be created using the following constructor:

    pandas.DataFrame(data, index, columns, dtype, copy)

The parameters of the constructor are as follows:

    Sr.No  Parameter  Description
    1      data       Takes various forms such as ndarray, Series, map, lists,
                      dict, constants, and also another DataFrame.
    2      index      Row labels to use for the resulting frame. Optional;
                      defaults to np.arange(n) if no index is passed.
    3      columns    Column labels to use for the resulting frame. Optional;
                      defaults to np.arange(n) if no column labels are passed.
    4      dtype      Data type of each column.
    5      copy       Copy data from the inputs. For DataFrame or 2-D ndarray
                      input, data is not copied by default.

Create DataFrame

A pandas DataFrame can be created using various inputs, such as:

- Lists
- dict
- Series
- NumPy ndarrays
- Another DataFrame

In the subsequent sections of this chapter, we will see how to create a DataFrame using these inputs.

Create an Empty DataFrame

The most basic DataFrame that can be created is an empty DataFrame.

Example

    # import the pandas library, aliased as pd
    import pandas as pd

    df = pd.DataFrame()
    print(df)

Its output is as follows:

    Empty DataFrame
    Columns: []
    Index: []

Create a DataFrame from Lists

The DataFrame can be created using a single list or a list of lists.
Example 1

    import pandas as pd

    data = [1, 2, 3, 4, 5]
    df = pd.DataFrame(data)
    print(df)

Its output is as follows:

       0
    0  1
    1  2
    2  3
    3  4
    4  5

Example 2

    import pandas as pd

    data = [["Alex", 10], ["Bob", 12], ["Clarke", 13]]
    df = pd.DataFrame(data, columns=["Name", "Age"])
    print(df)

Its output is as follows:

         Name  Age
    0    Alex   10
    1     Bob   12
    2  Clarke   13

Example 3

    import pandas as pd

    data = [["Alex", 10], ["Bob", 12], ["Clarke", 13]]

    # Passing dtype=float to the constructor would apply to every column,
    # including Name, so only the Age column is converted here
    df = pd.DataFrame(data, columns=["Name", "Age"]).astype({"Age": float})
    print(df)

Its output is as follows:

         Name   Age
    0    Alex  10.0
    1     Bob  12.0
    2  Clarke  13.0

Note: Observe that the Age column is converted to floating point.

Create a DataFrame from Dict of ndarrays / Lists

All the ndarrays must be of the same length. If an index is passed, its length must equal the length of the arrays.

If no index is passed, then by default the index will be range(n), where n is the array length.

Example 1

    import pandas as pd

    data = {"Name": ["Tom", "Jack", "Steve", "Ricky"], "Age": [28, 34, 29, 42]}
    df = pd.DataFrame(data)
    print(df)

Its output is as follows:

        Name  Age
    0    Tom   28
    1   Jack   34
    2  Steve   29
    3  Ricky   42

Note: Observe the values 0, 1, 2, 3. They are the default index assigned to the rows using range(n).

Example 2

Let us now create an indexed DataFrame using arrays.

    import pandas as pd

    data = {"Name": ["Tom", "Jack", "Steve", "Ricky"], "Age": [28, 34, 29, 42]}
    df = pd.DataFrame(data, index=["rank1", "rank2", "rank3", "rank4"])
    print(df)

Its output is as follows:

            Name  Age
    rank1    Tom   28
    rank2   Jack   34
    rank3  Steve   29
    rank4  Ricky   42

Note: Observe that the index parameter assigns a label to each row.

Create a DataFrame from List of Dicts

A list of dictionaries can be passed as input data to create a DataFrame. The dictionary keys are taken as column names by default.

Example 1

The following example shows how to create a DataFrame by passing a list of dictionaries.
    import pandas as pd

    data = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]
    df = pd.DataFrame(data)
    print(df)

Its output is as follows:

       a   b     c
    0  1   2   NaN
    1  5  10  20.0

Note: Observe that NaN (Not a Number) is inserted where a value is missing.

Example 2

The following example shows how to create a DataFrame by passing a list of dictionaries and the row indices.

    import pandas as pd

    data = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]
    df = pd.DataFrame(data, index=["first", "second"])
    print(df)

Its output is as follows:

            a   b     c
    first   1   2   NaN
    second  5  10  20.0

Example 3

The following example shows how to create a DataFrame with a list of dictionaries, row indices, and column indices.

    import pandas as pd

    data = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]

    # With two column indices, values same as dictionary keys
    df1 = pd.DataFrame(data, index=["first", "second"], columns=["a", "b"])

    # With two column indices, one of which does not match any dictionary key
    df2 = pd.DataFrame(data, index=["first", "second"], columns=["a", "b1"])

    print(df1)
    print(df2)

Its output is as follows:

    # df1 output
            a   b
    first   1   2
    second  5  10

    # df2 output
            a  b1
    first   1 NaN
    second  5 NaN

Note: Observe that df2 is created with a column label (b1) that is not a dictionary key, so that column is filled with NaN. In contrast, df1 is created with column labels that match the dictionary keys, so no NaN appears.

Create a DataFrame from Dict of Series

A dictionary of Series can be passed to form a DataFrame. The resultant index is the union of all the Series indexes passed.

Example

    import pandas as pd

    d = {"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
         "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])}

    df = pd.DataFrame(d)
    print(df)

Its output is as follows:

       one  two
    a  1.0    1
    b  2.0    2
    c  3.0    3
    d  NaN    4

Note: Observe that Series one has no label 'd', so in the result the d row of column one is filled with NaN.

Let us now understand