A JSON file stores data as human-readable text. JSON stands for JavaScript Object Notation. Pandas can read JSON files with the read_json function.
Input Data
Create a JSON file by copying the data below into a text editor such as Notepad. Save the file with a .json extension, choosing All Files (*.*) as the file type.
{
   "ID":["1","2","3","4","5","6","7","8"],
   "Name":["Rick","Dan","Michelle","Ryan","Gary","Nina","Simon","Guru"],
   "Salary":["623.3","515.2","611","729","843.25","578","632.8","722.5"],
   "StartDate":["1/1/2012","9/23/2013","11/15/2014","5/11/2014","3/27/2015","5/21/2013","7/30/2013","6/17/2014"],
   "Dept":["IT","Operations","IT","HR","Finance","IT","Operations","Finance"]
}
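If a text editor is not convenient, the same file can be written from Python. The following is a minimal sketch using only the standard library; the variable names and the output location input.json (in the current working directory) are assumptions made for illustration, so adjust the path to match the one used when reading the file.

```python
import json
from pathlib import Path

# The same data as above, held as a dict that maps column name -> list of values.
records = {
    "ID": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "Name": ["Rick", "Dan", "Michelle", "Ryan", "Gary", "Nina", "Simon", "Guru"],
    "Salary": ["623.3", "515.2", "611", "729", "843.25", "578", "632.8", "722.5"],
    "StartDate": ["1/1/2012", "9/23/2013", "11/15/2014", "5/11/2014",
                  "3/27/2015", "5/21/2013", "7/30/2013", "6/17/2014"],
    "Dept": ["IT", "Operations", "IT", "HR", "Finance", "IT", "Operations", "Finance"],
}

# Hypothetical output location; change it to wherever the file should live.
output_path = Path("input.json")
output_path.write_text(json.dumps(records, indent=2), encoding="utf-8")

# Parse the file back to confirm that what was written is valid JSON.
loaded = json.loads(output_path.read_text(encoding="utf-8"))
print(list(loaded))
```

Writing the file through json.dumps avoids hand-typed syntax slips such as a missing comma between two arrays.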
Read the JSON File
The read_json function of the pandas library can be used to read the JSON file into a pandas DataFrame.
import pandas as pd

data = pd.read_json('path/input.json')
print(data)
When we execute the above code, it produces the following result. The column order can vary with the pandas version.
         Dept  ID      Name  Salary   StartDate
0          IT   1      Rick  623.30    1/1/2012
1  Operations   2       Dan  515.20   9/23/2013
2          IT   3  Michelle  611.00  11/15/2014
3          HR   4      Ryan  729.00   5/11/2014
4     Finance   5      Gary  843.25   3/27/2015
5          IT   6      Nina  578.00   5/21/2013
6  Operations   7     Simon  632.80   7/30/2013
7     Finance   8      Guru  722.50   6/17/2014
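The file stores every value as a string, so it can help to make the column types explicit after loading rather than rely on type inference. The sketch below is one way to do that, assuming an abbreviated three-record copy of the data held in a string (json_text is a name introduced here) and the month/day/year date format seen in the file.

```python
from io import StringIO

import pandas as pd

# Abbreviated copy of the input data (first three records only), for illustration.
json_text = """
{
  "ID": ["1", "2", "3"],
  "Name": ["Rick", "Dan", "Michelle"],
  "Salary": ["623.3", "515.2", "611"],
  "StartDate": ["1/1/2012", "9/23/2013", "11/15/2014"],
  "Dept": ["IT", "Operations", "IT"]
}
"""

# Wrap the text in StringIO so read_json treats it as a file-like object.
data = pd.read_json(StringIO(json_text))

# Convert explicitly, so the result does not depend on what read_json inferred.
data["Salary"] = pd.to_numeric(data["Salary"])
data["StartDate"] = pd.to_datetime(data["StartDate"], format="%m/%d/%Y")

print(data.dtypes)
```

With StartDate stored as a datetime column, date operations such as sorting by start date or extracting the year through data["StartDate"].dt.year become available.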
Reading Specific Columns and Rows
As with the CSV file in the previous chapter, once the JSON file has been read into a DataFrame we can select specific columns and specific rows from it.
We use the multi-axes indexing accessor .loc[] for this purpose. Here we display the Salary and Name columns for some of the rows.
import pandas as pd

data = pd.read_json('path/input.json')

# Use the multi-axes indexing accessor: row labels first, then column labels
print(data.loc[[1, 3, 5], ['Salary', 'Name']])
When we execute the above code, it produces the following result.
   Salary  Name
1   515.2   Dan
3   729.0  Ryan
5   578.0  Nina
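Besides a list of row labels, .loc[] also accepts a boolean condition for the rows. The following sketch selects the Name and Salary columns for everyone in the IT department; the data is loaded from an in-memory string (json_text, a name introduced here) so that the example does not depend on a file path.

```python
from io import StringIO

import pandas as pd

# In-memory copy of the input data, so the example runs without a file.
json_text = """
{
  "ID": ["1", "2", "3", "4", "5", "6", "7", "8"],
  "Name": ["Rick", "Dan", "Michelle", "Ryan", "Gary", "Nina", "Simon", "Guru"],
  "Salary": ["623.3", "515.2", "611", "729", "843.25", "578", "632.8", "722.5"],
  "StartDate": ["1/1/2012", "9/23/2013", "11/15/2014", "5/11/2014",
                "3/27/2015", "5/21/2013", "7/30/2013", "6/17/2014"],
  "Dept": ["IT", "Operations", "IT", "HR", "Finance", "IT", "Operations", "Finance"]
}
"""

data = pd.read_json(StringIO(json_text))

# Boolean mask: True for each row whose Dept value is "IT".
is_it = data["Dept"] == "IT"

# Rows are chosen by the mask, columns by their labels.
it_staff = data.loc[is_it, ["Name", "Salary"]]
print(it_staff)
```

The selected rows keep their original index labels, so the result can still be matched back to the full DataFrame.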
Reading JSON file as Records
We can also call the to_json function on the DataFrame, with suitable parameters, to write the content of the JSON file back out as individual records, one JSON object per line.
import pandas as pd

data = pd.read_json('path/input.json')
print(data.to_json(orient='records', lines=True))
When we execute the above code, it produces the following result. Note that pandas escapes each forward slash in the dates as \/, which is still valid JSON.
{"Dept":"IT","ID":1,"Name":"Rick","Salary":623.3,"StartDate":"1\/1\/2012"}
{"Dept":"Operations","ID":2,"Name":"Dan","Salary":515.2,"StartDate":"9\/23\/2013"}
{"Dept":"IT","ID":3,"Name":"Michelle","Salary":611.0,"StartDate":"11\/15\/2014"}
{"Dept":"HR","ID":4,"Name":"Ryan","Salary":729.0,"StartDate":"5\/11\/2014"}
{"Dept":"Finance","ID":5,"Name":"Gary","Salary":843.25,"StartDate":"3\/27\/2015"}
{"Dept":"IT","ID":6,"Name":"Nina","Salary":578.0,"StartDate":"5\/21\/2013"}
{"Dept":"Operations","ID":7,"Name":"Simon","Salary":632.8,"StartDate":"7\/30\/2013"}
{"Dept":"Finance","ID":8,"Name":"Guru","Salary":722.5,"StartDate":"6\/17\/2014"}
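Output in this one-record-per-line form can be loaded again by passing lines=True to read_json. The sketch below runs the round trip in memory on an abbreviated three-record copy of the data; the names json_text, json_lines and restored are introduced here for illustration.

```python
from io import StringIO

import pandas as pd

# Abbreviated copy of the input data (first three records only), for illustration.
json_text = """
{
  "ID": ["1", "2", "3"],
  "Name": ["Rick", "Dan", "Michelle"],
  "Salary": ["623.3", "515.2", "611"],
  "StartDate": ["1/1/2012", "9/23/2013", "11/15/2014"],
  "Dept": ["IT", "Operations", "IT"]
}
"""

data = pd.read_json(StringIO(json_text))

# Serialize: one JSON object per row, one row per line.
json_lines = data.to_json(orient="records", lines=True)

# Read the line-delimited text back into a DataFrame.
restored = pd.read_json(StringIO(json_lines), lines=True)
print(restored)
```

To write the records to disk instead of a string, a file path can be passed as the first argument of to_json.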