Data Persistence – Openpyxl Module

Microsoft Excel is the most popular spreadsheet application; it has been in use for more than 25 years. Recent versions of Excel use the Office Open XML (OOXML) file format, which is an ECMA standard. Hence, it is possible to access spreadsheet files from other programming environments. Python's openpyxl package provides functionality to read and write Excel files with the .xlsx extension.

The openpyxl package uses class nomenclature that is similar to Microsoft Excel terminology. An Excel document is called a workbook and is saved with the .xlsx extension in the file system. A workbook may have multiple worksheets. A worksheet presents a large grid of cells, each of which can store either a value or a formula. The rows and columns that form the grid are numbered: columns are identified by letters A, B, C, ..., Z, AA, AB, and so on, while rows are numbered starting from 1.

The pip utility is good enough to install the openpyxl package.

pip install openpyxl

The Workbook class represents an empty workbook with one blank worksheet. We need to activate it so that some data can be added to the worksheet.

from openpyxl import Workbook
wb = Workbook()
sheet1 = wb.active
sheet1.title = "StudentList"

A cell in a worksheet is named in ColumnNameRowNumber format. Accordingly, the top left cell is A1. We assign a string to this cell as follows −

sheet1["A1"] = "Student List"

Alternately, use the worksheet's cell() method, which uses row and column numbers to identify a cell. Set the value property of the cell object to assign a value.

cell1 = sheet1.cell(row=1, column=1)
cell1.value = "Student List"

After populating the worksheet with data, the workbook is saved by calling the save() method of the workbook object.

wb.save("Student.xlsx")

This workbook file is created in the current working directory. The following Python script writes a list of tuples into a workbook document.
Each tuple stores the roll number, name, age and marks of a student.

from openpyxl import Workbook
wb = Workbook()
sheet1 = wb.active
sheet1.title = "Student List"
sheet1.cell(column=1, row=1).value = "Student List"
studentlist = [("RollNo", "Name", "age", "marks"), (1, "Juhi", 20, 100),
   (2, "dilip", 20, 110), (3, "jeevan", 24, 145)]
for col in range(1, 5):
   for row in range(1, 5):
      sheet1.cell(column=col, row=1+row).value = studentlist[row-1][col-1]
wb.save("students.xlsx")

The workbook students.xlsx is saved in the current working directory.

The openpyxl module offers the load_workbook() function, which helps in reading back the data in the workbook document.

from openpyxl import load_workbook
wb = load_workbook("students.xlsx")

You can now access the value of any cell specified by row and column number.

sheet1 = wb["Student List"]
cell1 = sheet1.cell(row=1, column=1)
print(cell1.value)
Student List

Example

The following code populates a list with worksheet data.

from openpyxl import load_workbook
wb = load_workbook("students.xlsx")
sheet1 = wb["Student List"]
studentlist = []
for row in range(1, 5):
   stud = []
   for col in range(1, 5):
      val = sheet1.cell(column=col, row=1+row).value
      stud.append(val)
   studentlist.append(tuple(stud))
print(studentlist)

Output

[("RollNo", "Name", "age", "marks"), (1, "Juhi", 20, 100), (2, "dilip", 20, 110), (3, "jeevan", 24, 145)]

One very important feature of the Excel application is the formula. To assign a formula to a cell, assign it a string containing Excel's formula syntax. Assign the AVERAGE function to cell C6, below the column holding age values.

sheet1["C6"] = "=AVERAGE(C3:C5)"

The openpyxl module has a Translator class (with a translate_formula() method) to copy a formula across a range. The following program defines the AVERAGE function in C6 and copies it to D6, which then calculates the average of marks.
from openpyxl import load_workbook
from openpyxl.formula.translate import Translator
wb = load_workbook("students.xlsx")
sheet1 = wb["Student List"]
sheet1["B6"] = "Average"
sheet1["C6"] = "=AVERAGE(C3:C5)"
# copy the formula from C6 to D6
sheet1["D6"] = Translator("=AVERAGE(C3:C5)", origin="C6").translate_formula("D6")
wb.save("students.xlsx")
Python Data Persistence – Sqlite3 Module

One major disadvantage of CSV, JSON, XML, etc., files is that they are not very useful for random access and transaction processing, because they are largely unstructured in nature. Hence, it becomes very difficult to modify the contents. These flat files are not suitable for a client-server environment, as they lack asynchronous processing capability. Using unstructured data files leads to data redundancy and inconsistency.

These problems can be overcome by using a relational database. A database is an organized collection of data, designed to remove redundancy and inconsistency and to maintain data integrity. The relational database model is vastly popular. Its basic concept is to arrange data in entity tables (called relations). The entity table structure provides one attribute whose value is unique for each row. Such an attribute is called the "primary key". When the primary key of one table appears in the structure of other tables, it is called a "foreign key", and this forms the basis of the relationship between the two. Based on this model, there are many popular RDBMS products currently available −

GadFly
mSQL
MySQL
PostgreSQL
Microsoft SQL Server 2000
Informix
Interbase
Oracle
Sybase
SQLite

SQLite is a lightweight relational database used in a wide variety of applications. It is a self-contained, serverless, zero-configuration, transactional SQL database engine. The entire database is a single file that can be placed anywhere in the file system. It is open-source software with a very small footprint and zero configuration. It is popularly used in embedded devices, IoT and mobile apps.

All relational databases use SQL for handling data in tables. However, earlier, each of these databases used to be connected with a Python application with the help of a Python module specific to the type of database. Hence, there was a lack of compatibility among them.
If a user wanted to change to a different database product, it would prove to be difficult. This incompatibility issue was addressed by raising "Python Enhancement Proposal (PEP 248)" to recommend a consistent interface to relational databases, known as DB-API. The latest recommendations are called DB-API Version 2.0 (PEP 249).

Python's standard library contains the sqlite3 module, which is a DB-API compliant module for handling the SQLite database through a Python program. This chapter explains Python's connectivity with the SQLite database.

As mentioned earlier, Python has inbuilt support for the SQLite database in the form of the sqlite3 module. For other databases, the respective DB-API compliant Python module will have to be installed with the help of the pip utility. For example, to use a MySQL database we need to install the PyMySQL module.

pip install pymysql

The following steps are recommended in DB-API −

1. Establish a connection with the database using the connect() function and obtain a connection object.
2. Call the cursor() method of the connection object to get a cursor object.
3. Form a query string made up of a SQL statement to be executed.
4. Execute the desired query by invoking the execute() method.
5. Close the connection.

import sqlite3
db = sqlite3.connect("test.db")

Here, db is the connection object representing test.db. Note that the database will be created if it doesn't exist already. The connection object db has the following methods −

Sr.No. Method & Description
1 cursor(): Returns a Cursor object which uses this Connection.
2 commit(): Explicitly commits any pending transactions to the database.
3 rollback(): This optional method causes a transaction to be rolled back to the starting point.
4 close(): Closes the connection to the database permanently.

A cursor acts as a handle for a given SQL query, allowing the retrieval of one or more rows of the result.
The cursor object is obtained from the connection to execute SQL queries, using the following statement −

cur = db.cursor()

The cursor object has the following methods defined −

Sr.No Method & Description
1 execute(): Executes the SQL query in a string parameter.
2 executemany(): Executes the SQL query using a set of parameters in a list of tuples.
3 fetchone(): Fetches the next row from the query result set.
4 fetchall(): Fetches all remaining rows from the query result set.
5 callproc(): Calls a stored procedure.
6 close(): Closes the cursor object.

The following code creates a table in test.db −

import sqlite3
db = sqlite3.connect("test.db")
cur = db.cursor()
cur.execute('''CREATE TABLE student (
   StudentID INTEGER PRIMARY KEY AUTOINCREMENT,
   name TEXT (20) NOT NULL,
   age INTEGER,
   marks REAL);''')
print("table created successfully")
db.close()

The data integrity desired in a database is achieved by the commit() and rollback() methods of the connection object. The SQL query string may contain an incorrect SQL query that can raise an exception, which should be properly handled. For that, the execute() statement is placed within a try block. If it is successful, the result is persistently saved using the commit() method. If the query fails, the transaction is undone using the rollback() method.

The following code executes an INSERT query on the student table in test.db.

import sqlite3
db = sqlite3.connect("test.db")
qry = "insert into student (name, age, marks) values('Abbas', 20, 80);"
try:
   cur = db.cursor()
   cur.execute(qry)
   db.commit()
   print("record added successfully")
except:
   print("error in query")
   db.rollback()
db.close()

If you want the data in the values clause of the INSERT query to be dynamically provided by user input, use parameter substitution as recommended in the Python DB-API. The ? character is used as a placeholder in the query string, and the values are provided in the form of a tuple in the execute() method.
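As a minimal sketch of this parameter substitution (using an in-memory database and a slightly simplified student table so the example is fully self-contained):

```python
import sqlite3

# in-memory database, so the sketch leaves no file behind
db = sqlite3.connect(":memory:")
cur = db.cursor()
cur.execute("""CREATE TABLE student (
   StudentID INTEGER PRIMARY KEY AUTOINCREMENT,
   name TEXT NOT NULL, age INTEGER, marks REAL);""")

# each ? placeholder is filled from the tuple passed to execute()
qry = "insert into student (name, age, marks) values (?, ?, ?);"
cur.execute(qry, ("Abbas", 20, 80))
db.commit()

cur.execute("select name, age, marks from student;")
print(cur.fetchone())   # ('Abbas', 20, 80.0)
db.close()
```

Because the values travel separately from the query string, this form also protects against SQL injection, unlike building the string by concatenation.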
Python Data Persistence – JSON Module

JSON stands for JavaScript Object Notation. It is a lightweight data interchange format. It is a language-independent and cross-platform text format, supported by many programming languages. This format is used for data exchange between the web server and clients.

The JSON format is similar in purpose to pickle. However, pickle serialization is Python specific, whereas the JSON format is implemented by many languages and hence has become a universal standard. The functionality and interface of the json module in Python's standard library are similar to the pickle and marshal modules. Just as in the pickle module, the json module also provides the dumps() and loads() functions for serialization of a Python object into a JSON encoded string, and the dump() and load() functions to write and read serialized Python objects to/from a file.

dumps() − This function converts the object into JSON format.
loads() − This function converts a JSON string back to a Python object.

The following example demonstrates basic usage of these functions −

import json
data = ["Rakesh", {"marks": (50, 60, 70)}]
s = json.dumps(data)
json.loads(s)

The dumps() function can take an optional sort_keys argument. By default, it is False. If set to True, the dictionary keys appear in sorted order in the JSON string. The dumps() function has another optional parameter called indent, which takes a number as its value. It decides the indentation of each segment of the formatted representation of the JSON string, similar to pretty-printed output.

The json module also has an object oriented API corresponding to the above functions. There are two classes defined in the module − JSONEncoder and JSONDecoder.

JSONEncoder class

An object of this class is an encoder for Python data structures.
Each Python data type is converted into the corresponding JSON type, as shown in the following table −

Python | JSON
dict | object
list, tuple | array
str | string
int, float, int- & float-derived Enums | number
True | true
False | false
None | null

The JSONEncoder class is instantiated by the JSONEncoder() constructor. The following important methods and attributes are defined in the encoder class −

Sr.No. Method & Description
1 encode(): Serializes a Python object into JSON format.
2 iterencode(): Encodes the object and returns an iterator yielding the encoded form of each item in the object.
3 indent: Determines the indent level of the encoded string.
4 sort_keys: Either True or False, making keys appear in sorted order or not.
5 check_circular: If True, checks for circular references in container type objects.

The following example encodes a Python list object.

e = json.JSONEncoder()
e.encode(data)

JSONDecoder class

An object of this class helps in decoding a JSON string back into a Python data structure. The main method in this class is decode(). The following example code retrieves the Python list object from the string encoded in the earlier step.

d = json.JSONDecoder()
d.decode(s)

The json module defines the load() and dump() functions to write JSON data to a file-like object − which may be a disk file or a byte stream − and to read data back from it.

dump()

This function writes JSON-encoded Python object data to a file. The file must be opened in 'w' mode.

import json
data = ["Rakesh", {"marks": (50, 60, 70)}]
fp = open("json.txt", "w")
json.dump(data, fp)
fp.close()

This code will create 'json.txt' in the current directory. It shows the contents as follows −

["Rakesh", {"marks": [50, 60, 70]}]

load()

This function loads JSON data from the file and returns a Python object from it. The file must be opened with read permission (should have 'r' mode).
Example

fp = open("json.txt", "r")
ret = json.load(fp)
print(ret)
fp.close()

Output

["Rakesh", {"marks": [50, 60, 70]}]

The json.tool module also has a command-line interface that validates the data in a file and prints the JSON object in a pretty formatted manner.

C:\python37>python -m json.tool json.txt
[
   "Rakesh",
   {
      "marks": [
         50,
         60,
         70
      ]
   }
]
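The sort_keys and indent parameters mentioned earlier can be exercised together in a short sketch (plain standard-library json, no external assumptions):

```python
import json

data = {"name": "Rakesh", "age": 21, "marks": [50, 60, 70]}

# keys serialized in sorted order, nested structure indented by 3 spaces
s = json.dumps(data, sort_keys=True, indent=3)
print(s)

# compact form for comparison: separators removes the default spaces
compact = json.dumps(data, sort_keys=True, separators=(",", ":"))
print(compact)   # {"age":21,"marks":[50,60,70],"name":"Rakesh"}
```

The indented form is convenient for configuration files and logs, while the compact form minimizes payload size on the wire.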
Python Data Persistence – Cassandra Driver

Cassandra is another popular NoSQL database. High scalability, consistency, and fault-tolerance are some of its important features. It is a column-store database: the data is stored across many commodity servers, and as a result it is highly available. Cassandra is a product of the Apache Software Foundation.

Data is stored in a distributed manner across multiple nodes. Each node is a single server consisting of keyspaces. The fundamental building block of a Cassandra database is the keyspace, which can be considered analogous to a database. Data in one node of Cassandra is replicated in other nodes over a peer-to-peer network of nodes, which makes Cassandra a foolproof database. The network is called a data center. Multiple data centers may be interconnected to form a cluster. The nature of replication is configured by setting the replication strategy and the replication factor at the time of the creation of a keyspace.

One keyspace may have more than one column family − just as one database may contain multiple tables. Cassandra's keyspace doesn't have a predefined schema. It is possible for each row in a Cassandra table to have columns with different names and in variable numbers.

Cassandra software is available in two versions: community and enterprise. The latest version of Cassandra is available for download at https://cassandra.apache.org/download/.

Cassandra has its own query language called Cassandra Query Language (CQL). CQL queries can be executed from inside a cqlsh shell − similar to the MySQL or SQLite shell. The CQL syntax appears similar to standard SQL. The DataStax community edition also comes with a DevCenter IDE.

The Python module for working with a Cassandra database is called Cassandra Driver, developed by DataStax. This module contains an ORM API, as well as a core API similar in nature to DB-API for relational databases.
Installation of the Cassandra driver is easily done using the pip utility.

pip3 install cassandra-driver

Interaction with a Cassandra database is done through a Cluster object. The cassandra.cluster module defines the Cluster class. We first need to declare a Cluster object.

from cassandra.cluster import Cluster
clstr = Cluster()

All transactions, such as insert/update, etc., are performed by starting a session with a keyspace.

session = clstr.connect()

To create a new keyspace, use the execute() method of the session object. The execute() method takes a string argument which must be a query string. CQL has a CREATE KEYSPACE statement for this purpose. The complete code is as below −

from cassandra.cluster import Cluster
clstr = Cluster()
session = clstr.connect()
session.execute("create keyspace mykeyspace with replication = {'class': 'SimpleStrategy', 'replication_factor': 3};")

Here, SimpleStrategy is a value for the replication strategy, and the replication factor is set to 3.

As mentioned earlier, a keyspace contains one or more tables. Each column of a table is characterized by its data type. Python data types are automatically mapped to the corresponding CQL data types according to the following table −

Python Type | CQL Type
None | NULL
bool | boolean
float | float, double
int, long | int, bigint, varint, smallint, tinyint, counter
decimal.Decimal | decimal
str, unicode | ascii, varchar, text
buffer, bytearray | blob
date | date
datetime | timestamp
time | time
list, tuple, generator | list
set, frozenset | set
dict, OrderedDict | map
uuid.UUID | timeuuid, uuid

To create a table, use the session object to execute a CQL query for creating a table.

from cassandra.cluster import Cluster
clstr = Cluster()
session = clstr.connect("mykeyspace")
qry = '''create table students (
   studentID int,
   name text,
   age int,
   marks int,
   primary key(studentID)
);'''
session.execute(qry)

The table so created can be further used to insert rows. The CQL version of the INSERT query is similar to the SQL INSERT statement. The following code inserts a row in the students table.
from cassandra.cluster import Cluster
clstr = Cluster()
session = clstr.connect("mykeyspace")
session.execute("insert into students (studentID, name, age, marks) values (1, 'Juhi', 20, 200);")

As you would expect, the SELECT statement is also used with Cassandra. When the execute() method contains a SELECT query string, it returns a result set object which can be traversed using a loop.

from cassandra.cluster import Cluster
clstr = Cluster()
session = clstr.connect("mykeyspace")
rows = session.execute("select * from students;")
for row in rows:
   print("StudentID: {} Name: {} Age: {} Marks: {}".format(row[0], row[1], row[2], row[3]))

Cassandra's SELECT query supports the use of a WHERE clause to apply a filter on the result set to be fetched. Traditional logical operators like <, > and == are recognized. To retrieve only those rows from the students table with age>20, the query string in the execute() method should be as follows −

rows = session.execute("select * from students WHERE age>20 allow filtering;")

Note the use of ALLOW FILTERING. The ALLOW FILTERING part of this statement explicitly allows (some) queries that require filtering.

The Cassandra driver API defines the following classes of Statement type in its cassandra.query module.

SimpleStatement

A simple, unprepared CQL query contained in a query string. All examples above are examples of SimpleStatement.

BatchStatement

Multiple queries (such as INSERT, UPDATE, and DELETE) are put in a batch and executed at once. Each row is first converted into a SimpleStatement and then added to a batch.
Let us put the rows to be added to the students table in the form of a list of tuples, as follows −

studentlist = [(1, "Juhi", 20, 100), (2, "dilip", 20, 110), (3, "jeevan", 24, 145)]

To add the above rows using BatchStatement, run the following script −

from cassandra.query import SimpleStatement, BatchStatement
batch = BatchStatement()
for student in studentlist:
   batch.add(
      SimpleStatement("INSERT INTO students (studentID, name, age, marks) VALUES (%s, %s, %s, %s)"),
      (student[0], student[1], student[2], student[3]))
session.execute(batch)

PreparedStatement

A prepared statement is like a parameterized query in DB-API. Its query string is saved by Cassandra for
Python Data Persistence – Quick Guide

Python Data Persistence – Introduction

Overview of Python – Data Persistence

During the course of using any software application, the user provides some data to be processed. The data may be input using a standard input device (keyboard) or other devices such as a disk file, scanner, camera, network cable, WiFi connection, etc. The data so received is stored in the computer's main memory (RAM) in the form of various data structures, such as variables and objects, while the application is running. Thereafter, the memory contents from RAM are erased.

However, more often than not, it is desired that the values of variables and/or objects be stored in such a manner that they can be retrieved whenever required, instead of inputting the same data again. The word 'persistence' means "the continuance of an effect after its cause is removed". The term data persistence means that the data continues to exist even after the application has ended. Thus, data stored in a non-volatile storage medium, such as a disk file, is persistent data storage.

In this tutorial, we will explore various built-in and third party Python modules to store and retrieve data to/from various formats such as text files, CSV, JSON and XML files, as well as relational and non-relational databases.

Using Python's built-in File object, it is possible to write string data to a disk file and read from it. Python's standard library provides modules to store and retrieve serialized data in various formats such as JSON and XML. Python's DB-API provides a standard way of interacting with relational databases. Other third party Python packages provide interfacing functionality with NoSQL databases such as MongoDB and Cassandra. This tutorial also introduces the ZODB database, which is a persistence API for Python objects. Microsoft Excel format is a very popular data file format. In this tutorial, we will learn how to handle .xlsx files through Python.
Python Data Persistence – File API

Python uses the built-in input() and print() functions to perform standard input/output operations. The input() function reads bytes from the standard input stream device, i.e. the keyboard. The print() function, on the other hand, sends data to the standard output stream device, i.e. the display monitor. A Python program interacts with these IO devices through the standard stream objects stdin and stdout, defined in the sys module.

The input() function is actually a wrapper around the readline() method of the sys.stdin object. All keystrokes from the input stream are received till the 'Enter' key is pressed.

>>> import sys
>>> x = sys.stdin.readline()
Welcome to TutorialsPoint
>>> x
'Welcome to TutorialsPoint\n'

Note that the readline() function leaves a trailing '\n' character. There is also a read() method, which reads data from the standard input stream till it is terminated by the Ctrl+D character.

>>> x = sys.stdin.read()
Hello
Welcome to TutorialsPoint
>>> x
'Hello\nWelcome to TutorialsPoint\n'

Similarly, print() is a convenience function emulating the write() method of the stdout object.

>>> x = "Welcome to TutorialsPoint\n"
>>> sys.stdout.write(x)
Welcome to TutorialsPoint
26

Just like the predefined stream objects stdin and stdout, a Python program can read data from and send data to a disk file or a network socket. They are also streams. Any object that has a read() method is an input stream. Any object that has a write() method is an output stream. Communication with a stream is established by obtaining a reference to the stream object with the built-in open() function.

open() function

This built-in function uses the following arguments −

f = open(name, mode, buffering)

The name parameter is the name of the disk file or a byte string, mode is an optional one-character string specifying the type of operation to be performed (read, write, append, etc.), and the buffering parameter is either 0, 1 or -1, indicating that buffering is off, on, or the system default. File opening modes are enumerated in the table below.
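The claim that any object with a read() or write() method behaves as a stream can be illustrated with the standard library's io.StringIO, an in-memory text stream (a sketch added here for illustration; it is not part of the original examples):

```python
import io

# io.StringIO provides read()/write(), so it can stand in for a file
stream = io.StringIO()
n = stream.write("Welcome to TutorialsPoint\n")
print(n)                # 26 - number of characters written, like sys.stdout.write

stream.seek(0)          # rewind to the beginning before reading
print(stream.read())    # Welcome to TutorialsPoint
```

This is why functions written against the stream interface work equally well with disk files, sockets, and in-memory buffers.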
The default mode is 'r'.

Sr.No Mode & Description
1 r: Open for reading (default)
2 w: Open for writing, truncating the file first
3 x: Create a new file and open it for writing
4 a: Open for writing, appending to the end of the file if it exists
5 b: Binary mode
6 t: Text mode (default)
7 +: Open a disk file for updating (reading and writing)

In order to save data to a file, it must be opened in 'w' mode.

f = open("test.txt", "w")

This file object acts as an output stream and has access to the write() method. The write() method sends a string to this object, which is stored in the underlying file.

string = "Hello TutorialsPoint\n"
f.write(string)

It is important to close the stream, to ensure that any data remaining in the buffer is completely transferred to it.

f.close()

Try opening 'test.txt' using any text editor (such as Notepad) to confirm successful creation of the file. To read the contents of 'test.txt' programmatically, it must be opened in 'r' mode.

f = open("test.txt", "r")

This object behaves as an input stream. Python can fetch data from the stream using the read() method.

string = f.read()
print(string)

The contents of the file are displayed on the Python console. The file object also supports the readline() method, which reads a string till it encounters an EOF character.

However, if the same file is opened in 'w' mode to store additional text in it, the earlier contents are erased. Whenever a file is opened with write permission, it is treated as if it were a new file. To add data to an existing file, use 'a' for append mode.

f = open("test.txt", "a")
f.write("Python Tutorials\n")
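The write, append and read modes above can be combined into one self-contained round trip (sketched with the tempfile module so no stray file is left in the working directory):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")

# 'w' creates (or truncates) the file
f = open(path, "w")
f.write("Hello TutorialsPoint\n")
f.close()

# 'a' appends without erasing earlier contents
f = open(path, "a")
f.write("Python Tutorials\n")
f.close()

# 'r' reads everything back
f = open(path, "r")
print(f.read())   # prints both lines
f.close()
```

In modern code, the with statement (`with open(path) as f:`) is preferred, since it closes the stream automatically even if an exception occurs.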
Data Persistence – ZODB

ZODB (Zope Object Database) is a database for storing Python objects. It is ACID compliant − a feature not found in many NoSQL databases. The ZODB is also open source, horizontally scalable and schema-free, like many NoSQL databases. However, it is not distributed and does not offer easy replication. It provides a persistence mechanism for Python objects. It is a part of the Zope application server, but can also be used independently.

ZODB was created by Jim Fulton of Zope Corporation. It started as a simple Persistent Object System. Its current version is 5.5.0 and it is written completely in Python, using an extended version of Python's built-in object persistence (pickle). Some of the main features of ZODB are −

transactions
history/undo
transparently pluggable storage
built-in caching
multiversion concurrency control (MVCC)
scalability across a network

The ZODB is a hierarchical database. There is a root object, initialized when a database is created. The root object is used like a Python dictionary and it can contain other objects (which can be dictionary-like themselves). To store an object in the database, it's enough to assign it to a new key inside its container.

ZODB is useful for applications where the data is hierarchical and there are likely to be more reads than writes. ZODB is an extension of the pickle mechanism; that's why it can be processed only through a Python script.

To install the latest version of ZODB, let us use the pip utility −

pip install ZODB

The following dependencies are also installed −

BTrees==4.6.1
cffi==1.13.2
persistent==4.5.1
pycparser==2.19
six==1.13.0
transaction==2.4.0

ZODB provides the following storage options −

FileStorage: This is the default. Everything is stored in one big Data.fs file, which is essentially a transaction log.

DirectoryStorage: This stores one file per object revision. In this case, it does not require the Data.fs.index to be rebuilt on an unclean shutdown.

RelStorage: This stores pickles in a relational database.
PostgreSQL, MySQL and Oracle are supported.

To create a ZODB database we need a storage, a database and, finally, a connection. The first step is to obtain a storage object.

import ZODB, ZODB.FileStorage
storage = ZODB.FileStorage.FileStorage("mydata.fs")

The DB class uses this storage object to obtain a database object.

db = ZODB.DB(storage)

Pass None to the DB constructor to create an in-memory database.

db = ZODB.DB(None)

Finally, we establish a connection with the database.

conn = db.open()

The connection object then gives you access to the 'root' of the database with the root() method. The 'root' object is the dictionary that holds all of your persistent objects.

root = conn.root()

For example, we add a list of students to the root object as follows −

root["students"] = ["Mary", "Maya", "Meet"]

This change is not permanently saved in the database till we commit the transaction.

import transaction
transaction.commit()

To store an object of a user defined class, the class must inherit from the persistent.Persistent parent class.

Advantages of Subclassing

Subclassing the Persistent class has its advantages, as follows −

The database will automatically track object changes made by setting attributes.
Data will be saved in its own database record.
You can save data that doesn't subclass Persistent, but it will be stored in the database record of whatever persistent object references it. Non-persistent objects are owned by their containing persistent object; if multiple persistent objects refer to the same non-persistent subobject, they'll get their own copies.

Let us define a student class subclassing the Persistent class, as under −

import persistent

class student(persistent.Persistent):
   def __init__(self, name):
      self.name = name
   def __repr__(self):
      return str(self.name)

To add an object of this class, let us first set up the connection as described above.
import ZODB, ZODB.FileStorage
storage = ZODB.FileStorage.FileStorage("studentdata.fs")
db = ZODB.DB(storage)
conn = db.open()
root = conn.root()

Declare an object, add it to root, and then commit the transaction −

s1 = student("Akash")
root["s1"] = s1
import transaction
transaction.commit()
conn.close()

The list of all objects added to root can be retrieved as a view object with the help of the items() method, since the root object is similar to a built-in dictionary.

print(root.items())
ItemsView({'s1': Akash})

To fetch an attribute of a specific object from root −

print(root["s1"].name)
Akash

The object can be easily updated. Since the ZODB API is a pure Python package, no external SQL type language needs to be used.

root["s1"].name = "Abhishek"
import transaction
transaction.commit()

The database will be updated instantly. Note that the transaction class also defines an abort() function, which is similar to rollback() transaction control in SQL.
Python Data Persistence – Plistlib Module

The plist format is mainly used by macOS. These files are basically XML documents. They store and retrieve properties of an object. Python's standard library contains the plistlib module, which is used to read and write 'property list' files (they usually have a .plist extension).

The plistlib module is more or less similar to other serialization libraries in the sense that it also provides dumps() and loads() functions for an in-memory representation of Python objects, and load() and dump() functions for disk operations.

The following dictionary object maintains properties (keys) and corresponding values −

proplist = {
   "name": "Ganesh",
   "designation": "manager",
   "dept": "accts",
   "salary": {"basic": 12000, "da": 4000, "hra": 800}
}

In order to write these properties to a disk file, we call the dump() function in the plistlib module.

import plistlib
fileName = open("salary.plist", "wb")
plistlib.dump(proplist, fileName)
fileName.close()

Conversely, to read back the property values, use the load() function as follows −

fp = open("salary.plist", "rb")
pl = plistlib.load(fp)
print(pl)
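The dumps()/loads() pair mentioned above can be exercised without touching the disk; note that plistlib.dumps() returns the XML document as bytes rather than str (a quick sketch):

```python
import plistlib

proplist = {
   "name": "Ganesh",
   "designation": "manager",
   "dept": "accts",
   "salary": {"basic": 12000, "da": 4000, "hra": 800}
}

# serialize to an XML plist held in memory (bytes)
data = plistlib.dumps(proplist)
print(data[:38])   # b'<?xml version="1.0" encoding="UTF-8"?>'

# deserialize back into an equal dictionary
restored = plistlib.loads(data)
print(restored == proplist)   # True
```

This round trip is a quick way to verify that every value in the dictionary is of a type the plist format can represent.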
Python Data Persistence – SQLAlchemy

Any relational database holds data in tables. The table structure defines the data types of attributes, which are basically of primary data types only, and these are mapped to corresponding built-in data types of Python. However, Python's user-defined objects can't be persistently stored and retrieved to/from SQL tables. This is a disparity between SQL types and object oriented programming languages such as Python. SQL doesn't have equivalent data types for others such as dict, tuple, list, or any user defined class.

If you have to store an object in a relational database, its instance attributes should be deconstructed into SQL data types first, before executing the INSERT query. On the other hand, data retrieved from a SQL table is in primary types. A Python object of the desired type will have to be constructed from it for use in the Python script. This is where Object Relational Mappers are useful.

Object Relation Mapper (ORM)

An Object Relation Mapper (ORM) is an interface between a class and a SQL table. A Python class is mapped to a certain table in the database, so that conversion between object and SQL types is automatically performed. A Students class written in Python code is mapped to the Students table in the database. As a result, all CRUD operations are done by calling the respective methods of the class. This eliminates the need to execute hard coded SQL queries in the Python script.

The ORM library thus acts as an abstraction layer over raw SQL queries and can help in rapid application development. SQLAlchemy is a popular object relational mapper for Python. Any manipulation of the state of a model object is synchronized with its related row in the database table. The SQLAlchemy library includes the ORM API and the SQL Expression Language (SQLAlchemy Core). The Expression Language executes primitive constructs of the relational database directly. The ORM is a high level and abstracted pattern of usage constructed on top of the SQL Expression Language.
It can be said that ORM is an applied usage of the Expression Language. We shall discuss the SQLAlchemy ORM API and use the SQLite database in this topic.

SQLAlchemy communicates with various types of databases through their respective DBAPI implementations, using a dialect system. All dialects require that an appropriate DBAPI driver is installed. Dialects for the following types of databases are included −

Firebird
Microsoft SQL Server
MySQL
Oracle
PostgreSQL
SQLite
Sybase

Installation of SQLAlchemy is easy and straightforward, using the pip utility.

pip install sqlalchemy

To check whether SQLAlchemy is properly installed, and its version, enter the following on the Python prompt −

>>> import sqlalchemy
>>> sqlalchemy.__version__
'1.3.11'

Interactions with the database are done through an Engine object, obtained as the return value of the create_engine() function.

engine = create_engine("sqlite:///mydb.sqlite")

SQLite allows creation of an in-memory database. A SQLAlchemy engine for an in-memory database is created as follows −

from sqlalchemy import create_engine
engine = create_engine("sqlite:///:memory:")

If you intend to use a MySQL database instead, use its DB-API module – pymysql – and the respective dialect driver.

engine = create_engine("mysql+pymysql://root@localhost/mydb")

create_engine() has an optional echo argument. If set to True, the SQL queries generated by the engine are echoed on the terminal.

SQLAlchemy contains a declarative base class. It acts as a catalog of model classes and mapped tables.

from sqlalchemy.ext.declarative import declarative_base
base = declarative_base()

The next step is to define a model class. It must be derived from base – the object of the declarative_base class as above. Set the __tablename__ property to the name of the table you want created in the database. Other attributes correspond to the fields.
Each one is a Column object in SQLAlchemy, and its data type is one from the list below −

BigInteger
Boolean
Date
DateTime
Float
Integer
Numeric
SmallInteger
String
Text
Time

The following code is the model class, named Student, which is mapped to the Students table.

#myclasses.py
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, Numeric

base = declarative_base()

class Student(base):
   __tablename__ = "Students"
   StudentID = Column(Integer, primary_key=True)
   name = Column(String)
   age = Column(Integer)
   marks = Column(Numeric)

To create a Students table with the corresponding structure, execute the create_all() method defined for the base class.

base.metadata.create_all(engine)

We now have to declare an object of our Student class. All database transactions, such as adding, deleting or retrieving data, are handled by a Session object.

from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
sessionobj = Session()

Data stored in a Student object is physically added to the underlying table by the session's add() method.

s1 = Student(name="Juhi", age=25, marks=200)
sessionobj.add(s1)
sessionobj.commit()

Here is the entire code for adding a record to the Students table. As it is executed, the corresponding SQL statement log is displayed on the console.
from sqlalchemy import create_engine
from myclasses import Student, base

engine = create_engine("sqlite:///college.db", echo=True)
base.metadata.create_all(engine)

from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
sessionobj = Session()

s1 = Student(name="Juhi", age=25, marks=200)
sessionobj.add(s1)
sessionobj.commit()

Console output

CREATE TABLE "Students" (
   "StudentID" INTEGER NOT NULL,
   name VARCHAR,
   age INTEGER,
   marks NUMERIC,
   PRIMARY KEY ("StudentID")
)
INFO sqlalchemy.engine.base.Engine ()
INFO sqlalchemy.engine.base.Engine COMMIT
INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
INFO sqlalchemy.engine.base.Engine INSERT INTO "Students" (name, age, marks) VALUES (?, ?, ?)
INFO sqlalchemy.engine.base.Engine ('Juhi', 25, 200.0)
INFO sqlalchemy.engine.base.Engine COMMIT

The session object also provides the add_all() method to insert more than one object in a single transaction.

sessionobj.add_all([s2, s3, s4, s5])
sessionobj.commit()

Now that records are added to the table, we would like to fetch from it, just as a SELECT query does. The session object has a query() method to perform this task. A Query object is returned by calling query() with our Student model.

qry = sessionobj.query(Student)

The get() method of this Query object fetches the object corresponding to the given primary key.

S1 = qry.get(1)
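Under the hood, a primary-key lookup like qry.get(1) boils down to a SELECT filtered on the primary-key column. As a rough stdlib illustration (using sqlite3, not SQLAlchemy), the equivalent query looks like this −

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Students (
   StudentID INTEGER PRIMARY KEY, name TEXT, age INTEGER, marks REAL)""")
conn.execute("INSERT INTO Students (name, age, marks) VALUES (?, ?, ?)",
             ("Juhi", 25, 200))

# the ORM's get(1) corresponds to a SELECT filtered on the primary key
row = conn.execute(
    "SELECT name, age, marks FROM Students WHERE StudentID = ?", (1,)).fetchone()
print(row)   # ('Juhi', 25, 200.0)
```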
Python Data Persistence – Pickle Module

Python's terminology for serialization and deserialization is pickling and unpickling, respectively. The pickle module in the Python library uses a very Python-specific data format. Hence, non-Python applications may not be able to deserialize pickled data properly. It is also advised not to unpickle data from an unauthenticated source.

The serialized (pickled) data can be stored in a byte string or a binary file. This module defines the dumps() and loads() functions to pickle and unpickle data using a byte string. For file-based processing, the module has the dump() and load() functions.

Python's pickle protocols are the conventions used in constructing and deconstructing Python objects to/from binary data. Currently, the pickle module defines 5 different protocols, as listed below −

Protocol version 0 − Original "human-readable" protocol, backwards compatible with earlier versions.
Protocol version 1 − Old binary format, also compatible with earlier versions of Python.
Protocol version 2 − Introduced in Python 2.3; provides efficient pickling of new-style classes.
Protocol version 3 − Added in Python 3.0; recommended when compatibility with other Python 3 versions is required.
Protocol version 4 − Added in Python 3.4; adds support for very large objects.

Example

The pickle module's dumps() function returns a byte-string representation of the pickled data.

from pickle import dumps
dct = {"name": "Ravi", "age": 23, "Gender": "M", "marks": 75}
dctstring = dumps(dct)
print(dctstring)

Output

b'\x80\x03}q\x00(X\x04\x00\x00\x00nameq\x01X\x04\x00\x00\x00Raviq\x02X\x03\x00\x00\x00ageq\x03K\x17X\x06\x00\x00\x00Genderq\x04X\x01\x00\x00\x00Mq\x05X\x05\x00\x00\x00marksq\x06KKu.'

Example

Use the loads() function to unpickle the string and obtain the original dictionary object.
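The protocol number can be passed explicitly to dumps(). A small sketch comparing protocol 0 (the human-readable format) with the default binary protocol −

```python
import pickle

d = {"name": "Ravi", "age": 23}

# protocol 0 produces an ASCII, human-readable stream
p0 = pickle.dumps(d, protocol=0)
print(p0)

# the default protocol is a compact binary format
pd = pickle.dumps(d)
print(pd)

# both streams unpickle to an equal dictionary
print(pickle.loads(p0) == pickle.loads(pd))   # True
```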
from pickle import loads
dct = loads(dctstring)
print(dct)

Output

{'name': 'Ravi', 'age': 23, 'Gender': 'M', 'marks': 75}

Pickled objects can also be persistently stored in a disk file using the dump() function, and retrieved using the load() function.

import pickle
f = open("data.txt", "wb")
dct = {"name": "Ravi", "age": 23, "Gender": "M", "marks": 75}
pickle.dump(dct, f)
f.close()

#to read
import pickle
f = open("data.txt", "rb")
d = pickle.load(f)
print(d)
f.close()

The pickle module also provides an object-oriented API for the serialization mechanism, in the form of the Pickler and Unpickler classes.

As mentioned above, just like built-in objects in Python, objects of user-defined classes can also be persistently serialized in a disk file. In the following program, we define a User class with a name and mobile number as its instance attributes. In addition to the __init__() constructor, the class overrides the __str__() method, which returns a string representation of its object.

class User:
   def __init__(self, name, mob):
      self.name = name
      self.mobile = mob
   def __str__(self):
      return ("Name: {} mobile: {}".format(self.name, self.mobile))

To pickle an object of the above class in a file, we use the Pickler class and its dump() method.

from pickle import Pickler
user1 = User("Rajani", "1234567890")
file = open("userdata", "wb")
Pickler(file).dump(user1)
file.close()

Conversely, the Unpickler class has a load() method to retrieve the serialized object as follows −

from pickle import Unpickler
file = open("userdata", "rb")
user1 = Unpickler(file).load()
print(user1)
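A complete round trip with the Pickler and Unpickler classes can be sketched as follows. Several objects may be written to, and read back from, the same file in order; the second user below is an invented example.

```python
import pickle

class User:
    def __init__(self, name, mob):
        self.name = name
        self.mobile = mob
    def __str__(self):
        return "Name: {} mobile: {}".format(self.name, self.mobile)

# pickle two User objects into one file with Pickler
with open("userdata", "wb") as f:
    p = pickle.Pickler(f)
    p.dump(User("Rajani", "1234567890"))
    p.dump(User("Kiran", "9876543210"))

# read them back in the same order with Unpickler
with open("userdata", "rb") as f:
    up = pickle.Unpickler(f)
    print(up.load())   # Name: Rajani mobile: 1234567890
    print(up.load())   # Name: Kiran mobile: 9876543210
```

Note that the unpickling side must be able to import the same User class; pickle stores only the class's location and the instance attributes, not the class definition itself.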
Python Data Persistence – Useful Resources

The following resources contain additional information on Python Data Persistence. Please use them to get more in-depth knowledge of the subject.