Concurrency in Python – Useful Resources

The following resources contain additional information on Concurrency in Python. Please use them to get more in-depth knowledge on this topic.
Category: Concurrency in Python
Testing Thread Applications
In this chapter, we will learn about testing thread applications. We will also learn the importance of testing.

Why to Test?

Before we dive into the discussion about the importance of testing, we need to know what testing is. In general terms, testing is a technique of finding out how well something works. On the other hand, if we talk specifically about computer programs or software, then testing is the technique of assessing the functionality of a software program.

In software development, there must be double-checking before the software is released to the client. That is why it is very important that the software is tested by an experienced testing team. Consider the following points to understand the importance of software testing:

Improvement of software quality: Certainly, no company wants to deliver low-quality software and no client wants to buy it. Testing improves the quality of software by finding and fixing its bugs.

Satisfaction of customers: The most important part of any business is the satisfaction of its customers. By providing bug-free, good-quality software, companies can achieve customer satisfaction.

Lessening the impact of new features: Suppose we have made a software system of 10,000 lines and we need to add a new feature; the development team would then be concerned about the impact of this new feature on the whole software. Here, too, testing plays a vital role: if the testing team has made a good suite of tests, it can save us from potentially catastrophic breaks.

User experience: Another very important part of any business is the experience of the users of its product. Only testing can assure that the end user finds the product simple and easy to use.
Cutting down the expenses: Testing can cut down the total cost of software by finding and fixing bugs in the testing phase of development rather than after delivery. A major bug found after the delivery of the software increases its tangible cost, say in terms of expenses, and its intangible cost, say in terms of customer dissatisfaction, the company's negative reputation, etc.

What to Test?

It is always recommended to have appropriate knowledge of what is to be tested. In this section, we will first understand the prime motive of a tester while testing any software. Focusing only on code coverage, i.e., how many lines of code our test suite hits, should be avoided, because merely counting the lines of code exercised adds no real value to our system: some bugs may remain and only show up at a later stage, even after deployment. Consider the following important points related to what to test:

We need to focus on testing the functionality of the code rather than the code coverage.
We need to test the most important parts of the code first and then move towards the less important parts. This will definitely save time.
The tester must have a multitude of different tests that can push the software to its limits.

Approaches for testing concurrent software programs

Because of their capability of utilizing the true potential of multi-core architectures, concurrent software systems are replacing sequential systems. In recent times, concurrent programs are being used in everything from mobile phones to washing machines, from cars to airplanes. We need to be more careful about testing concurrent software programs, because if we have added multiple threads to a single-threaded application that already has a bug, then we can end up with multiple bugs.
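To see concretely why adding threads can multiply bugs, consider the classic race condition: several threads updating one shared counter. The sketch below (a minimal illustration, not taken from this tutorial) guards the update with a threading.Lock so the final count is deterministic; removing the lock could lose updates.

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment(n):
    # Each += is a read-modify-write; without the lock, two threads
    # could interleave these steps and lose updates (a race condition).
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 with the lock; possibly less without it
```

A good concurrency test suite would run exactly this kind of workload many times, looking for interleavings that break the invariant.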
Testing techniques for concurrent software programs focus extensively on selecting interleavings that expose potentially harmful patterns such as race conditions, deadlocks and violations of atomicity. Following are two approaches for testing concurrent software programs:

Systematic exploration: This approach aims to explore the space of interleavings as broadly as possible. Some such approaches adopt a brute-force technique, while others adopt partial-order reduction or heuristic techniques to explore the space of interleavings.

Property-driven: Property-driven approaches rely on the observation that concurrency faults are more likely to occur under interleavings that expose specific properties, such as suspicious memory-access patterns. Different property-driven approaches target different faults, like race conditions, deadlocks and violations of atomicity, each depending on one or another specific property.

Testing Strategies

A test strategy is also known as a test approach. The strategy defines how testing is to be carried out. There are two techniques:

Proactive: An approach in which the test design process is initiated as early as possible, in order to find and fix defects before the build is created.

Reactive: An approach in which testing does not start until the completion of the development process.

Before applying any test strategy or approach to a Python program, we must have a basic idea of the kinds of errors a software program may have. The errors are as follows:

Syntactical errors: During program development, there can be many small errors, mostly due to typing mistakes, for example a missing colon or the wrong spelling of a keyword. Such errors are due to mistakes in program syntax, not in logic, and hence are called syntactical errors.

Semantic errors: Semantic errors are also called logical errors.
If there is a logical or semantic error in a software program, the statements will run without error but will not give the desired output, because the logic is not correct.

Unit Testing

This is one of the most used strategies for testing Python programs. This strategy is used for testing units or components of the code, by which we mean classes or functions of the code. Unit testing simplifies the testing of large programming systems by testing small units.
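As a sketch of unit testing in this spirit, the example below defines a hypothetical helper, run_in_thread() (not part of this tutorial), and verifies its behaviour with the standard unittest module:

```python
import threading
import unittest

def run_in_thread(func, *args):
    # Hypothetical helper: run func in a worker thread, return its result.
    result = []
    t = threading.Thread(target=lambda: result.append(func(*args)))
    t.start()
    t.join()
    return result[0]

class TestRunInThread(unittest.TestCase):
    def test_returns_function_result(self):
        self.assertEqual(run_in_thread(sum, [1, 2, 3]), 6)

    def test_passes_multiple_arguments(self):
        self.assertEqual(run_in_thread(max, 4, 9, 2), 9)

# Run the tests programmatically rather than via unittest.main()
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestRunInThread)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```

Each test exercises one small unit of behaviour, which is exactly what makes failures easy to localize.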
Pool of Processes
Concurrency in Python – Pool of Processes

A pool of processes can be created and used in the same way as we created and used a pool of threads. A process pool can be defined as a group of pre-instantiated and idle processes which stand ready to be given work. Creating a process pool is preferred over instantiating new processes for every task when we need to perform a large number of tasks.

Python Module – concurrent.futures

The Python standard library has a module called concurrent.futures. This module was added in Python 3.2 to provide developers a high-level interface for launching asynchronous tasks. It is an abstraction layer on top of Python's threading and multiprocessing modules, providing an interface for running tasks using a pool of threads or processes. In the subsequent sections, we will look at the different classes of the concurrent.futures module.

Executor Class

Executor is an abstract class of the concurrent.futures module. It cannot be used directly; we need to use one of the following concrete subclasses:

ThreadPoolExecutor
ProcessPoolExecutor

ProcessPoolExecutor – A concrete subclass

ProcessPoolExecutor is one of the concrete subclasses of the Executor class. It uses multiprocessing, and we get a pool of processes for submitting tasks. This pool assigns tasks to the available processes and schedules them to run.

How to create a ProcessPoolExecutor?

With the help of the concurrent.futures module and its concrete subclass ProcessPoolExecutor, we can easily create a pool of processes. For this, we construct a ProcessPoolExecutor with the number of processes we want in the pool. If this number is not given, it defaults to the number of processors on the machine. This is followed by submitting a task to the process pool.

Example

We will now consider the same example that we used while creating the thread pool, the only difference being that now we will use ProcessPoolExecutor instead of ThreadPoolExecutor.
from concurrent.futures import ProcessPoolExecutor
from time import sleep

def task(message):
   sleep(2)
   return message

def main():
   executor = ProcessPoolExecutor(5)
   future = executor.submit(task, "Completed")
   print(future.done())
   sleep(2)
   print(future.done())
   print(future.result())

if __name__ == '__main__':
   main()

Output
False
False
Completed

In the above example, a ProcessPoolExecutor has been constructed with 5 worker processes. A task, which waits for 2 seconds before returning its message, is then submitted to the process pool executor. As seen from the output, the task does not complete for 2 seconds, so the first call to done() returns False. After 2 seconds the task is done, and we get the result of the future by calling its result() method.

Instantiating ProcessPoolExecutor – Context Manager

Another way to instantiate ProcessPoolExecutor is with the help of a context manager. It works similarly to the method used in the above example. The main advantage of using a context manager is that the pool is shut down automatically when the block exits. The instantiation can be done with the following code:

with ProcessPoolExecutor(max_workers = 5) as executor:

Example

For better understanding, we take the same example as used while creating a thread pool. In this example, we start by importing the concurrent.futures module. Then a function named load_url() is created, which loads the requested url. A ProcessPoolExecutor with 5 worker processes is then created and used as a context manager. We can get the result of a future by calling its result() method.
import concurrent.futures
from concurrent.futures import ProcessPoolExecutor
import urllib.request

URLS = ['http://www.foxnews.com/',
   'http://www.cnn.com/',
   'http://europe.wsj.com/',
   'http://www.bbc.co.uk/',
   'http://some-made-up-domain.com/']

def load_url(url, timeout):
   with urllib.request.urlopen(url, timeout=timeout) as conn:
      return conn.read()

def main():
   with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
      future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
      for future in concurrent.futures.as_completed(future_to_url):
         url = future_to_url[future]
         try:
            data = future.result()
         except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
         else:
            print('%r page is %d bytes' % (url, len(data)))

if __name__ == '__main__':
   main()

Output

The above Python script will generate the following output:

'http://some-made-up-domain.com/' generated an exception: <urlopen error [Errno 11004] getaddrinfo failed>
'http://www.foxnews.com/' page is 229476 bytes
'http://www.cnn.com/' page is 165323 bytes
'http://www.bbc.co.uk/' page is 284981 bytes
'http://europe.wsj.com/' page is 967575 bytes

Use of the Executor.map() function

The Python map() function is widely used to perform a number of tasks. One such task is to apply a certain function to every element within an iterable. Similarly, we can map all the elements of an iterable to a function and submit these as independent jobs to the ProcessPoolExecutor. Consider the following example of a Python script to understand this.

Example

We will consider the same example that we used while creating a thread pool using the Executor.map() function. In the example given below, the map function is used to apply the square() function to every value in the values list.
from concurrent.futures import ProcessPoolExecutor

values = [2, 3, 4, 5]

def square(n):
   return n * n

def main():
   with ProcessPoolExecutor(max_workers=3) as executor:
      results = executor.map(square, values)
   for result in results:
      print(result)

if __name__ == '__main__':
   main()

Output

The above Python script will generate the following output:

4
9
16
25

When to use ProcessPoolExecutor and ThreadPoolExecutor?

Now that we have studied both Executor classes, ThreadPoolExecutor and ProcessPoolExecutor, we need to know when to use which executor. We choose ProcessPoolExecutor in the case of CPU-bound workloads and ThreadPoolExecutor in the case of I/O-bound workloads. If we use ProcessPoolExecutor, then we do not need to worry about the GIL, because it uses multiprocessing. Moreover, for CPU-bound work, the execution time will be less than with ThreadPoolExecutor. Consider the following Python script example to understand this.

Example

import time
import concurrent.futures

value = [8000000, 7000000]

def counting(n):
   start = time.time()
   while n > 0:
      n -= 1
   return time.time() - start

def main():
   start = time.time()
   with concurrent.futures.ProcessPoolExecutor() as executor:
      for number, time_taken in zip(value, executor.map(counting, value)):
         print('Start: {} Time taken: {}'.format(number, time_taken))
   print('Total time taken: {}'.format(time.time() - start))

if __name__ == '__main__':
   main()

Output
Start: 8000000 Time taken: 1.5509998798370361
Start: 7000000 Time taken: 1.3259999752044678
Total time taken: 2.0840001106262207
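For completeness, here is an I/O-bound counterpart, assuming the I/O wait can be simulated with time.sleep(). Because sleeping threads release the GIL, a ThreadPoolExecutor overlaps the waits and finishes in roughly the time of one task rather than the sum of all of them:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    # Stand-in for an I/O-bound task: the thread just waits,
    # releasing the GIL so other threads can run meanwhile.
    time.sleep(delay)
    return delay

delays = [0.2, 0.2, 0.2, 0.2]

start = time.time()
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fake_io, delays))
elapsed = time.time() - start

print(results)                # [0.2, 0.2, 0.2, 0.2]
print(elapsed < sum(delays))  # True: the waits overlapped
```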
Discussion
Discuss Concurrency in Python

Concurrency, a natural phenomenon, is the happening of two or more events at the same time. It is a challenging task for professionals to create concurrent applications and get the most out of computer hardware.
Quick Guide
Concurrency in Python – Quick Guide

Concurrency in Python – Introduction

In this chapter, we will understand the concept of concurrency in Python and learn about threads and processes.

What is Concurrency?

In simple words, concurrency is the occurrence of two or more events at the same time. Concurrency is a natural phenomenon because many events occur simultaneously at any given time. In terms of programming, concurrency is when two tasks overlap in execution. With concurrent programming, the performance of our applications and software systems can be improved, because we can deal with requests concurrently rather than waiting for a previous one to complete.

Historical Review of Concurrency

The following points give a brief historical review of concurrency:

From the concept of railroads: Concurrency is closely related to the concept of railroads. With railroads, there was a need to handle multiple trains on the same railroad system in such a way that every train got to its destination safely.

Concurrent computing in academia: Interest in computer-science concurrency began with the research paper published by Edsger W. Dijkstra in 1965. In this paper, he identified and solved the problem of mutual exclusion, a property of concurrency control.

High-level concurrency primitives: In recent times, programmers have been getting improved concurrent solutions because of the introduction of high-level concurrency primitives.

Improved concurrency with programming languages: Programming languages such as Google's Golang, Rust and Python have made incredible developments in areas which help us get better concurrent solutions.

What is thread & multithreading?

A thread is the smallest unit of execution that can be performed in an operating system. It is not itself a program but runs within a program. In other words, threads are not independent of one another; each thread shares its code section, data section, etc. with other threads.
They are also known as lightweight processes. A thread consists of the following components:

A program counter, which holds the address of the next executable instruction
A stack
A set of registers
A unique id

Multithreading, on the other hand, is the ability of a CPU to execute multiple threads concurrently. The main idea of multithreading is to achieve parallelism by dividing a process into multiple threads. The concept of multithreading can be understood with the help of the following example.

Example

Suppose we are running a particular process wherein we open MS Word to type content into it. One thread will be assigned to open MS Word and another thread will be required to type content into it. And now, if we want to edit the existing content, then another thread will be required to do the editing task, and so on.

What is process & multiprocessing?

A process is defined as an entity which represents the basic unit of work to be implemented in the system. To put it in simple terms, we write our computer programs in a text file, and when we execute the program, it becomes a process that performs all the tasks mentioned in the program. During its life cycle, a process passes through different stages: Start, Ready, Running, Waiting and Terminating. The following diagram shows the different stages of a process.

A process can have a single thread, called the primary thread, or multiple threads, each having its own set of registers, program counter and stack. The following diagram shows the difference.

Multiprocessing, on the other hand, is the use of two or more CPU units within a single computer system. Our primary goal is to get the full potential of our hardware. To achieve this, we need to utilize the full number of CPU cores available in our computer system. Multiprocessing is the best approach to do so.

Python is one of the most popular programming languages.
Following are some reasons that make it suitable for concurrent applications:

Syntactic sugar: Syntactic sugar is syntax within a programming language that is designed to make things easier to read or to express. It makes the language "sweeter" for human use: things can be expressed more clearly, more concisely, or in an alternative style based on preference. Python comes with magic methods, which can be defined to act on objects. These magic methods are used as syntactic sugar and bound to more easy-to-understand keywords.

Large community: The Python language has witnessed a massive adoption rate amongst data scientists and mathematicians working in the fields of AI, machine learning, deep learning and quantitative analysis.

Useful APIs for concurrent programming: Python 2 and 3 have a large number of APIs dedicated to parallel/concurrent programming. The most popular of them are threading, concurrent.futures, multiprocessing, asyncio, gevent and greenlets.

Limitations of Python in implementing concurrent applications

Python comes with a limitation for concurrent applications. This limitation, called the GIL (Global Interpreter Lock), is present within CPython. The GIL never allows multiple threads to utilize multiple CPU cores at the same time, and hence we can say that there are no truly parallel threads in CPython. We can understand the concept of the GIL as follows:

GIL (Global Interpreter Lock)

It is one of the most controversial topics in the Python world. In CPython, the GIL is a mutex, a mutual-exclusion lock, which makes things thread safe. In other words, the GIL prevents multiple threads from executing Python code in parallel. The lock can be held by only one thread at a time, and a thread must acquire the lock before it can execute Python code. The diagram shown below will help you understand the working of the GIL.

However, there are some libraries and implementations such as NumPy, Jython and IronPython that work without interaction with the GIL: Jython and IronPython are alternative Python implementations that have no GIL, and NumPy can release the GIL during many of its computations.
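A small sketch of the GIL's effect: two threads each perform pure CPU-bound counting. Both finish correctly, but because only one thread can execute Python bytecode at a time, this generally gains no speed-up over running the counts one after the other:

```python
import threading

results = {}

def countdown(name, n):
    # Pure CPU-bound work; under the GIL only one thread
    # executes this bytecode at any instant.
    while n > 0:
        n -= 1
    results[name] = "done"

t1 = threading.Thread(target=countdown, args=("a", 1000000))
t2 = threading.Thread(target=countdown, args=("b", 1000000))
t1.start()
t2.start()
t1.join()
t2.join()

print(sorted(results.items()))  # [('a', 'done'), ('b', 'done')]
```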
Concurrency vs Parallelism

Both concurrency and parallelism are used in relation to multithreaded programs, but there is a lot of confusion about the similarity and difference between them. The big question in this regard: is concurrency parallelism or not? Although both terms appear quite similar, the answer to the above question is no; concurrency and parallelism are not the same.
System & Memory Architecture
System and Memory Architecture

There are different system and memory architecture styles that need to be considered while designing a program or concurrent system. This is very necessary because one system and memory style may be suitable for one task but error prone for another.

Computer system architectures supporting concurrency

In 1972, Michael Flynn proposed a taxonomy for categorizing different styles of computer system architecture. This taxonomy defines four styles as follows:

Single instruction stream, single data stream (SISD)
Single instruction stream, multiple data stream (SIMD)
Multiple instruction stream, single data stream (MISD)
Multiple instruction stream, multiple data stream (MIMD)

Single instruction stream, single data stream (SISD)

As the name suggests, such systems have one sequential incoming data stream and a single processing unit to execute the data stream. They are just like conventional uniprocessor systems. Following is the architecture of SISD.

Advantages of SISD

The advantages of SISD architecture are as follows:

It requires less power.
There is no issue of a complex communication protocol between multiple cores.

Disadvantages of SISD

The disadvantages of SISD architecture are as follows:

The speed of SISD architecture is limited, just like single-core processors.
It is not suitable for larger applications.

Single instruction stream, multiple data stream (SIMD)

As the name suggests, such systems have multiple incoming data streams and a number of processing units that can act on a single instruction at any given time. They are just like multiprocessor systems with a parallel computing architecture. Following is the architecture of SIMD.

The best example of SIMD is a graphics card. These cards have hundreds of individual processing units.
If we talk about the computational difference between SISD and SIMD: to add the arrays [5, 15, 20] and [15, 25, 10], SISD architecture would have to perform three separate add operations, whereas SIMD architecture can add them in a single add operation.

Advantages of SIMD

The advantages of SIMD architecture are as follows:

The same operation on multiple elements can be performed using one instruction only.
Throughput of the system can be increased by increasing the number of cores of the processor.
Processing speed is higher than with SISD architecture.

Disadvantages of SIMD

The disadvantages of SIMD architecture are as follows:

There is complex communication between the cores of the processor.
The cost is higher than with SISD architecture.

Multiple Instruction Single Data (MISD) stream

Systems with an MISD stream have a number of processing units performing different operations by executing different instructions on the same data set. Following is the architecture of MISD.

Commercial representatives of MISD architecture do not yet exist.

Multiple Instruction Multiple Data (MIMD) stream

In a system using MIMD architecture, each processor in a multiprocessor system can execute different sets of instructions independently on different sets of data in parallel. This is the opposite of SIMD architecture, in which a single operation is executed on multiple data sets. Following is the architecture of MIMD.

A normal multiprocessor uses MIMD architecture. These architectures are used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, communication switches, etc.

Memory architectures supporting concurrency

While working with concepts like concurrency and parallelism, there is always a need to speed up the programs.
One solution found by computer designers is to create shared-memory multicomputers, i.e., computers having a single physical address space accessed by all the cores of a processor. In this scenario, there can be a number of different styles of architecture, but the following three are the most important:

UMA (Uniform Memory Access)

In this model, all the processors share the physical memory uniformly. All the processors have equal access time to all the memory words. Each processor may have a private cache memory. The peripheral devices follow a set of rules: when all the processors have equal access to all the peripheral devices, the system is called a symmetric multiprocessor; when only one or a few processors can access the peripheral devices, the system is called an asymmetric multiprocessor.

Non-uniform Memory Access (NUMA)

In the NUMA multiprocessor model, the access time varies with the location of the memory word. Here, the shared memory is physically distributed among all the processors as local memories. The collection of all local memories forms a global address space which can be accessed by all the processors.

Cache Only Memory Architecture (COMA)

The COMA model is a specialized version of the NUMA model in which all the distributed main memories are converted to cache memories.
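The SISD-versus-SIMD array addition described earlier in this chapter can be sketched in plain Python. The loop mirrors the SISD view (one addition at a time); real SIMD hardware would instead apply one add instruction across all elements at once, which plain Python can only imitate:

```python
a = [5, 15, 20]
b = [15, 25, 10]

# SISD view: a single processing unit performs three separate additions.
sisd_result = []
for x, y in zip(a, b):
    sisd_result.append(x + y)

# SIMD view (conceptual only): one "instruction" applied to all elements.
# Python itself cannot issue SIMD instructions; libraries such as NumPy can.
simd_result = [x + y for x, y in zip(a, b)]

print(sisd_result)  # [20, 40, 30]
print(simd_result)  # [20, 40, 30]
```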
Processes Intercommunication
Processes Intercommunication

Process intercommunication means the exchange of data between processes. It is necessary to exchange data between processes to develop parallel applications. The following diagram shows the various communication mechanisms for synchronization between multiple subprocesses.

Various Communication Mechanisms

In this section, we will learn about the various communication mechanisms, described below.

Queues

Queues can be used with multi-process programs. The Queue class of the multiprocessing module is similar to the Queue.Queue class, hence the same API can be used. multiprocessing.Queue provides a thread- and process-safe FIFO (first-in first-out) mechanism of communication between processes.

Example

Following is a simple example, taken from the Python official docs on multiprocessing, to understand the Queue class of multiprocessing.

from multiprocessing import Process, Queue

def f(q):
   q.put([42, None, 'hello'])

def main():
   q = Queue()
   p = Process(target=f, args=(q,))
   p.start()
   print(q.get())

if __name__ == '__main__':
   main()

Output
[42, None, 'hello']

Pipes

A pipe is a data structure used to communicate between processes in multi-process programs. The Pipe() function returns a pair of connection objects connected by a pipe which, by default, is duplex (two-way). It works in the following manner:

It returns a pair of connection objects that represent the two ends of the pipe.
Every object has two methods, send() and recv(), to communicate between processes.

Example

Following is a simple example, taken from the Python official docs on multiprocessing, to understand the Pipe() function of multiprocessing.
from multiprocessing import Process, Pipe

def f(conn):
   conn.send([42, None, 'hello'])
   conn.close()

if __name__ == '__main__':
   parent_conn, child_conn = Pipe()
   p = Process(target=f, args=(child_conn,))
   p.start()
   print(parent_conn.recv())
   p.join()

Output
[42, None, 'hello']

Manager

Manager is a class of the multiprocessing module that provides a way to coordinate shared information between all its users. A manager object controls a server process, which manages shared objects and allows other processes to manipulate them. In other words, managers provide a way to create data that can be shared between different processes. Following are the different properties of a manager object:

The main property of a manager is to control a server process, which manages the shared objects.
Another important property is to update all the shared objects when any process modifies one.

Example

Following is an example which uses a manager object to create a list record in the server process and then add a new record to that list.

import multiprocessing

def print_records(records):
   for record in records:
      print("Name: {0}\nScore: {1}\n".format(record[0], record[1]))

def insert_record(record, records):
   records.append(record)
   print("A new record is added\n")

if __name__ == '__main__':
   with multiprocessing.Manager() as manager:
      records = manager.list([('Computers', 1), ('History', 5), ('Hindi', 9)])
      new_record = ('English', 3)
      p1 = multiprocessing.Process(target=insert_record, args=(new_record, records))
      p2 = multiprocessing.Process(target=print_records, args=(records,))
      p1.start()
      p1.join()
      p2.start()
      p2.join()

Output
A new record is added

Name: Computers
Score: 1

Name: History
Score: 5

Name: Hindi
Score: 9

Name: English
Score: 3

Concept of Namespaces in Manager

The Manager class comes with the concept of namespaces, which is a quick way of sharing several attributes across multiple processes.
Namespaces do not feature any public method which can be called, but they have writable attributes.

Example

The following Python script example helps us utilize namespaces for sharing data between the main process and a child process.

import multiprocessing

def Mng_NaSp(using_ns):
   using_ns.x += 5
   using_ns.y *= 10

if __name__ == '__main__':
   manager = multiprocessing.Manager()
   using_ns = manager.Namespace()
   using_ns.x = 1
   using_ns.y = 1
   print('before', using_ns)
   p = multiprocessing.Process(target=Mng_NaSp, args=(using_ns,))
   p.start()
   p.join()
   print('after', using_ns)

Output
before Namespace(x=1, y=1)
after Namespace(x=6, y=10)

Ctypes – Array and Value

The multiprocessing module provides Array and Value objects for storing data in a shared memory map. Array is a ctypes array allocated from shared memory and Value is a ctypes object allocated from shared memory. To begin with, import Process, Value and Array from multiprocessing.

Example

The following Python script is an example, taken from the Python docs, that utilizes a ctypes Array and Value for sharing some data between processes.

from multiprocessing import Process, Value, Array

def f(n, a):
   n.value = 3.1415927
   for i in range(len(a)):
      a[i] = -a[i]

if __name__ == '__main__':
   num = Value('d', 0.0)
   arr = Array('i', range(10))
   p = Process(target=f, args=(num, arr))
   p.start()
   p.join()
   print(num.value)
   print(arr[:])

Output
3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]

Communicating Sequential Processes (CSP)

CSP is used to illustrate the interaction of systems with other systems featuring concurrent models. CSP is a framework for writing concurrent programs via message passing, and hence it is effective for describing concurrency.

Python library – PyCSP

For implementing the core primitives found in CSP, Python has a library called PyCSP. It keeps the implementation very short and readable so that it can be understood very easily. Following is the basic process network of PyCSP.

In the above PyCSP process network, there are two processes: Process 1 and Process 2.
These processes communicate by passing messages through two channels: channel 1 and channel 2.

Installing PyCSP

With the help of the following command, we can install the PyCSP library:

pip install PyCSP

Example

The following Python script is a simple example of running two processes in parallel with each other, with the help of the PyCSP library.

from pycsp.parallel import *
import time

@process
def P1():
   time.sleep(1)
   print('P1 exiting')

@process
def P2():
   time.sleep(1)
   print('P2 exiting')

def main():
   Parallel(P1(), P2())
   print('Terminating')

if __name__ == '__main__':
   main()

In the above script, two functions, namely P1 and P2, have been created and then decorated with @process to convert them into processes.

Output
P2 exiting
P1 exiting
Terminating
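If PyCSP is not available, the channel-based pattern above can be approximated with the standard library alone. The sketch below is a stand-in, not PyCSP itself: a queue.Queue plays the role of a channel between two communicating workers (threads here, for simplicity):

```python
import threading
import queue

channel = queue.Queue()  # acts as a CSP-style channel

def producer():
    for msg in ["hello", "world", None]:  # None marks end of stream
        channel.put(msg)

received = []

def consumer():
    while True:
        msg = channel.get()
        if msg is None:
            break
        received.append(msg)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start()
t2.start()
t1.join()
t2.join()

print(received)  # ['hello', 'world']
```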
Debugging Thread Applications

In this chapter, we will learn how to debug thread applications. We will also learn the importance of debugging.

What is Debugging?

In computer programming, debugging is the process of finding and removing bugs, errors and abnormalities from a computer program. This process starts as soon as the code is written and continues in successive stages as the code is combined with other units of programming to form a software product. Debugging is part of the software testing process and is an integral part of the entire software development life cycle.

Python Debugger

The Python debugger, or pdb, is part of the Python standard library. It is a good fallback tool for tracking down hard-to-find bugs and allows us to fix faulty code quickly and reliably. Following are the two most important tasks of the pdb debugger −

It allows us to check the values of variables at runtime.
We can also step through the code and set breakpoints.

We can work with pdb in the following two ways −

Through the command line; this is also called postmortem debugging.
By interactively running pdb.

Working with pdb

For working with the Python debugger, we need to use the following code at the location where we want to break into the debugger −

import pdb; pdb.set_trace()

Consider the following commands to work with pdb through the command line.
h(help)
d(down)
u(up)
b(break)
cl(clear)
l(list)
n(next)
c(continue)
s(step)
r(return)

Following is a demo of the h(help) command of the Python debugger −

import pdb

pdb.set_trace()

--Call--
> d:\programdata\lib\site-packages\ipython\core\displayhook.py(247)__call__()
-> def __call__(self, result=None):
(Pdb) h

Documented commands (type help <topic>):
========================================
EOF    c          d        h         list      q        rv       undisplay
a      cl         debug    help      ll        quit     s        unt
alias  clear      disable  ignore    longlist  r        source   until
args   commands   display  interact  n         restart  step     up
b      condition  down     j         next      return   tbreak   w
break  cont       enable   jump      p         retval   u        whatis
bt     continue   exit     l         pp        run      unalias  where

Miscellaneous help topics:
==========================
exec  pdb

Example

While working with the Python debugger, we can set a breakpoint anywhere in the script by using the following line −

import pdb; pdb.set_trace()

After setting the breakpoint, we can run the script normally. The script will execute up to the line where the breakpoint has been set. Consider the following example, where we will run the script with the above-mentioned line placed at various points in the script −

import pdb

a = "aaa"
pdb.set_trace()
b = "bbb"
c = "ccc"
final = a + b + c
print(final)

When the above script is run, it will execute the program till a = "aaa"; we can check this in the following output.

Output

--Return--
> <ipython-input-7-8a7d1b5cc854>(3)<module>()->None
-> pdb.set_trace()
(Pdb) p a
'aaa'
(Pdb) p b
*** NameError: name 'b' is not defined
(Pdb) p c
*** NameError: name 'c' is not defined

After using the command p(print) in pdb, this script prints only 'aaa'. It is followed by an error because we have set the breakpoint just after a = "aaa".
Similarly, we can run the script after changing the breakpoint and see the difference in the output −

import pdb

a = "aaa"
b = "bbb"
c = "ccc"
pdb.set_trace()
final = a + b + c
print(final)

Output

--Return--
> <ipython-input-9-a59ef5caf723>(5)<module>()->None
-> pdb.set_trace()
(Pdb) p a
'aaa'
(Pdb) p b
'bbb'
(Pdb) p c
'ccc'
(Pdb) p final
*** NameError: name 'final' is not defined
(Pdb) exit

In the following script, we are setting the breakpoint in the last line of the program −

import pdb

a = "aaa"
b = "bbb"
c = "ccc"
final = a + b + c
pdb.set_trace()
print(final)

The output is as follows −

--Return--
> <ipython-input-11-8019b029997d>(6)<module>()->None
-> pdb.set_trace()
(Pdb) p a
'aaa'
(Pdb) p b
'bbb'
(Pdb) p c
'ccc'
(Pdb) p final
'aaabbbccc'
(Pdb)
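Since Python 3.7, the built-in breakpoint() function is the recommended shorthand for import pdb; pdb.set_trace(). The default hook consults the PYTHONBREAKPOINT environment variable each time breakpoint() is called, which makes it easy to switch all breakpoints off when running instrumented code non-interactively. A small sketch −

```python
import os

# Setting PYTHONBREAKPOINT to "0" turns every breakpoint() call into a
# no-op; the default sys.breakpointhook reads this variable at each call.
os.environ['PYTHONBREAKPOINT'] = '0'

a = 'aaa'
breakpoint()   # skipped, so the script runs straight through
b = 'bbb'
print(a + b)   # aaabbb
```

With the variable unset, the same breakpoint() call would drop into pdb exactly like pdb.set_trace() does in the examples above.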
Synchronizing Threads

Thread synchronization may be defined as a method with the help of which we can be assured that two or more concurrent threads are not simultaneously accessing the program segment known as the critical section. As we know, the critical section is the part of the program where a shared resource is accessed. Hence, we can say that synchronization is the process of making sure that two or more threads do not interfere with each other by accessing the resources at the same time. The diagram below shows four threads trying to access the critical section of a program at the same time.

To make it clearer, suppose two or more threads try to add an object to a list at the same time. This act cannot lead to a successful end, because either it will drop one or all of the objects, or it will completely corrupt the state of the list. Here the role of synchronization is to ensure that only one thread at a time can access the list.

Issues in thread synchronization

We might encounter issues while implementing concurrent programming or applying synchronizing primitives. In this section, we will discuss two major issues. The issues are −

Deadlock
Race condition

Race condition

This is one of the major issues in concurrent programming. Concurrent access to shared resources can lead to a race condition. A race condition may be defined as the occurrence of a condition in which two or more threads can access shared data and then try to change its value at the same time. Due to this, the values of variables may be unpredictable and may vary depending on the timing of context switches between the threads.
Example

Consider this example to understand the concept of a race condition −

Step 1 − In this step, we need to import the threading module −

import threading

Step 2 − Now, define a global variable, say x, along with its value as 0 −

x = 0

Step 3 − Now, we need to define the increment_global() function, which will increment the global variable x by 1 −

def increment_global():
   global x
   x += 1

Step 4 − In this step, we will define the taskofThread() function, which will call the increment_global() function a specified number of times; for our example it is 50000 times −

def taskofThread():
   for _ in range(50000):
      increment_global()

Step 5 − Now, define the main() function in which the threads t1 and t2 are created. Both will be started with the help of the start() function, and main() waits until they finish their jobs with the help of the join() function.

def main():
   global x
   x = 0

   t1 = threading.Thread(target=taskofThread)
   t2 = threading.Thread(target=taskofThread)

   t1.start()
   t2.start()

   t1.join()
   t2.join()

Step 6 − Now, we need to specify for how many iterations we want to call the main() function. Here, we are calling it 5 times.

if __name__ == "__main__":
   for i in range(5):
      main()
      print("x = {1} after Iteration {0}".format(i, x))

In the output shown below, we can see the effect of the race condition: the value of x after each iteration is expected to be 100000. However, there is a lot of variation in the value. This is due to the concurrent access of the threads to the shared global variable x.

Output

x = 100000 after Iteration 0
x = 54034 after Iteration 1
x = 80230 after Iteration 2
x = 93602 after Iteration 3
x = 93289 after Iteration 4

Dealing with race condition using locks

As we have seen the effect of the race condition in the above program, we need a synchronization tool that can deal with race conditions between multiple threads. In Python, the threading module provides the Lock class to deal with race conditions.
Further, the Lock class provides different methods with the help of which we can handle race conditions between multiple threads. The methods are described below −

acquire() method

This method is used to acquire, i.e., block on, a lock. A lock can be blocking or non-blocking depending upon the following true or false value −

With value set to True − If the acquire() method is invoked with True, which is the default argument, then the thread execution is blocked until the lock is unlocked.

With value set to False − If the acquire() method is invoked with False, the call does not block: it acquires the lock and returns True if the lock is free, and returns False immediately if the lock is already held.

release() method

This method is used to release a lock. Following are a few important tasks related to this method −

If a lock is locked, the release() method unlocks it.
Its job is to allow exactly one thread to proceed if more than one thread is blocked and waiting for the lock to become unlocked.
It raises a RuntimeError if the lock is already unlocked.

Now, we can rewrite the above program with the Lock class and its methods to avoid the race condition. We need to define the taskofThread() method with a lock argument and then use the acquire() and release() methods to guard the critical section and avoid the race condition.
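The non-blocking behaviour of acquire(False) can be seen in a short sketch: the second attempt fails immediately instead of waiting, because the lock is already held −

```python
import threading

lock = threading.Lock()

# The first non-blocking acquire succeeds because the lock is free.
print(lock.acquire(False))   # True

# The lock is now held, so a second non-blocking attempt returns
# False at once instead of blocking the thread.
print(lock.acquire(False))   # False

lock.release()
```

This pattern is useful when a thread has other work it can do if the resource happens to be busy.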
Example

Following is an example Python program to understand the concept of locks for dealing with race conditions −

import threading

x = 0

def increment_global():
   global x
   x += 1

def taskofThread(lock):
   for _ in range(50000):
      lock.acquire()
      increment_global()
      lock.release()

def main():
   global x
   x = 0

   lock = threading.Lock()

   t1 = threading.Thread(target=taskofThread, args=(lock,))
   t2 = threading.Thread(target=taskofThread, args=(lock,))

   t1.start()
   t2.start()

   t1.join()
   t2.join()

if __name__ == "__main__":
   for i in range(5):
      main()
      print("x = {1} after Iteration {0}".format(i, x))

The following output shows that the effect of the race condition is eliminated, as the value of x after each and every iteration is now 100000, which is as per the expectation of this program.

Output

x = 100000 after Iteration 0
x = 100000 after Iteration 1
x = 100000 after Iteration 2
x = 100000 after Iteration 3
x = 100000 after Iteration 4
Concurrency in Python – Pool of Threads

Suppose we had to create a large number of threads for our multithreaded tasks. It would be computationally expensive, as too many threads can cause many performance issues. A major issue could be the throughput getting limited. We can solve this problem by creating a pool of threads. A thread pool may be defined as a group of pre-instantiated, idle threads which stand ready to be given work. Creating a thread pool is preferred over instantiating new threads for every task when we need to perform a large number of tasks. A thread pool can manage the concurrent execution of a large number of threads as follows −

If a thread in a thread pool completes its execution, then that thread can be reused.
If a thread is terminated, another thread will be created to replace it.

Python Module – concurrent.futures

The Python standard library includes the concurrent.futures module. This module was added in Python 3.2 to provide developers a high-level interface for launching asynchronous tasks. It is an abstraction layer on top of Python's threading and multiprocessing modules that provides an interface for running tasks using a pool of threads or processes.

In our subsequent sections, we will learn about the different classes of the concurrent.futures module.

Executor Class

Executor is an abstract class of the concurrent.futures Python module. It cannot be used directly; we need to use one of the following concrete subclasses −

ThreadPoolExecutor
ProcessPoolExecutor

ThreadPoolExecutor – A Concrete Subclass

It is one of the concrete subclasses of the Executor class. The subclass uses multi-threading and gives us a pool of threads for submitting tasks. This pool assigns tasks to the available threads and schedules them to run.

How to create a ThreadPoolExecutor?

With the help of the concurrent.futures module and its concrete subclass ThreadPoolExecutor, we can easily create a pool of threads.
For this, we need to construct a ThreadPoolExecutor with the number of threads we want in the pool; if the number is omitted, Python chooses a default based on the number of processors. Then we can submit a task to the thread pool. When we submit() a task, we get back a Future. The Future object has a method called done(), which tells whether the future has resolved, that is, whether a value has been set for that particular future object. When a task finishes, the thread pool executor sets the value on the future object.

Example

from concurrent.futures import ThreadPoolExecutor
from time import sleep

def task(message):
   sleep(2)
   return message

def main():
   executor = ThreadPoolExecutor(5)
   future = executor.submit(task, "Completed")
   print(future.done())
   sleep(2)
   print(future.done())
   print(future.result())

if __name__ == '__main__':
   main()

Output

False
True
Completed

In the above example, a ThreadPoolExecutor has been constructed with 5 threads. Then a task, which will wait for 2 seconds before returning its message, is submitted to the thread pool executor. As seen from the output, the task does not complete until 2 seconds have passed, so the first call to done() returns False. After 2 seconds, the task is done and we get the result of the future by calling its result() method.

Instantiating ThreadPoolExecutor – Context Manager

Another way to instantiate a ThreadPoolExecutor is with the help of a context manager. It works similar to the method used in the above example. The main advantage of using a context manager is that it looks syntactically good and the pool is shut down automatically when the block exits. The instantiation can be done with the help of the following code −

with ThreadPoolExecutor(max_workers=5) as executor:

Example

The following example is borrowed from the Python docs. In this example, first of all the concurrent.futures module has to be imported. Then a function named load_url() is created, which will load the requested url. The script then creates a ThreadPoolExecutor with 5 threads in the pool. The ThreadPoolExecutor is utilized as a context manager.
We can get the result of each future by calling the result() method on it.

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
   'http://www.cnn.com/',
   'http://europe.wsj.com/',
   'http://www.bbc.co.uk/',
   'http://some-made-up-domain.com/']

def load_url(url, timeout):
   with urllib.request.urlopen(url, timeout=timeout) as conn:
      return conn.read()

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
   future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
   for future in concurrent.futures.as_completed(future_to_url):
      url = future_to_url[future]
      try:
         data = future.result()
      except Exception as exc:
         print('%r generated an exception: %s' % (url, exc))
      else:
         print('%r page is %d bytes' % (url, len(data)))

Output

Following would be the output of the above Python script −

'http://some-made-up-domain.com/' generated an exception: <urlopen error [Errno 11004] getaddrinfo failed>
'http://www.foxnews.com/' page is 229313 bytes
'http://www.cnn.com/' page is 168933 bytes
'http://www.bbc.co.uk/' page is 283893 bytes
'http://europe.wsj.com/' page is 938109 bytes

Use of the Executor.map() function

The Python map() function is widely used in a number of tasks. One such task is to apply a certain function to every element within an iterable. Similarly, we can map all the elements of an iterable to a function and submit these as independent jobs to our ThreadPoolExecutor. Consider the following example of a Python script to understand how the function works.

Example

In the example below, the map function is used to apply the square() function to every value in the values list.
from concurrent.futures import ThreadPoolExecutor

values = [2, 3, 4, 5]

def square(n):
   return n * n

def main():
   with ThreadPoolExecutor(max_workers=3) as executor:
      results = executor.map(square, values)
      for result in results:
         print(result)

if __name__ == '__main__':
   main()

Output

The above Python script generates the following output −

4
9
16
25
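The chapter mentioned ProcessPoolExecutor as the other concrete Executor subclass; it exposes the same submit()/map() interface but runs tasks in separate worker processes, which helps with CPU-bound work that the GIL would otherwise serialize. A sketch of the same squaring example with processes (note that the callable must be defined at module level so it can be pickled for the workers) −

```python
from concurrent.futures import ProcessPoolExecutor

values = [2, 3, 4, 5]

def square(n):
    return n * n

def main():
    # Same interface as ThreadPoolExecutor, but each task runs in a
    # separate worker process instead of a thread.
    with ProcessPoolExecutor(max_workers=3) as executor:
        for result in executor.map(square, values):
            print(result)

if __name__ == '__main__':
    main()
```

For a trivial function like square(), the process pool is slower than the thread pool because of inter-process communication overhead; it pays off only when each task does substantial CPU work.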