Multiprocessing in Python

Python documentation
multiprocessing module

1. Introduction

The multiprocessing module (MP) spawns sub-processes instead of threads. It can be used for data parallelism: parallelizing the execution of a function across multiple input values by distributing the data across processes.
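As a minimal sketch of data parallelism, the snippet below maps a function over a list of inputs using a small pool of worker processes. The names square and parallel_squares are hypothetical, chosen only for this illustration.

```python
import multiprocessing as mp


def square(x: int) -> int:
    # Work done in a worker process for a single input value.
    return x * x


def parallel_squares(values, workers=2):
    # Pool.map splits `values` across the workers and returns
    # the results in the same order as the inputs.
    with mp.Pool(processes=workers) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    print(parallel_squares([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

The if __name__ == "__main__" guard matters here: under spawn and forkserver the child re-imports the main module, and the guard keeps it from starting a pool of its own.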

2. Start methods

The start method can be set as follows.

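import multiprocessing as mp

def foo(q):
    pass

if __name__ == '__main__':
    # Using get_context() is preferred over set_start_method():
    # it returns a context object with the same API as the mp module
    # and leaves the global start method untouched.
    ctx = mp.get_context('spawn')
    q = ctx.Queue()
    p = ctx.Process(target=foo, args=(q,))
    p.start()  # launch the child process
    p.join()   # wait for it to finish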

set_start_method() should not be used more than once in the program.
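A small sketch of what happens on a repeated call: once the global start method is fixed, a second set_start_method() call (without force=True) raises RuntimeError. The helper try_set_start_method is a hypothetical name used only for illustration.

```python
import multiprocessing as mp


def try_set_start_method(method: str) -> bool:
    """Return True if the global start method was set by this call,
    False if it had already been fixed earlier."""
    try:
        mp.set_start_method(method)
        return True
    except RuntimeError:
        # Raised when the context has already been set.
        return False


if __name__ == "__main__":
    # The list of available methods depends on the platform.
    print(mp.get_all_start_methods())
    print(try_set_start_method("spawn"))
    print(try_set_start_method("spawn"))  # False: already set by now
```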

2.1. spawn

The parent process starts a fresh Python interpreter for each child. This is the slowest option. The child inherits only those resources that are needed to run the multiprocessing.Process.run method.

This is available on both Unix and Windows.

2.2. fork

The parent process forks the Python interpreter using os.fork(). All resources of the parent are inherited by the child process. This is faster than spawn.

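"Not safe if used on a multi-threading process": the forked child gets a copy of only the thread that called fork(), so a lock held by any other thread at that moment stays locked in the child forever. In practice, fork should be avoided when the parent also runs threads, whether started through the threading module or by a library. The Python documentation also warns that on macOS the fork start method can lead to crashes of the subprocess, so it should be considered unsafe there.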

This is available on Unix only.

2.3. forkserver

When the program starts and selects this method, a server process is also started. From then on, whenever a new process is needed, the parent asks the server to fork one. The fork server is single-threaded (unless a library it imports starts threads), so it is generally safe for it to use os.fork(). No unnecessary resources are inherited from the parent process.

Similar to fork, this is available on Unix only.

2.4. Verdict

I will be using forkserver for my programs.

3. Communication between processes

MP allows exchange of objects between processes.

3.1. Queue

Use mp.Queue instead of queue.Queue.

Steps:

  1. Create a queue using q = mp.Queue().
  2. q.put([1, 'a', 2]) puts the list [1, 'a', 2] into the queue.
  3. q.get() removes and returns the next item in FIFO order, blocking until one is available. It can be used in conjunction with another function, as in len(q.get()), to get the length of our list.

import multiprocessing as mp

q = mp.Queue()
print(q)
q.put([1, 'a', 2])
print(len(q.get()))
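To make the FIFO behaviour concrete, here is a small sketch that puts several items into a queue and reads them back within a single process. The helper name drain_in_order is hypothetical.

```python
import multiprocessing as mp


def drain_in_order(items):
    """Put every item into a fresh mp.Queue, then get them all back."""
    q = mp.Queue()
    for item in items:
        q.put(item)
    # One get() per put(): each call blocks until the next item arrives.
    return [q.get() for _ in items]


if __name__ == "__main__":
    print(drain_in_order(["first", "second", "third"]))
    # ['first', 'second', 'third']
```

With a single producer the items come back in the order they were put. With several producer processes, the order between items from different producers is not guaranteed.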

3.1.1. Putting objects into a queue from a child process

  1. Create a process using p = mp.Process(target=putter, args=(q, value)), where putter puts value into the queue q. Note that args must be a tuple.
  2. Start the process: p.start().
  3. Do intermediate work in the parent, such as getting items from the queue.
  4. Wait for the process to finish: p.join().

import multiprocessing as mp
import multiprocessing.queues  # explicit import so the mp.queues.Queue annotations resolve

def putter(q: mp.queues.Queue, value):
    q.put(value)

if __name__ == '__main__':
    q: mp.queues.Queue = mp.Queue()

    p: mp.Process = mp.Process(target=putter, args=(q, [42, None, 'hello']))

    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    # print(q.get())  # A second get() would block forever: the queue is now empty.
    p.join()
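A get() on an empty queue blocks until an item arrives. One way to avoid waiting indefinitely, assuming a timeout is acceptable for the program, is to pass timeout= and catch queue.Empty. The helper get_or_default is a hypothetical name used for this sketch.

```python
import multiprocessing as mp
import queue  # mp.Queue.get() raises queue.Empty on timeout


def get_or_default(q, default=None, timeout=0.1):
    """Return the next item from q, or `default` if none arrives in time."""
    try:
        return q.get(timeout=timeout)
    except queue.Empty:
        return default


if __name__ == "__main__":
    q = mp.Queue()
    print(get_or_default(q, default="nothing yet"))  # nothing yet
```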

3.2. Pipe

The Pipe() function returns a pair of connection objects joined by a pipe, which is two-way (duplex) by default.

Each connection object has send() and recv() methods, among others.
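A minimal sketch of a two-way exchange over a pipe: the parent sends a string to a child process and receives a reply on the same connection. The function names echo_upper and ask_child are hypothetical and exist only for this example.

```python
import multiprocessing as mp


def echo_upper(conn):
    # Child side: receive one message, send back its upper-case form.
    message = conn.recv()
    conn.send(message.upper())
    conn.close()


def ask_child(text: str) -> str:
    parent_conn, child_conn = mp.Pipe()  # duplex by default
    p = mp.Process(target=echo_upper, args=(child_conn,))
    p.start()
    parent_conn.send(text)       # parent -> child
    reply = parent_conn.recv()   # child -> parent, same connection
    p.join()
    return reply


if __name__ == "__main__":
    print(ask_child("hello"))  # HELLO
```

Each end of the pipe should be used by only one process at a time. Two processes reading from or writing to the same end simultaneously can corrupt the data.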

Author: Vaibhav Karve

Created: 2024-02-04 Sun 21:01
