Multiprocessing in Python
Home
Table of Contents
- Python documentation
- multiprocessing module
1. Introduction
The multiprocessing
module (MP) spawns sub-processes insteads of threads. Can be
used for data parallelism – parallelizing the execution of a function
accross multiple input values by distributing the data accross
processes.
2. Start methods
Can be set as follows.
import multiprocessing as mp def foo(q): pass if __name__ == '__main__': # using get_context() is preferred over set_start_method(). ctx = mp.get_context('spawn') # mp has custom Context object # types. q = ctx.Queue() p = ctx.Process(target=foo, args=(q,))
set_start_method()
should not be used more than once in the program.
2.1. spawn
Parent process starts a fresh python interpreter each time. This is
the slowest option. Inherits only those objects that are needed by
the multiprocessing.Process.run
method.
2.2. fork
Parent process forks the python interpreter using os.fork()
. All
resources of the parent are inherited by the child process. This is
faster then spawn
.
"Not safe if used on a multi-threading process" (I think this means
that this method should not be used if we are also using the
threading
module?). Overall the fork start method can lead to
crashes of the subprocess so it is unsafe.
This is available on Unix only.
2.3. forkserver
When the program starts, using this method, a server process is
also started. The fork server can fork a new process every time it
is requested. The fork process server is single-threaded and so it
is safely uses os.fork()
. No unnecessary resources are inherited
from the parent process.
Similar to fork
, this is available on Unix only.
2.4. Verdict
I will be using forkserver
for my programs.
3. Communication between processes
MP allows exchange of objects between processes.
3.1. Queue
Use mp.Queue
instead of queue.Queue
.
Steps:
- Create a queue using
q = mp.Queue()
. q.put([1, 'a', 2])
puts the list[1, 'a', 2]
into the queue.q.get()
retrieves whatever was put in the queue (I think it retrieves in FIFO order?). This can be using in conjunction with another function as inlen(q.get())
to get the length of our list.
import multiprocessing as mp q = mp.Queue() print(q) q.put([1, 'a', 2]) print(len(q.get()))
3.1.1. Putting many queues inside(?) a single process
- Start a process using
p = mp.Process(target=putter, arg=q)
, whereputter
puts objects into the queueq
. - Start the process:
p.start()
. - Intermediate action like requesting things from the queue.
- Join after process termination:
p.join()
.
def putter(q: mp.queues.Queue, value): q.put(value) if __name__ == '__main__': q: mp.queues.Queue = mp.Queue() p: mp.Process = mp.Process(target=putter, args=(q, [42, None, 'hello'])) p.start() print(q.get()) # prints "[42, None, 'hello']" # print(q.get) # Running it again blocks the process for some reason. p.join()
3.2. Pipe
The Pipe()
function returns a two-way connection by returning a pair
of connection objects.
Each connection object has a send()
and recv()
method.