Using Pool with Queue in the Python multiprocessing module

Friday, July 11, 2014

I want to use the multiprocessing module to speed up traversing a directory structure. First I did some research and found this Stack Overflow thread:

How do I run os.walk in parallel in Python?

However, when I tried to adapt the code from that thread, I kept running into a problem. Here is a little script I wrote just to test Pool and figure out how it works:

import os

from multiprocessing.pool import Pool
from multiprocessing import Process
from multiprocessing import JoinableQueue as Queue

def scan():
    print "Hi!"
    while True:
        print "Inside loop"
        directory = unsearched.get()
        print "Got directory"
        print "{0}".format(directory)

if __name__ == '__main__':

    # Put those directories on the queue
    unsearched = Queue()
    top_dirs = ['a', 'b', 'c']
    for d in top_dirs:
        unsearched.put(d)
    print unsearched

    # Scan the directories
    processes = 1
    pool = Pool(processes)
    for i in range(processes):
        print "Process {0}".format(i)
        pool.apply_async(scan)

    # Block until all the tasks are done
    unsearched.join()
    print 'Done'

The script enters the loop inside the scan function and then just sits there:

PS C:\Test> python .\
<multiprocessing.queues.JoinableQueue object at 0x000000000272F630>
Process 0
Inside loop

I'm sure I'm missing something simple here.
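
A likely explanation, offered as an assumption rather than a confirmed diagnosis: on Windows, worker processes are started fresh instead of being forked, so a global created under the `if __name__ == '__main__':` guard does not exist in the workers. One common way around this is to hand the queue to each worker through Pool's initializer and initargs parameters. The sketch below assumes Python 3; init_worker, scan_all, the module-level _unsearched, and the None stop sentinel are all hypothetical names and choices made for illustration.

```python
from multiprocessing import JoinableQueue, Pool

_unsearched = None  # set inside each worker by init_worker()


def init_worker(queue):
    """Store the shared queue in a per-worker global (hypothetical helper)."""
    global _unsearched
    _unsearched = queue


def scan():
    """Take directories off the queue until a None sentinel arrives."""
    seen = []
    while True:
        directory = _unsearched.get()
        try:
            if directory is None:
                return seen
            seen.append(directory)
        finally:
            # Mark every item, including the sentinel, as processed so that
            # the parent's join() can return.
            _unsearched.task_done()


def scan_all(top_dirs, processes=1):
    """Queue the directories, scan them in a pool, and return what was seen."""
    unsearched = JoinableQueue()
    for d in top_dirs:
        unsearched.put(d)
    for _ in range(processes):
        unsearched.put(None)  # one stop sentinel per scan task

    # The queue reaches the workers as an initializer argument, not a global.
    pool = Pool(processes, initializer=init_worker, initargs=(unsearched,))
    results = [pool.apply_async(scan) for _ in range(processes)]

    unsearched.join()  # block until every queued item has been marked done
    found = [d for r in results for d in r.get(timeout=30)]
    pool.close()
    pool.join()
    return found


if __name__ == '__main__':
    print(sorted(scan_all(['a', 'b', 'c'], processes=2)))
```

The sentinel is there because the original loop has no exit condition. Without some stop signal, each worker would block on get() forever once the queue is empty.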