Using Pool with Queue in the Python multiprocessing module

Friday, July 11, 2014

I want to use the multiprocessing module to speed up traversing a directory structure. First I did some research and found this Stack Overflow thread:


how do I run os.walk in parallel in python?


However, when I tried to adapt the code in the thread, I kept running into a problem. Here is a little script I wrote just to test out Pool and figure out how it works:



import os

from multiprocessing.pool import Pool
from multiprocessing import Process
from multiprocessing import JoinableQueue as Queue

def scan():
    print "Hi!"
    while True:
        print "Inside loop"
        directory = unsearched.get()
        print "Got directory"
        unsearched.task_done()
        print "{0}".format(directory)

if __name__ == '__main__':

    # Put those directories on the queue
    unsearched = Queue()
    top_dirs = ['a', 'b', 'c']
    for d in top_dirs:
        unsearched.put(d)
    print unsearched

    # Scan the directories
    processes = 1
    pool = Pool(processes)
    for i in range(processes):
        print "Process {0}".format(i)
        pool.apply_async(scan)

    # Block until all the tasks are done
    unsearched.join()
    print 'Done'


What is happening is that the script enters the loop inside the scan function and just sits there:



PS C:\Test> python .\multiprocessing_test.py
<multiprocessing.queues.JoinableQueue object at 0x000000000272F630>
Process 0
Hi!
Inside loop
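
One detail that makes this harder to debug: apply_async never raises an exception from the worker on its own. Anything that blows up inside the worker is captured in the AsyncResult that apply_async returns and is only re-raised when you call get() on it, so a crash inside scan looks exactly like a silent hang. A tiny standalone example (the boom function is just for illustration, it is not part of my script) shows the behavior:

    from multiprocessing.pool import Pool

    def boom():
        raise ValueError("worker blew up")

    if __name__ == '__main__':
        pool = Pool(1)
        result = pool.apply_async(boom)
        # Nothing is printed and no traceback appears until get() is called;
        # this line re-raises the worker's ValueError in the parent process.
        result.get()
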


I'm sure I'm missing something simple here.
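
My best guess so far, and I have not fully verified it, is that the pool worker never sees unsearched at all: on Windows the worker process re-imports the module, unsearched only exists inside the __main__ block, and the resulting error gets swallowed by apply_async while the parent blocks forever on unsearched.join(). If that is the real cause, something like the sketch below should behave better. It passes a Manager-backed queue to the worker as an argument instead of relying on a global (the scan name and the toy directory list mirror my script above, but the rest is untested scaffolding):

    from multiprocessing import Manager, Pool

    def scan(unsearched):
        # Runs in a pool worker; the queue proxy arrives as an argument
        # instead of a global, so the worker can actually see it.
        while True:
            directory = unsearched.get()
            if directory is None:
                # Sentinel: no more directories to process.
                unsearched.task_done()
                break
            print "Scanned {0}".format(directory)
            unsearched.task_done()

    if __name__ == '__main__':
        processes = 1

        # A Manager-backed queue can be passed to Pool workers; its proxy
        # still offers put/get/task_done/join like JoinableQueue does.
        manager = Manager()
        unsearched = manager.Queue()

        for d in ['a', 'b', 'c']:
            unsearched.put(d)

        pool = Pool(processes)
        results = [pool.apply_async(scan, (unsearched,)) for i in range(processes)]

        # One sentinel per worker so every loop can exit.
        for i in range(processes):
            unsearched.put(None)

        # Block until task_done has been called for every queued item.
        unsearched.join()

        # get() re-raises anything that blew up inside a worker.
        for r in results:
            r.get()

        pool.close()
        pool.join()
        print 'Done'

The None sentinels are only there so each worker's while loop has a way to finish; unsearched.join() alone would leave the workers spinning on get() after the queue empties.
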






