Thread Worker Pooling in Python


The worker pool pattern is a fairly common tool for writing multi-threaded programs.  You divide your work up into chunks of some size and submit them to a work queue.  A pool of threads watches that queue for tasks to execute, and as each task completes, the worker adds it to a finished queue.
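In rough outline, the core of the idea looks something like the following sketch (a simplified illustration of the pattern, not the actual code from the file; the WorkerPool name is just for this example):

    import threading

    try:
        import Queue as queue  # Python 2 spelling
    except ImportError:
        import queue           # Python 3 spelling

    class WorkerPool(object):
        """Simplified sketch of the pattern, not the code from the attached file."""

        def __init__(self, num_threads=4):
            self.tasks = queue.Queue()     # work queue of zero-argument callables
            self.finished = queue.Queue()  # completed results land here
            for _ in range(num_threads):
                t = threading.Thread(target=self._worker)
                t.setDaemon(True)          # workers should not keep the process alive
                t.start()

        def _worker(self):
            while True:
                task = self.tasks.get()    # block until a task is available
                self.finished.put(task())  # run it and stash the result
                self.tasks.task_done()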

Here is the file.

Thanks to the Global Interpreter Lock, threads are of somewhat limited usefulness in Python.  I foresee myself mostly using this for network-limited tasks, like downloading a large quantity of RSS feeds.  My idea is that tasks put into the system shouldn’t modify global state, so if I actually needed this for computational tasks, it might be feasible to build it on forks instead, or perhaps the multiprocessing module from 2.6.  However, I still use a lot of systems with only Python 2.3 installed, so I’m not likely to want to write 2.6-specific code anytime soon.

Many of the thread pool systems I’ve seen have you specify a single function for the pool, and then you just enqueue the inputs.  Mine is different in that each item in the queue can be a different function.  I haven’t actually used it this way though, so it is possible that the extra flexibility is generally wasted.
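For example, building on the sketch above, nothing stops you from queuing completely unrelated jobs on the same pool (fetch_feed and resize_image here are made-up stand-ins, not functions from the file):

    def fetch_feed(url):
        # stand-in for real feed-fetching code
        return "fetched %s" % url

    def resize_image(path):
        # stand-in for real image-processing code
        return "resized %s" % path

    pool = WorkerPool(num_threads=8)
    pool.tasks.put(lambda: fetch_feed("http://example.com/rss"))
    pool.tasks.put(lambda: resize_image("photo.jpg"))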

Python’s lambdas seem rather limited: each one can only contain a single expression.  I suppose that this is what Lisp and Scheme do as well, but their expressions offer things like progn.  My first idea was that the task to execute would be a function taking no arguments, and I pictured using a lambda to wrap up whatever I wanted to do.

Now, I still offer that via addTask and assume it internally, but I also offer addTaskArgs, which takes a function reference along with either positional arguments (as a list) or named arguments (as a dict), and wraps the call in a lambda before enqueuing it.
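Concretely, the two methods could be as simple as the following.  These would sit on the WorkerPool sketch above; the bodies are my reconstruction of the idea rather than the contents of the file:

        def addTask(self, func):
            # func is already a zero-argument callable, typically a lambda
            self.tasks.put(func)

        def addTaskArgs(self, func, args=None, kwargs=None):
            # Capture the function and its arguments now, then enqueue a
            # zero-argument callable that applies them when a worker runs it.
            args = args if args is not None else []
            kwargs = kwargs if kwargs is not None else {}
            self.tasks.put(lambda: func(*args, **kwargs))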

I now find that my knowledge about how to unit test threaded code is rather limited, and the included unit tests are extremely thin.
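One approach that at least gives a smoke test is to enqueue a known batch of tasks, wait for the work queue to drain, and compare the finished queue against the expected results.  Something along these lines, again assuming the WorkerPool sketch above rather than the actual file:

    import unittest

    class WorkerPoolTest(unittest.TestCase):
        def test_results_arrive(self):
            pool = WorkerPool(num_threads=2)
            for i in range(10):
                pool.addTaskArgs(lambda x: x * x, [i])
            pool.tasks.join()  # blocks until every queued task has been processed
            results = []
            while not pool.finished.empty():
                results.append(pool.finished.get())
            self.assertEqual(sorted(results), [i * i for i in range(10)])

    if __name__ == "__main__":
        unittest.main()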

