How to pick a chunksize for python multiprocessing with large datasets

Posted by Sandro on Stack Overflow on 2010-04-24.

I am attempting to use Python to gain some performance on a task that can be highly parallelized using http://docs.python.org/library/multiprocessing.

Looking at the library documentation, it says to use a chunksize for very long iterables. My iterable is not long, but one of the dicts it contains is huge: ~100,000 entries, with tuples as keys and numpy arrays as values.
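
For reference, the shape of the problem is roughly the following (the worker function, array sizes, and dict contents are placeholders, not my real code); chunksize here is the optional third argument to Pool.map:

    from multiprocessing import Pool
    import numpy as np

    def process_item(d):
        # Placeholder worker: 'd' is one dict mapping tuples to numpy arrays.
        return sum(arr.sum() for arr in d.values())

    if __name__ == "__main__":
        # The iterable itself is short, but each element is very large:
        # ~100,000 entries of tuple -> numpy array per dict.
        big_dict = {(i, i + 1): np.zeros(10) for i in range(100_000)}
        work = [big_dict]

        with Pool() as pool:
            # chunksize controls how many elements of the iterable are
            # handed to a worker in one batch; each element is pickled
            # and sent to the worker process.
            results = pool.map(process_item, work, chunksize=1)
        print(results)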

How should I set the chunksize to handle this, and how can I transfer this data to the worker processes quickly?
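
For illustration only, one pattern I have seen suggested (not necessarily the right one here) is to hand the huge dict to each worker once via a Pool initializer, so it is not re-pickled for every task; the lookup function and keys below are placeholders:

    from multiprocessing import Pool
    import numpy as np

    _big_dict = None

    def _init_worker(d):
        # Runs once in each worker process; stores the dict as a
        # module-level global so later tasks can use it directly.
        global _big_dict
        _big_dict = d

    def lookup(key):
        # Placeholder task: workers read from the already-initialised
        # global instead of receiving the dict with every call.
        return _big_dict[key].sum()

    if __name__ == "__main__":
        big_dict = {(i, i + 1): np.ones(10) for i in range(100_000)}
        keys = list(big_dict)[:1000]

        with Pool(initializer=_init_worker, initargs=(big_dict,)) as pool:
            # The dict is transferred once per worker; with many small
            # tasks a larger chunksize reduces per-task IPC overhead.
            results = pool.map(lookup, keys, chunksize=100)
        print(len(results))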

Thank you.
