How to pick a chunksize for python multiprocessing with large datasets
Posted by Sandro on Stack Overflow, 2010-04-24
I am attempting to use Python to speed up a task that can be highly parallelized using http://docs.python.org/library/multiprocessing.
The library documentation says to use a chunksize for very long iterables. My iterable is not long, but one of the dicts it contains is huge: ~100,000 entries, with tuples as keys and numpy arrays as values.
How should I set the chunksize to handle this, and how can I transfer this data to the worker processes quickly?
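To make the question concrete, here is a rough sketch of the kind of thing I am trying to do. The names (big_dict, process_item) and the chunksize value are placeholders, not my real code; the idea is to send the big dict to each worker once via an initializer rather than pickling it with every task:

    import numpy as np
    from multiprocessing import Pool

    # big_dict and process_item are placeholder names, not the real workload.
    _worker_data = {}

    def init_worker(d):
        # Runs once in each worker process, so the large dict is pickled and
        # sent once per worker rather than once per task.
        _worker_data['big_dict'] = d

    def process_item(key):
        # Look up the array in the worker's copy of the dict and do some work.
        arr = _worker_data['big_dict'][key]
        return key, arr.sum()

    if __name__ == '__main__':
        # Fake data of roughly the shape described: tuple keys -> numpy arrays.
        big_dict = {(i, i + 1): np.random.rand(10) for i in range(100000)}

        with Pool(initializer=init_worker, initargs=(big_dict,)) as pool:
            # chunksize batches the keys into groups, reducing IPC round trips.
            results = pool.map(process_item, list(big_dict), chunksize=1000)
        print(len(results))

Is this the right approach, and if so, how do I choose a sensible chunksize?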
Thank you.