Chunking a List - .NET vs Python
- by Abhijeet Patel
As I mentioned last time, I'm knee-deep in Python these days. I come from a statically typed background, so it's definitely
a mental adjustment. List comprehensions are big in Python, and having worked with a few of them I can see why.
Let's say we need to chunk a list into sublists of a specified size.
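Here is how we'd do it in C#:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class Extensions
    {
        public static IEnumerable<List<T>> Chunk<T>(this List<T> l, int chunkSize)
        {
            // A chunk size of zero would never advance the loop, so reject it as well.
            if (chunkSize <= 0)
            {
                throw new ArgumentException("chunkSize must be positive", "chunkSize");
            }
            for (int i = 0; i < l.Count; i += chunkSize)
            {
                // Skip the i elements already emitted, then take the next chunk.
                yield return new List<T>(l.Skip(i).Take(chunkSize));
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var l = new List<string> { "a", "b", "c", "d", "e", "f", "g" };
            foreach (var list in l.Chunk(7))
            {
                // Join the chunk's elements with commas for display.
                string str = list.Aggregate((s1, s2) => s1 + "," + s2);
                Console.WriteLine(str);
            }
        }
    }
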
A little wordy, but still pretty concise thanks to LINQ. On each iteration we skip the first i elements (the ones already
emitted), take the next chunkSize elements, and yield them as a new List.
The Python implementation is a bit more terse (shown here in Python 3 syntax).

    def chunkIterable(iter, chunkSize):
        '''Yields successive slices of at most chunkSize items from iter.'''
        assert hasattr(iter, "__iter__"), "iter is not an iterable"
        for i in range(0, len(iter), chunkSize):
            yield iter[i:i + chunkSize]

    if __name__ == '__main__':
        l = ['a', 'b', 'c', 'd', 'e', 'f']
        generator = chunkIterable(l, 2)
        try:
            # Pull chunks one at a time until the generator is exhausted.
            while True:
                print(next(generator))
        except StopIteration:
            pass

range(start, stop, step) produces the starting offset of each chunk: 0, chunkSize, 2 * chunkSize, and so on, stopping before
len(iter). In Python 3, range returns a lazy sequence rather than a full list (Python 2 code would use xrange for the same
effect), so it can be used in a for loop much like a C# iterator in a foreach loop.
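As a small illustration of how the loop offsets are produced, the sketch below uses Python 3's range, which plays the role xrange played in Python 2. The list length of 7 and the chunk size of 3 are arbitrary values chosen for the example.

```python
# Start offsets for chunks of size 3 over a 7-element list.
offsets = range(0, 7, 3)

# range is lazy: it stores start, stop, and step, not the values themselves.
print(offsets)        # range(0, 7, 3)
print(list(offsets))  # [0, 3, 6]
```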
Since chunkIterable contains a yield statement, it is a generator function: calling it returns a generator instead of running
the body right away. iter[i:i + chunkSize] slices the list from the current offset i up to, but not including, i + chunkSize.
Each slice is a new list, and we yield those lists out to the caller one at a time.
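When the input is small enough to hold all chunks in memory at once, the same slicing can be written as a list comprehension. This is only a sketch: chunk_list is a hypothetical name, and unlike the generator version it builds every chunk up front. It also shows that slicing past the end of a list is safe, so the last chunk simply comes out shorter.

```python
def chunk_list(items, chunk_size):
    """Return a list of slices of items, each at most chunk_size long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Slices that run past the end are clamped, so no bounds check is needed.
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(chunk_list(letters, 3))  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```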
A generator, much like a C# iterator, is a state machine: it remembers where the last call left off, and each
subsequent request for a value resumes execution from that point.
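To make the resume-where-it-left-off behaviour concrete, here is a minimal sketch in Python 3. count_up_to is a hypothetical generator written only for this illustration; each next() call runs the body up to the next yield and then pauses with its local state intact.

```python
def count_up_to(limit):
    """Yield 1, 2, ..., limit, pausing after each value."""
    current = 1
    while current <= limit:
        yield current   # execution pauses here until the next request
        current += 1    # resumes here, with current still in scope


counter = count_up_to(2)
first = next(counter)
second = next(counter)
try:
    next(counter)       # nothing left to yield
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)  # 1 2 True
```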
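The caveat to keep in mind is that since variables are not explicitly typed, we have to check at runtime that the object passed
in is iterable, using hasattr(iter, "__iter__"). Note that this check is looser than what the function actually needs: it also
calls len() and slices its argument, so it really works on sequences such as lists and tuples, not on every "iterable".
That makes it similar in spirit to accepting an IEnumerable in .NET land, but not quite as general.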