Why is my NumPy C extension slow?
- by Bitwise
I am working with large NumPy arrays, and some native NumPy operations are too slow for my needs (for example, simple bitwise operations such as A & B).
I started looking into writing C extensions to try to improve performance. As a test case, I tried the example given here, which implements a simple trace calculation. I got it to work, but was surprised by the performance: for a (1000, 1000) NumPy array, numpy.trace() was about 1000 times faster than the C extension!
This happens whether I run it once or many times. Is this expected? Is the overhead of a C extension really that bad? Any ideas on how to speed things up?
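
For anyone wanting to reproduce this kind of comparison, here is a minimal timing sketch. It is not the actual extension or the exact harness used above: `trace_loop` is a hypothetical pure-Python stand-in for the extension function, and `time_call` is an illustrative helper built on the standard `timeit` module.

```python
import timeit

import numpy as np


def trace_loop(a):
    """Hypothetical stand-in for the extension: sum the diagonal element by element."""
    total = 0.0
    for i in range(min(a.shape)):
        total += a[i, i]
    return total


def time_call(func, arg, repeat=5, number=100):
    """Return the best observed per-call time in seconds."""
    timings = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(timings) / number


a = np.random.rand(1000, 1000)

# Check that both functions agree before comparing their speed.
assert np.isclose(trace_loop(a), np.trace(a))

t_numpy = time_call(np.trace, a)
t_other = time_call(trace_loop, a)
print(f"numpy.trace: {t_numpy:.2e} s per call")
print(f"stand-in:    {t_other:.2e} s per call")
print(f"ratio:       {t_other / t_numpy:.1f}x")
```

Swapping the real extension function in for `trace_loop` and repeating the measurement for several array sizes would show whether the cost grows with the length of the diagonal or with the total number of elements.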