regex numeric data processing: match a series of numbers greater than X

Posted by Mu Mind on Stack Overflow
Published on 2010-05-24T23:42:23Z


Say I have some data like this:

number_stream = [0,0,0,7,8,0,0,2,5,6,10,11,10,13,5,0,1,0,...]

I want to process it looking for "bumps" that meet a certain pattern.

Imagine I have my own customized regex language for working on numbers, where [[ >=5 ]] represents any number >= 5. I want to capture this case:

([[ >=5 ]]{3,})[[ <3 ]]{2,}

In other words, I want to begin capturing whenever I look ahead and see 3 or more consecutive values >= 5, and stop capturing whenever I look ahead and see 2 or more consecutive values < 3. So my output should be:

>>> stream_processor.process(number_stream)
[[5,6,10,11,10,13,5],...]

Note that the first run, 7,8, is ignored because it is shorter than three values, and that the capture ends before the 0,1,0....
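One way the pattern above might be tried without a custom regex engine is to encode each number as a single character class and run an ordinary regex over the encoded string. This is only a sketch under assumptions: the names `classify` and `find_bumps`, and the class letters H (>= 5), L (< 3) and M (anything in between), are made up for illustration, and the sample list is the visible part of number_stream without the trailing "...".

```python
import re

def classify(n, high=5, low=3):
    """Map a number to a one-character class: H (>= high), L (< low), M (otherwise)."""
    if n >= high:
        return "H"
    if n < low:
        return "L"
    return "M"

def find_bumps(numbers, pattern=r"(H{3,})L{2,}"):
    """Return the slices of `numbers` covered by group 1 of `pattern`.

    Each number becomes exactly one character, so match offsets in the
    encoded string are also valid indices into `numbers`.
    """
    encoded = "".join(classify(n) for n in numbers)
    return [numbers[m.start(1):m.end(1)] for m in re.finditer(pattern, encoded)]

# Visible part of the example data (the original stream continues past this).
number_stream = [0, 0, 0, 7, 8, 0, 0, 2, 5, 6, 10, 11, 10, 13, 5, 0, 1, 0]
print(find_bumps(number_stream))  # [[5, 6, 10, 11, 10, 13, 5]]
```

This follows the pattern literally, so the captured group can only contain values >= 5. It also needs the whole list up front, so on its own it does not cover incremental input.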

I'd also like a stream_processor object that I can incrementally pass more data into with subsequent process calls, and that returns captured chunks as they're completed.
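A minimal sketch of such an incremental object could look like the following. The class name `StreamProcessor`, its parameters, and its attributes are all hypothetical. It also assumes the "begin/stop capturing" reading of the description: once a capture has started, values below 5 stay in the chunk until two consecutive values < 3 arrive, which is looser than the literal pattern, where the captured group holds only values >= 5.

```python
class StreamProcessor:
    """Collects chunks that start after `min_high` consecutive values >= `high`
    and end just before `min_low` consecutive values < `low`."""

    def __init__(self, high=5, min_high=3, low=3, min_low=2):
        self.high, self.min_high = high, min_high
        self.low, self.min_low = low, min_low
        self._run = []      # candidate run of high values while idle
        self._chunk = None  # values captured so far, or None when idle
        self._tail = []     # trailing low values that may end the capture

    def process(self, numbers):
        """Feed more values; return the chunks completed by this call."""
        completed = []
        for n in numbers:
            if self._chunk is None:
                # Idle: wait for enough consecutive high values.
                if n >= self.high:
                    self._run.append(n)
                    if len(self._run) == self.min_high:
                        self._chunk, self._run = self._run, []
                else:
                    self._run = []
            elif n < self.low:
                # Capturing: hold low values back until we know they end the chunk.
                self._tail.append(n)
                if len(self._tail) == self.min_low:
                    completed.append(self._chunk)
                    self._chunk, self._tail = None, []
            else:
                # The low run was too short, so it belongs to the chunk.
                self._chunk.extend(self._tail)
                self._chunk.append(n)
                self._tail = []
        return completed

stream_processor = StreamProcessor()
print(stream_processor.process([0, 0, 0, 7, 8, 0, 0, 2, 5, 6]))  # []
print(stream_processor.process([10, 11, 10, 13, 5, 0]))          # []
print(stream_processor.process([1, 0]))  # [[5, 6, 10, 11, 10, 13, 5]]
```

State is carried between calls in three small buffers, so a chunk that straddles several process calls is returned by the call that completes it. This is still a state machine, just a compact one; a chunk that is never followed by two low values is never emitted.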

I've written some code to do it, but it was hideous and state-machiney, and I can't help feeling like I'm missing something obvious. Any ideas on how to do this cleanly?

