regex numeric data processing: match a series of numbers greater than X
- by Mu Mind
Say I have some data like this:
number_stream = [0,0,0,7,8,0,0,2,5,6,10,11,10,13,5,0,1,0,...]
I want to process it looking for "bumps" that meet a certain pattern.
Imagine I have my own customized regex language for working on numbers, where [[ >=5 ]] represents any number >= 5. I want to capture this case:
([[ >=5 ]]{3,})[[ <3 ]]{2,}
In other words, I want to begin capturing any time I look ahead and see 3 or more values >= 5 in a row, and stop capturing any time I look ahead and see 2 or more values < 3. So my output should be:
>>> stream_processor.process(number_stream)
[[5,6,10,11,10,13,5],...]
Note that the first 7,8,... is ignored because it's not long enough, and that the capture ends before the 0,1,0....
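To make the intended semantics concrete, here is a minimal sketch of one possible reading of the pattern: encode each number as a one-letter class and run an ordinary regex over the resulting string. The names encode, BUMP and find_bumps are hypothetical, as is the extra "M" class for values that are neither >= 5 nor < 3. The sketch also assumes the strict regex reading, where the low values must directly follow the high run.

```python
import re

# Thresholds taken from the pattern above: "high" is >= 5, "low" is < 3.
HIGH, LOW = 5, 3

def encode(n):
    """Map a number to a one-letter class: H (>= 5), L (< 3), M (anything else)."""
    if n >= HIGH:
        return "H"
    if n < LOW:
        return "L"
    return "M"

# Strict reading of ([[ >=5 ]]{3,})[[ <3 ]]{2,} over the encoded string.
BUMP = re.compile(r"(H{3,})L{2,}")

def find_bumps(numbers):
    """Return each run of 3+ high values that is directly followed by 2+ low values."""
    encoded = "".join(encode(n) for n in numbers)
    # Character i of `encoded` corresponds to numbers[i], so match spans
    # can be used directly as list slice bounds.
    return [numbers[m.start(1):m.end(1)] for m in BUMP.finditer(encoded)]

number_stream = [0, 0, 0, 7, 8, 0, 0, 2, 5, 6, 10, 11, 10, 13, 5, 0, 1, 0]
print(find_bumps(number_stream))  # [[5, 6, 10, 11, 10, 13, 5]]
```

Because each number becomes exactly one character, the regex engine does the lookahead and backtracking, and the only custom code is the classification step. A looser reading (where the capture may contain in-between values until two lows appear) would need a different pattern.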
I'd also like a stream_processor object I can incrementally pass more data into in subsequent process calls, and return captured chunks as they're completed.
I've written some code to do it, but it was hideous and state-machiney, and I can't help feeling like I'm missing something obvious. Any ideas to do this cleanly?
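For illustration, the incremental stream_processor behaviour described above could be sketched roughly as follows. This is only one possible approach, not a definitive implementation: the class name StreamProcessor, the helper encode, and the "M" class for in-between values are all hypothetical, and it assumes the strict reading where 2 low values must directly follow the run of 3 or more high values.

```python
import re

def encode(n):
    """Map a number to a one-letter class: H (>= 5), L (< 3), M (anything else)."""
    if n >= 5:
        return "H"
    if n < 3:
        return "L"
    return "M"

class StreamProcessor:
    # A capture is complete as soon as two low values follow the high run.
    COMPLETE = re.compile(r"(H{3,})LL")
    # A trailing run of highs (optionally followed by one low) may still
    # turn into a capture once more data arrives, so it must be kept.
    PENDING = re.compile(r"H+L?\Z")

    def __init__(self):
        self._buffer = []  # numbers not yet ruled out or emitted

    def process(self, numbers):
        """Feed more numbers; return the chunks completed by this call."""
        self._buffer.extend(numbers)
        encoded = "".join(encode(n) for n in self._buffer)
        chunks = []
        consumed = 0
        for m in self.COMPLETE.finditer(encoded):
            chunks.append(self._buffer[m.start(1):m.end(1)])
            consumed = m.end()
        # Keep only the tail that could still start or finish a capture.
        tail = self.PENDING.search(encoded, consumed)
        keep_from = tail.start() if tail else len(encoded)
        del self._buffer[:keep_from]
        return chunks

stream_processor = StreamProcessor()
print(stream_processor.process([0, 0, 0, 7, 8, 0, 0, 2, 5, 6]))  # []
print(stream_processor.process([10, 11, 10, 13, 5, 0]))          # []
print(stream_processor.process([1, 0]))  # [[5, 6, 10, 11, 10, 13, 5]]
```

The only state carried between calls is the short buffer tail, so the explicit state machine is replaced by re-running the regex on whatever has not yet been resolved.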