regex numeric data processing: match a series of numbers greater than X
Posted by Mu Mind on Stack Overflow
Published on 2010-05-24T23:42:23Z
Say I have some data like this:
number_stream = [0,0,0,7,8,0,0,2,5,6,10,11,10,13,5,0,1,0,...]
I want to process it looking for "bumps" that meet a certain pattern.
Imagine I have my own customized regex language for working on numbers, where [[ >=5 ]] represents any number >= 5. I want to capture this case:
([[ >=5 ]]{3,})[[ <3 ]]{2,}
In other words, I want to begin capturing any time I look ahead and see 3 or more values >= 5 in a row, and stop capturing any time I look ahead and see 2+ values < 3. So my output should be:
>>> stream_processor.process(number_stream)
[[5,6,10,11,10,13,5],...]
Note that the capture ends before the 0,1,0..., and that the first 7,8,... is ignored because it's not long enough.
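For the non-incremental case, one possible approach (a sketch, not a definitive answer) is to avoid inventing a regex engine for numbers: encode each number as a one-letter class code and let Python's ordinary re module do the matching. The letters H, L and M and the names classify, BUMP and find_bumps are made up for this illustration, and the sketch follows the pattern literally, so the values >= 5 must be followed immediately by the values < 3.

```python
import re


def classify(n):
    """Map a number to a one-letter class code (the letters are arbitrary)."""
    if n >= 5:
        return "H"  # plays the role of [[ >=5 ]]
    if n < 3:
        return "L"  # plays the role of [[ <3 ]]
    return "M"      # anything else (3 or 4 for integers)


# Ordinary-regex equivalent of ([[ >=5 ]]{3,})[[ <3 ]]{2,}
BUMP = re.compile(r"(H{3,})L{2,}")


def find_bumps(numbers):
    """Return every captured run of numbers matched by group 1 of BUMP."""
    numbers = list(numbers)
    # One character per number, so string indices equal list indices.
    codes = "".join(classify(n) for n in numbers)
    return [numbers[m.start(1):m.end(1)] for m in BUMP.finditer(codes)]


# Only the visible part of the example stream is used here.
sample = [0, 0, 0, 7, 8, 0, 0, 2, 5, 6, 10, 11, 10, 13, 5, 0, 1, 0]
print(find_bumps(sample))  # [[5, 6, 10, 11, 10, 13, 5]]
```

Because each number becomes exactly one character, the span of the capture group in the code string can be used directly to slice the original list.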
I'd also like a stream_processor object that I can incrementally pass more data into in subsequent process calls, and that returns captured chunks as they're completed.
I've written some code to do it, but it was hideous and state-machiney, and I can't help feeling like I'm missing something obvious. Any ideas to do this cleanly?
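One possible way to get the incremental behaviour without a hand-written state machine is sketched below. It rests on an assumption: each number is encoded as a one-letter class code (H for >= 5, L for < 3, M otherwise; the letters are arbitrary), the codes are buffered, and Python's ordinary re module is re-run over the buffer on every call. The class name StreamProcessor and its internals are hypothetical. The pattern is read literally, so a capture is treated as complete as soon as two values < 3 directly follow the run of values >= 5.

```python
import re


def classify(n):
    """Map a number to a one-letter class code (the letters are arbitrary)."""
    if n >= 5:
        return "H"
    if n < 3:
        return "L"
    return "M"


class StreamProcessor:
    # A capture is final once 3+ H codes are directly followed by two L codes.
    _complete = re.compile(r"(H{3,})LL")
    # Tail of the buffer that could still grow into a match with more data.
    _pending = re.compile(r"H+L?$")

    def __init__(self):
        self._nums = []   # numbers not yet ruled out or emitted
        self._codes = ""  # class codes, one character per buffered number

    def process(self, chunk):
        """Feed more numbers; return the captures completed by this call."""
        chunk = list(chunk)
        self._nums.extend(chunk)
        self._codes += "".join(classify(n) for n in chunk)

        captured = []
        consumed = 0
        for m in self._complete.finditer(self._codes):
            start, end = m.span(1)
            captured.append(self._nums[start:end])
            consumed = m.end()

        # Keep only the trailing part that might still become a match.
        tail = self._pending.search(self._codes, consumed)
        keep_from = tail.start() if tail else len(self._codes)
        self._nums = self._nums[keep_from:]
        self._codes = self._codes[keep_from:]
        return captured


stream_processor = StreamProcessor()
print(stream_processor.process([0, 0, 0, 7, 8, 0, 0, 2, 5, 6, 10]))  # []
print(stream_processor.process([11, 10, 13, 5, 0]))                  # []
print(stream_processor.process([1, 0]))  # [[5, 6, 10, 11, 10, 13, 5]]
```

The buffer only ever holds a trailing run of values >= 5 (optionally followed by a single value < 3), so memory use is bounded by the length of the bump currently in progress. Re-scanning the buffer on every call is simple but not optimal for very long runs.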
© Stack Overflow or respective owner