Using lookahead assertions in regular expressions
Posted
by
Greg Jackson
on Programmers
See other posts from Programmers
or by Greg Jackson
Published on 2011-06-24T09:36:17Z
Indexed on
2011/06/24
16:31 UTC
Read the original article
Hit count: 365
I use regular expressions on a daily basis, as my daily work is 90% in Perl (legacy codebase, but that's a different issue). Despite this, I still find lookahead and lookbehind to be terribly confusing and often unreadable. Right now, if I were to get a code review with a lookahead or lookbehind, I would immediately send it back to see if the problem can be solved by using multiple regular expressions or a different approach. The following are the main reasons I tend not to like them:
- They can be terribly unreadable. Lookahead assertions, for example, start from the beginning of the string no matter where they are placed. That, among other things, can cause some very "interesting" and non-obvious behaviors.
- It used to be the case that many languages didn't support lookahead/lookbehind (or supported them as "experimental features"). This isn't the case quite as much, but there's still always the question as to how well it's supported.
- Quite frankly, they feel like a dirty hack. Regexps often already are, but they can also be quite elegant, and have gained widespread acceptance.
- I've gotten by without any need for them at all... sometimes I think that they're extraneous.
Now, I'll freely admit that especially the last two reasons aren't really good ones, but I felt that I should enumerate what goes through my mind when I see one. I'm more than willing to change my mind about them, but I feel that they violate some of my core tenets of programming, including:
- Code should be as readable as possible without sacrificing functionality -- this may include doing something in a less efficient, but clearer was as long as the difference is negligible or unimportant to the application as a whole.
- Code should be maintainable -- if another programmer comes along to fix my code, non-obvious behavior can hide bugs or make functional code appear buggy (see readability)
- "The right tool for the right job" -- I'm sure you can come up with contrived examples that could use lookahead, but I've never come across something that really needs them in my real-world development work. Is there anything that they're really the best tool for, as opposed to, say, multiple regexps (or, alternatively, are they the best tool for most cases they're used for today).
My question is this: Is it good practice to use lookahead/lookbehind in regular expressions, or are they simply a hack that have found their way into modern production code?
I'd be perfectly happy to be convinced that I'm wrong about this, and simple examples are useful for examples or illustration, but by themselves, won't be enough to convince me.
© Programmers or respective owner