Match Anything Except a Sub-pattern
Posted
by Tim Lytle
on Stack Overflow
Published on 2010-03-11T18:55:58Z
I'd like to accomplish what this (invalid, I believe) regular expression tries to do:
<p><a>([^(<\/a>)]+?)<\/a></p>uniquestring
Essentially, match anything except a closing anchor tag. A simple non-greedy match doesn't help here, because `uniquestring` may well come after another, distant closing anchor tag:
<p><a>text I don't <tag>want</tag> to match</a></p>random
data<p><a>text I do <tag>want to</tag> match</a></p>uniquestring more
matches <p><a>of <tag>text I do</tag> want to match</a></p>uniquestring
So I have more tags in between the anchor tags, and I'm using the presence of uniquestring to determine whether I want to match the data. A simple non-greedy match ends up matching everything from the start of the data I don't want to the end of the data I do want.
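To make the problem concrete, here is a minimal Python sketch. Python, the variable names, and the `re.DOTALL` flag (so that `.` can cross the line breaks in the sample) are assumptions for illustration; the question doesn't name a regex engine.

```python
import re

# Sample data from the question, joined into one string.
data = (
    "<p><a>text I don't <tag>want</tag> to match</a></p>random\n"
    "data<p><a>text I do <tag>want to</tag> match</a></p>uniquestring more\n"
    "matches <p><a>of <tag>text I do</tag> want to match</a></p>uniquestring"
)

# Plain non-greedy capture: .+? is free to run past a closing </a>
# until it finally finds one that is followed by </p>uniquestring.
lazy = re.compile(r"<p><a>(.+?)</a></p>uniquestring", re.DOTALL)
lazy_matches = lazy.findall(data)

for m in lazy_matches:
    print(repr(m))
```

The first captured group starts at the text I don't want and runs through to the end of the first block I do want, which is exactly the over-match described above.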
I know I'm edging close to the kind of problem regular expressions (or at least my knowledge of them) aren't good at solving. I could just throw the data at an HTML/XML parser, but it is just one simple(ish) search.
Is there some easy way to do this that I'm just missing?
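One approach that might fit, sketched here on the assumption that the engine supports negative lookahead (Python's `re` is used purely as an example), is to consume one character at a time only while that position does not start a closing anchor tag: `(?:(?!</a>).)+?`. The variable names and the `re.DOTALL` flag are illustrative assumptions, not part of the original question.

```python
import re

# Sample data from the question, joined into one string.
data = (
    "<p><a>text I don't <tag>want</tag> to match</a></p>random\n"
    "data<p><a>text I do <tag>want to</tag> match</a></p>uniquestring more\n"
    "matches <p><a>of <tag>text I do</tag> want to match</a></p>uniquestring"
)

# (?!</a>). matches any single character that does not begin "</a>",
# so the captured group can never cross a closing anchor tag.
pattern = re.compile(
    r"<p><a>((?:(?!</a>).)+?)</a></p>uniquestring",
    re.DOTALL,
)
wanted = pattern.findall(data)

for m in wanted:
    print(repr(m))
```

On the sample data this should skip the first block, because its `</a></p>` is followed by `random` rather than `uniquestring`, and capture only the two blocks I do want. It is still a regex working on markup, so nested anchors or attributes on the tags would need more care, or a real parser.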
© Stack Overflow or respective owner