Match Anything Except a Sub-pattern

Posted by Tim Lytle on Stack Overflow
Published on 2010-03-11T18:55:58Z Indexed on 2010/03/11 18:59 UTC

I'd like to accomplish what this (invalid, I believe) regular expression tries to do:

<p><a>([^(<\/a>)]+?)<\/a></p>uniquestring

Essentially, I want to match anything except a closing anchor tag. A simple non-greedy match doesn't help here, because `uniquestring` may very well come after another, distant closing anchor tag:

<p><a>text I don't <tag>want</tag> to match</a></p>random 
data<p><a>text I do <tag>want to</tag> match</a></p>uniquestring more
matches <p><a>of <tag>text I do</tag> want to match</a></p>uniquestring 

So I have more tags in between the anchor tags, and I'm using the presence of uniquestring to decide whether I want to match the data. A simple non-greedy match therefore ends up matching everything from the start of the data I don't want to the end of the data I do want.
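To make the failure concrete, here is a minimal sketch that runs both attempts against the sample data. Python is used only for illustration, and the variable names are made up for this example. The bracketed pattern does compile, but `[^(<\/a>)]` is a character class: it excludes the individual characters `(`, `<`, `/`, `a`, `>` and `)`, not the sequence `</a>`.

```python
import re

# Sample data from the question (trailing spaces kept as posted).
data = (
    "<p><a>text I don't <tag>want</tag> to match</a></p>random \n"
    "data<p><a>text I do <tag>want to</tag> match</a></p>uniquestring more\n"
    "matches <p><a>of <tag>text I do</tag> want to match</a></p>uniquestring \n"
)

# Attempt 1: the character-class pattern from the question.
# The class rejects any single '<', '/', 'a', '>', '(' or ')',
# so the inner <tag> elements stop it from matching at all.
class_pattern = re.compile(r"<p><a>([^(<\/a>)]+?)<\/a></p>uniquestring")
class_matches = class_pattern.findall(data)

# Attempt 2: a plain non-greedy match. DOTALL lets '.' cross newlines.
# The lazy quantifier keeps expanding past the first </a></p> because
# that one is not followed by 'uniquestring'.
lazy_pattern = re.compile(r"<p><a>(.+?)</a></p>uniquestring", re.DOTALL)
lazy_matches = lazy_pattern.findall(data)

print(class_matches)
for m in lazy_matches:
    print(repr(m))
```

The first attempt finds nothing, and the first result of the second attempt begins with the unwanted "text I don't" and runs across "random" into the wanted text.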

I know I'm edging close to the kind of problem that regular expressions (or at least my knowledge of them) aren't good at solving. I could just throw the data at an HTML/XML parser, but it is just one simple(ish) search.

Is there some easy way to do this that I'm just missing?
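One commonly used way to express "anything except a sub-pattern" is a negative lookahead applied before every consumed character, `(?:(?!</a>).)+`. The sketch below is an illustration under assumptions rather than a definitive answer: it uses Python, the names are hypothetical, and it assumes anchors are never nested and the markup is as regular as in the sample.

```python
import re

# Sample data from the question (trailing spaces kept as posted).
data = (
    "<p><a>text I don't <tag>want</tag> to match</a></p>random \n"
    "data<p><a>text I do <tag>want to</tag> match</a></p>uniquestring more\n"
    "matches <p><a>of <tag>text I do</tag> want to match</a></p>uniquestring \n"
)

# (?!</a>) asserts that the text at the current position does not start
# with "</a>"; the '.' then consumes one character. Repeating that pair
# matches any run of text that contains no closing anchor tag.
tempered = re.compile(
    r"<p><a>((?:(?!</a>).)+)</a></p>uniquestring",
    re.DOTALL,  # let '.' match newlines inside the anchor text
)

wanted = tempered.findall(data)
for text in wanted:
    print(text)
```

Because the captured group can never cross a `</a>`, the first anchor (followed by "random") fails to match on its own instead of being merged into the next one. Other tags such as `</tag>` are still allowed inside the group.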
