Regular expression is too greedy

Posted by alastairs on Stack Overflow
Published on 2010-06-13T23:55:24Z

I'm writing regular expressions to match data from the IMDb soundtracks data file. They mostly work, but in places they capture too much text in my named groups. Take the following regex, for example:

"^  Performed by '?(?<performer>.*)('? \(qv\))?$"

The performer group captures the trailing ' (qv) as well as the performer's name. Because the records are not consistently formatted, some performers' names are surrounded by single quotation marks whilst others are not, so the quotation marks have to be optional in the regex.
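
The behaviour can be reproduced with a minimal Python sketch. Two things here are assumptions rather than part of the original setup: Python's `re` module writes the named group as `(?P<performer>...)` instead of `(?<performer>...)`, and the sample line is made up to resemble a record in the data file.

```python
import re

# The pattern from above, translated to Python's named-group syntax.
GREEDY = re.compile(r"^  Performed by '?(?P<performer>.*)('? \(qv\))?$")

# Hypothetical sample line, formatted like the records described above.
line = "  Performed by 'Elia's Lonely Friends Band' (qv)"

match = GREEDY.match(line)
performer = match.group("performer")
trailer = match.group(2)  # the optional "' (qv)" group

print(repr(performer))  # the name plus the unwanted "' (qv)" suffix
print(repr(trailer))    # None: the optional group did not take part
```

Because `.*` is greedy, it runs to the end of the line first. The optional trailing group is then allowed to match nothing, so the overall match succeeds without the engine ever backtracking to hand the suffix back.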

I've tried marking the last group as an atomic (non-backtracking) group using the (?>...) construct, but this appeared to have no effect on the results.

I can improve the results by restricting the performer group to a small set of characters, but this reduces my chances of parsing the name correctly. Furthermore, if I simply excluded the apostrophe character, I would be unable to parse names containing apostrophes, such as the band Elia's Lonely Friends Band, who performed Run For Your Life, featured in Resident Evil: Apocalypse.
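
One approach that may avoid restricting the character set is to make the quantifier in the performer group lazy (`.*?`), so that the optional suffix gets a chance to match before the name swallows it. The sketch below is in Python, so the group is written `(?P<performer>...)`; the `parse_performer` helper and the sample lines are hypothetical and not taken from the data file.

```python
import re

# Lazy variant of the pattern: ".*?" grows one character at a time, and the
# optional "' (qv)" suffix is tried at each step before the name is extended.
LAZY = re.compile(r"^  Performed by '?(?P<performer>.*?)(?:'? \(qv\))?$")

def parse_performer(line):
    """Return the performer name, or None if the line does not match."""
    match = LAZY.match(line)
    return match.group("performer") if match else None

# Hypothetical sample lines in the formats described above.
samples = [
    "  Performed by 'Elia's Lonely Friends Band' (qv)",
    "  Performed by Elia's Lonely Friends Band (qv)",
    "  Performed by Elia's Lonely Friends Band",
]

for sample in samples:
    print(repr(parse_performer(sample)))
```

An apostrophe inside the name is kept because the suffix only matches when the quote is followed by ` (qv)` and the end of the line. One limitation of this sketch: a name with a closing quotation mark but no ` (qv)` would keep the trailing quote, since the closing quote is only consumed as part of the suffix group.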
