Using C# regex to select text based on custom tags

Posted by spaceman on Stack Overflow
Published on 2010-04-09T10:37:50Z

I have a string in C# containing some data I need to extract based on certain conditions.

The string contains many tenders in the following form:

<TENDER> some words, don't know how many, may contain numbers and things like slashes (/) or whatever <DESCRIPTION> some more words and possibly other things like numbers or whatever describing the tender here </DESCRIPTION> some more words and possibly numbers and weird things </TENDER>

This string doesn't contain any nested <TENDER> tags; it's flat. The <DESCRIPTION> tags occur only once within each pair of <TENDER> tags.

I'm using <TENDER>(.+?)</TENDER> as the regex to split up the tenders, and it works fine. If this is wrong or stupid and you know a better way to write it, please let me know, as I have discovered I suck at regex.
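For reference, here is a minimal sketch of how that splitting step might look in code. The class and method names (TenderSplitter, Split) are made up for illustration, and RegexOptions.Singleline is an assumption on my part: it only matters if a tender can span more than one line, since by default '.' does not match a newline.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class TenderSplitter
{
    // Singleline lets '.' match '\n' as well (assumption: tenders may span lines).
    private static readonly Regex TenderRegex =
        new Regex(@"<TENDER>(.+?)</TENDER>", RegexOptions.Singleline);

    // Returns the text between each <TENDER> ... </TENDER> pair, in document order.
    public static List<string> Split(string input)
    {
        var tenders = new List<string>();
        foreach (Match m in TenderRegex.Matches(input))
        {
            // Group 1 is the lazily captured body, without the surrounding tags.
            tenders.Add(m.Groups[1].Value);
        }
        return tenders;
    }
}
```

Because the tags are never nested, the lazy (.+?) stops at the first </TENDER> it meets, so each match should cover exactly one tender.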

My problem is that I now need to select a tender only if its description contains any word from a list of keywords (let's say for now I want to select a tender only if it contains either "concrete" or "brick" in the description).

So far the regex I have come up with looks like this, but I don't know what to put in the middle. Also I have a vague suspicion that this might return me some false positives.

<TENDER>(.+?)<DESCRIPTION>have no idea what to do here</DESCRIPTION>(.+?)</TENDER>

If any of you regex gurus could point me in the right direction, I would be most appreciative.
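One possible direction, sketched under some assumptions, is to avoid a single large pattern and filter in two steps: split the tenders first, then test only the text inside <DESCRIPTION>. The suspicion about false positives in the single-pattern attempt seems justified: a lazy (.+?) is still allowed to run past a </TENDER> into the next tender if that is the only way to reach a description containing a keyword. The names below (TenderFilter, SelectByKeywords) are hypothetical, and whole-word, case-insensitive keyword matching is an assumption, not something stated in the requirements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class TenderFilter
{
    private static readonly Regex TenderRegex =
        new Regex(@"<TENDER>(.+?)</TENDER>", RegexOptions.Singleline);

    private static readonly Regex DescriptionRegex =
        new Regex(@"<DESCRIPTION>(.*?)</DESCRIPTION>", RegexOptions.Singleline);

    // Returns the full <TENDER>...</TENDER> text of every tender whose
    // description contains at least one of the keywords.
    public static List<string> SelectByKeywords(string input, IEnumerable<string> keywords)
    {
        var selected = new List<string>();
        List<string> keywordList = keywords.ToList();
        if (keywordList.Count == 0)
        {
            return selected; // nothing to look for
        }

        // Builds e.g. \b(?:concrete|brick)\b; Regex.Escape guards against
        // keywords that contain regex metacharacters.
        string alternation = string.Join("|", keywordList.Select(Regex.Escape));
        var keywordRegex = new Regex(
            @"\b(?:" + alternation + @")\b",
            RegexOptions.IgnoreCase); // assumption: case does not matter

        foreach (Match tender in TenderRegex.Matches(input))
        {
            // Look for the description only inside this one tender, so a
            // keyword elsewhere in the tender (or in a neighbour) cannot match.
            Match description = DescriptionRegex.Match(tender.Groups[1].Value);
            if (description.Success && keywordRegex.IsMatch(description.Groups[1].Value))
            {
                selected.Add(tender.Value);
            }
        }
        return selected;
    }
}
```

A call might look like TenderFilter.SelectByKeywords(text, new[] { "concrete", "brick" }). Keeping the keyword test separate from the tag matching also means the keyword list can change without touching the tag patterns.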

© Stack Overflow or respective owner
