Using C# regex to select text based on custom tags
Posted by spaceman on Stack Overflow
Published on 2010-04-09T10:37:50Z
Indexed on 2010/04/09 10:43 UTC
I have a string in C# containing some data I need to extract based on certain conditions.
The string contains many tenders in the following form :
<TENDER> some words, don't know how many, may contain numbers and things like slashes (/) or whatever <DESCRIPTION> some more words and possibly other things like numbers or whatever describing the tender here </DESCRIPTION> some more words and possibly numbers and weird things </TENDER>
This string doesn't contain any nested <TENDER> tags; it's flat. The <DESCRIPTION> tags occur only once within each pair of <TENDER> tags.
I'm using <TENDER>(.+?)</TENDER> as the regex to split up the tenders, and it works fine. If this is wrong or stupid and you know a better way to write it, please let me know, as I have discovered I suck at regex.
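As a minimal sketch, the splitting step described above could be written like this in C#. The class and method names (TenderSplitter, Split) are hypothetical, and RegexOptions.Singleline is an assumption added so that "." also matches newlines in case a tender spans several lines; the post does not say whether it does.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original post.
public static class TenderSplitter
{
    // Same pattern as in the post. Singleline (an assumption) lets '.'
    // match newline characters as well.
    private static readonly Regex TenderRegex =
        new Regex(@"<TENDER>(.+?)</TENDER>", RegexOptions.Singleline);

    // Returns the text between each <TENDER> ... </TENDER> pair, in order.
    public static List<string> Split(string input)
    {
        var tenders = new List<string>();
        foreach (Match match in TenderRegex.Matches(input))
        {
            // Group 1 is the lazily captured body of the tender.
            tenders.Add(match.Groups[1].Value);
        }
        return tenders;
    }
}
```

Because the tags are flat (no nested <TENDER>), the lazy (.+?) stops at the first closing tag, so each match should correspond to exactly one tender.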
My problem is that I now need to select a tender only if its description contains any word in a list of keywords (let's say for now I want to select a tender only if it contains either "concrete" or "brick" in the description).
So far the regex I have come up with looks like this, but I don't know what to put in the middle. I also have a vague suspicion that it might return some false positives.
<TENDER>(.+?)<DESCRIPTION>have no idea what to do here</DESCRIPTION>(.+?)</TENDER>
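One possible way to sidestep the false-positive concern is to avoid a single large pattern: a lazy ".*?" placed around the keyword can run past a closing tag and pick up a keyword from a later tender. The sketch below swaps in a two-step approach instead: match each tender, pull out its description, then test only that description against the keyword list. This is an illustration under stated assumptions, not the definitive answer. The names TenderFilter and SelectByKeyword are hypothetical, and whole-word, case-insensitive matching and RegexOptions.Singleline are assumptions not stated in the post.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original post.
public static class TenderFilter
{
    private static readonly Regex TenderRegex = new Regex(
        @"<TENDER>(?<body>.+?)</TENDER>", RegexOptions.Singleline);

    private static readonly Regex DescriptionRegex = new Regex(
        @"<DESCRIPTION>(?<desc>.*?)</DESCRIPTION>", RegexOptions.Singleline);

    // Returns the full text of every tender whose description contains
    // at least one of the keywords.
    public static List<string> SelectByKeyword(string input, IEnumerable<string> keywords)
    {
        var selected = new List<string>();
        string[] escaped = keywords.Select(k => Regex.Escape(k)).ToArray();
        if (escaped.Length == 0)
        {
            return selected; // no keywords: nothing can match
        }

        // Regex.Escape keeps each keyword literal; \b limits matches to
        // whole words (assumption), IgnoreCase ignores casing (assumption).
        var keywordRegex = new Regex(
            @"\b(?:" + string.Join("|", escaped) + @")\b",
            RegexOptions.IgnoreCase);

        foreach (Match tender in TenderRegex.Matches(input))
        {
            // Search for the description only inside this tender's body,
            // so a keyword in another tender cannot leak into the result.
            Match description = DescriptionRegex.Match(tender.Groups["body"].Value);
            if (description.Success &&
                keywordRegex.IsMatch(description.Groups["desc"].Value))
            {
                selected.Add(tender.Value);
            }
        }
        return selected;
    }
}
```

A call such as TenderFilter.SelectByKeyword(text, new[] { "concrete", "brick" }) would then return only the tenders whose description mentions one of those words; a keyword that appears elsewhere in the tender, outside <DESCRIPTION>, does not count.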
If any of you regex gurus could point me in the right direction, I would be most appreciative.
© Stack Overflow or respective owner