Using C# regex to select text based on custom tags
- by spaceman
I have a string in c# containing some data i need to extract based on certain conditions.
The string contains many tenders in the following form :
<TENDER> some words, don't know how many, may contain numbers and things like slashes (/) or whatever <DESCRIPTION> some more words and possibly other things like numbers or whatever describing the tender here </DESCRIPTION> some more words and possibly numbers and weird things </TENDER>
This string doesn't contain any nested <TENDER> tags, its flat. The <DESCRIPTION> tags occur only once within the <TENDER> tags.
I'm using : <TENDER>(.+?)</TENDER> as the regex to split up the tenders and it works fine. If this is wrong or stupid and you know a better way to write this please let me know as I have discovered I suck at regex.
My problem that I now need to only select a tender if its description contains any word in a list of keywords (lets say for now i want to select a tender only if it contains either "concrete" or"brick" in the description).
So far the regex I have come up with looks like this, but I don't know what to put in the middle. Also I have a vague suspicion that this might return me some false positives.
<TENDER>(.+?)<DESCRIPTION>have no idea what to do here</DESCRIPTION>(.+?)</TENDER>
If any of you regex guru's could point me in the right direction I would be most appreciative.