Regex to extract portions of file name
- by jakesankey
I have text files formatted as such:
R156484COMP_004A7001_20100104_065119.txt
I need to consistently extract the R****COMP, the 004A7001 number, 20100104 (date), and don't care about the 065119 number. the problem is that not ALL of the files being parsed have the exact naming convention. some may be like this:
R168166CRIT_156B2075_SU2_20091223_123456.txt
or
R285476COMP_SU1_125A6025_20100407_123456.txt
So how could I use regex instead of split to ensure I am always getting that serial (ex. 004A7001), the date (ex. 20100104), and the R****COMP (or CRIT)???
Here is what I do now but it only gets the files formatted like my first example.
if (file.Count(c => c == '_') != 3) continue;
and further down in the code I have:
string RNumber = Path.GetFileNameWithoutExtension(file);
string RNumberE = RNumber.Split('_')[0];
string RNumberD = RNumber.Split('_')[1];
string RNumberDate = RNumber.Split('_')[2];
DateTime dateTime = DateTime.ParseExact(RNumberDate, "yyyyMMdd", Thread.CurrentThread.CurrentCulture);
string cmmDate = dateTime.ToString("dd-MMM-yyyy");
UPDATE: This is now where I am at -- I get an error to parse RNumberDate to an actual date format. "Cannot implicitly convert type 'RegularExpressions.Match' to 'string'
string RNumber = Path.GetFileNameWithoutExtension(file);
Match RNumberE = Regex.Match(RNumber, @"^(R|L)\d{6}(COMP|CRIT|TEST|SU[1-9])(?=_)", RegexOptions.IgnoreCase);
Match RNumberD = Regex.Match(RNumber, @"(?<=_)\d{3}[A-Z]\d{4}(?=_)", RegexOptions.IgnoreCase);
Match RNumberDate = Regex.Match(RNumber, @"(?<=_)\d{8}(?=_)", RegexOptions.IgnoreCase);
DateTime dateTime = DateTime.ParseExact(RNumberDate, "yyyyMMdd", Thread.CurrentThread.CurrentCulture);
string cmmDate = dateTime.ToString("dd-MMM-yyyy")