I need to parse input strings in which the columns are separated by commas and any field that contains a comma in the data is wrapped in quotes (comma-separated, with quoted text qualifiers). For this project I need to remove the quotes and any commas that occur between a pair of quotes. In other words, I need to remove the commas and quotes contained inside fields while preserving the commas that separate the fields. Here's a little code I put together that handles the simple scenario:
// Sample input 1: This works and covers 99% of the records that I need to parse.
string str1 = "[email protected],2010/03/27 12:2:02,,some_first_name,some_last_name,,\"This Address Works, Suite 200\",Some City,TN,09876-5432,9795551212x123,XYZ";
str1 = Regex.Replace(str1, "\"([^\"^,]*),([^\"^,]*)\"", "$1$2");
Console.WriteLine(str1);
// Outputs:
[email protected],2010/03/27 12:2:02,,some_first_name,some_last_name,,This Address Works Suite 200,Some City,TN,09876-5432,9795551212x123,XYZ
Although this code works for most of my records, it doesn't work when a quoted field contains more than one comma. What I would like to do is modify the code so that it removes every comma contained within a quoted field, no matter how many commas the field has. I don't want to hard-code handling for only 2 commas, or 3 commas, or 25 commas. The code should just remove all the commas in the field. Below is an example of what my code doesn't handle properly.
// Sample input 2: This doesn't work since there is more than 1 comma between the quotes.
string str2 = "[email protected],2010/03/27 12:2:02,,some_first_name,some_last_name,,\"i,l,k,e, c,o,m,m,a,s, i,n ,m,y, f,i,e,l,d\",Some City,TN,09876-5432,9795551212x123,XYZ";
str2 = Regex.Replace(str2, "\"([^\"^,]*),([^\"^,]*)\"", "$1$2");
Console.WriteLine(str2);
// Desired output:
[email protected],2010/03/27 12:2:02,,some_first_name,some_last_name,,i like commas in my field,Some City,TN,09876-5432,9795551212x123,XYZ
Any help would be appreciated for this Regular Expression newbie.
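For reference, here is a minimal sketch of the behavior described above, not a definitive solution. It matches each quoted field as a whole and strips the commas inside it with a match evaluator, so the number of commas doesn't matter. The class and method names (QuotedCommaStripper, StripQuotedCommas) and the shortened sample line are made up for illustration, and it assumes the quotes are balanced and that fields never contain escaped ("") quotes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedCommaStripper
{
    // Hypothetical helper: finds each "..." field, removes every comma inside it,
    // and drops the surrounding quotes. Commas outside quotes are left untouched.
    // Assumption: quotes are balanced and there are no escaped ("") quotes.
    public static string StripQuotedCommas(string line)
    {
        return Regex.Replace(
            line,
            "\"([^\"]*)\"",                              // one quoted field, contents in group 1
            m => m.Groups[1].Value.Replace(",", ""));    // contents with all commas removed
    }

    public static void Main()
    {
        // Shortened, made-up sample with one single-comma field and one multi-comma field.
        string sample = "first,,\"This Address Works, Suite 200\",\"i,n ,m,y, f,i,e,l,d\",Some City,TN";
        Console.WriteLine(StripQuotedCommas(sample));
        // Prints: first,,This Address Works Suite 200,in my field,Some City,TN
    }
}
```

Because the evaluator runs once per quoted field, the same call would cover both the single-comma case in sample input 1 and the many-comma case in sample input 2. If the data can contain escaped quotes or line breaks inside fields, a real CSV parser would likely be a safer choice than a regex.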