Regular expression matching Unicode characters in a string

Posted by Marcus King on Stack Overflow
Published on 2010-05-14T14:56:17Z

I'm working in C# on some OCR work and have extracted the text I need. Now I need to parse one line of it using regular expressions.

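// numbers holds the OCR-extracted line.
string checkNum = null;
string routingNum = null;
string accountNum = null;

// Check number: digits between two delimiter characters.
Regex regEx = new Regex(@"\u9288\d+\u9288");
Match match = regEx.Match(numbers);
if (match.Success)
    checkNum = match.Value.Substring(1, match.Value.Length - 2); // drop both delimiters

// Routing number: nine digits between two delimiter characters.
regEx = new Regex(@"\u9286\d{9}\u9286");
match = regEx.Match(numbers);
if (match.Success)
    routingNum = match.Value.Substring(1, match.Value.Length - 2); // drop both delimiters

// Account number: ten digits followed by a delimiter character.
regEx = new Regex(@"\d{10}\u9288");
match = regEx.Match(numbers);
if (match.Success)
    accountNum = match.Value.Substring(0, match.Value.Length - 1); // drop the trailing delimiter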

The problem is that the string does contain the expected Unicode characters: I can see them when I call .ToCharArray() and inspect the contents. Yet none of the patterns ever match when I parse the string looking for those characters. I thought strings in C# were Unicode by default.
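One possible cause, offered as a guess rather than a confirmed diagnosis: in .NET regular expressions, \uNNNN takes exactly four hexadecimal digits. If the values 9288 and 9286 were read as decimal char values in the debugger, the patterns above would be looking for U+9288 and U+9286 instead of U+2448 and U+2446 (9288 and 9286 in hex notation are 0x2448 and 0x2446). The sketch below assumes that is what happened. The class name MicrLineParser, the Extract and ToHex helpers, and the sample line are all hypothetical, and capture groups replace the string trimming.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MicrLineParser
{
    // Assumption: the delimiters are U+2448 and U+2446, i.e. decimal 9288 and 9286.
    // Inside a verbatim string the regex engine itself reads \uNNNN as hex.
    private const string CheckDelimiter = @"\u2448";
    private const string RoutingDelimiter = @"\u2446";

    public static readonly Regex CheckPattern =
        new Regex(CheckDelimiter + @"(\d+)" + CheckDelimiter);
    public static readonly Regex RoutingPattern =
        new Regex(RoutingDelimiter + @"(\d{9})" + RoutingDelimiter);
    public static readonly Regex AccountPattern =
        new Regex(@"(\d{10})" + CheckDelimiter);

    // Returns the first capture group, or null when the pattern does not match.
    public static string Extract(Regex pattern, string line)
    {
        Match match = pattern.Match(line);
        return match.Success ? match.Groups[1].Value : null;
    }

    // Formats a char's code point as four hex digits, the form \u expects.
    public static string ToHex(char c)
    {
        return ((int)c).ToString("X4");
    }

    public static void Main()
    {
        // Hypothetical sample line, not taken from real OCR output.
        string numbers = "\u24481234\u2448 \u2446123456789\u2446 0123456789\u2448";

        Console.WriteLine(ToHex(numbers[0]));                  // 2448
        Console.WriteLine(Extract(CheckPattern, numbers));     // 1234
        Console.WriteLine(Extract(RoutingPattern, numbers));   // 123456789
        Console.WriteLine(Extract(AccountPattern, numbers));   // 0123456789
    }
}
```

Printing each char with ToHex on the real OCR output is a quick way to confirm which code points are actually present before committing to a pattern.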

© Stack Overflow or respective owner
