Regular expression matching Unicode characters in a string
Posted by Marcus King on Stack Overflow
Published on 2010-05-14T14:56:17Z
I'm working in C# on some OCR and have extracted the text I need to work with. Now I need to parse a line using regular expressions.
string checkNum;
string routingNum;
string accountNum;

// Check number: digits between two \u9288 delimiters.
Regex regEx = new Regex(@"\u9288\d+\u9288");
Match match = regEx.Match(numbers);
if (match.Success)
    // Strip the leading and trailing delimiter characters.
    checkNum = match.Value.Substring(1, match.Value.Length - 2);

// Routing number: nine digits between two \u9286 delimiters.
regEx = new Regex(@"\u9286\d{9}\u9286");
match = regEx.Match(numbers);
if (match.Success)
    routingNum = match.Value.Substring(1, match.Value.Length - 2);

// Account number: ten digits followed by a \u9288 delimiter.
regEx = new Regex(@"\d{10}\u9288");
match = regEx.Match(numbers);
if (match.Success)
    accountNum = match.Value.Remove(match.Value.Length - 1, 1);
The problem is that the string does contain the expected Unicode characters when I call .ToCharArray() and inspect its contents, but the regular expressions never match those characters when I parse the string. I thought strings in C# were Unicode by default.
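One possible explanation, offered as a sketch rather than a confirmed diagnosis: in a .NET regex pattern, \u is followed by four hexadecimal digits, so \u9288 refers to code point 0x9288, not decimal 9288. If the values 9286 and 9288 were read off the char array as decimal integers (an assumption, since the post does not say), the matching escapes would be \u2446 and \u2448, which are the MICR transit and on-us symbols. The class name MicrParser, the helper Capture, and the sample line below are hypothetical and only for illustration; the real OCR output may be laid out differently.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MicrParser
{
    // Assumption: the delimiters are U+2446 (decimal 9286) and U+2448 (decimal 9288).
    // \uXXXX in a .NET pattern takes HEX digits, so decimal 9288 is written \u2448.
    public static readonly Regex CheckRx = new Regex(@"\u2448(\d+)\u2448");
    public static readonly Regex RoutingRx = new Regex(@"\u2446(\d{9})\u2446");
    public static readonly Regex AccountRx = new Regex(@"(\d{10})\u2448");

    // Returns the first capture group, or null when the pattern does not match.
    // Capturing only the digits avoids trimming the delimiters by hand.
    public static string Capture(Regex rx, string input)
    {
        Match m = rx.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Hypothetical sample line, not taken from the original post.
        string numbers = "\u24481234\u2448 \u2446123456789\u2446 0123456789\u2448";

        // Decimal value of the delimiter and its hex form for use in \uXXXX.
        Console.WriteLine((int)numbers[0]);                    // 9288
        Console.WriteLine(((int)numbers[0]).ToString("X4"));   // 2448

        Console.WriteLine(Capture(CheckRx, numbers));
        Console.WriteLine(Capture(RoutingRx, numbers));
        Console.WriteLine(Capture(AccountRx, numbers));
    }
}
```

To check which code points the OCR text really contains, printing ((int)c).ToString("X4") for each char gives the value in the form the \u escape expects.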
© Stack Overflow or respective owner