sort utility on Cyrillic text
Posted by Anton on Super User
Published on 2010-03-16T14:17:48Z
Indexed on 2010/03/16 14:26 UTC
I have to sort some lines of Cyrillic text, and I want to use the sort utility (on Mac OS X 10.6). The problem is that the result is incorrect. I copy the text to the clipboard, then run pbpaste | sort. This is plain-text data, and I also tried passing a file to the sort command, with the same result.
My source data is
???????
?????
????
????
??????
???????
????????
?????? ? ????? ???????????????
??????????
????
??????
And after sorting I get
????
????
????
?????
??????
??????
?????? ? ????? ???????????????
???????
???????
????????
??????????
These lines aren’t even grouped by first letter. I tried the -d option, but then I get an error:
sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `\320\321\321\321' and `\320\320\320\321\321\320'.
Exporting the variable as recommended doesn’t solve the problem. What can I do to make the sort utility handle this task? Is any additional information needed?
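For reference, here is a minimal sketch of what the error message's suggestion amounts to, assuming the input is valid UTF-8. The four sample words are hypothetical stand-ins, not the lines from the data above. With LC_ALL=C, sort compares raw bytes, and for UTF-8 text byte order matches code point order, so the lowercase Russian letters а through я come out in alphabetical order. The variable is set inline for the single command rather than exported, which rules out the export not reaching the sort process.

```shell
#!/bin/sh
# Hypothetical sample input (UTF-8), standing in for the real data.
# LC_ALL=C is set only for this one sort invocation, so sort compares
# raw bytes and never calls the locale's collation routines.
sorted=$(printf '%s\n' 'яблоко' 'арбуз' 'вишня' 'банан' | LC_ALL=C sort)

# Print the result, one word per line.
printf '%s\n' "$sorted"
```

Byte order is not true alphabetical collation: uppercase letters sort before all lowercase ones, and Ё/ё fall outside the А-я range. If the input is not UTF-8 to begin with, this ordering argument does not hold.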
© Super User or respective owner