sort utility on Cyrillic text
- by Anton
I have to sort some lines of Cyrillic text and I want to use the sort utility (on Mac OS X 10.6).
The problem is that the result is incorrect.
I copy the text to the clipboard, then run
pbpaste | sort
This is plaintext data, and I also tried passing a file to the sort command.
My source data is
???????
?????
????
????
??????
???????
????????
?????? ? ????? ???????????????
??????????
????
??????
And after sorting I get
????
????
????
?????
??????
??????
?????? ? ????? ???????????????
???????
???????
????????
??????????
These lines aren't even grouped by first letter.
I tried the -d option, but then I get an error:
sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `\320\321\321\321' and `\320\320\320\321\321\320'.
Exporting the variable as recommended doesn’t solve the problem.
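For reference, here is a minimal sketch of what that workaround is expected to do, assuming the input is UTF-8. The three sample letters are hypothetical stand-ins, not my real data, and are written as octal escapes so the snippet stays ASCII. In the C locale sort compares raw bytes, which for UTF-8 is the same as code-point order.

```shell
# Hypothetical sample: the UTF-8 letters U+0432, U+0430, U+0431
# (Cyrillic ve, a, be), one per line, given as octal byte escapes.
# LC_ALL=C is set only for this one sort call, so it compares raw bytes.
sorted=$(printf '\320\262\n\320\260\n\320\261\n' | LC_ALL=C sort)

# Print the sorted lines (a, be, ve if the terminal renders UTF-8).
printf '%s\n' "$sorted"
```

Setting the variable on the command line, as above, affects only that single sort invocation; `export LC_ALL=C` would change it for the rest of the shell session.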
What can I do to use the sort utility for such a task?
Is any additional info necessary?