Is it safe to use random Unicode for complex delimiter sequences in strings?
Posted
by ccomet
on Stack Overflow
Published on 2010-04-19T16:13:07Z
Indexed on
2010/04/19
16:43 UTC
Question: In terms of program stability and ensuring that the system will actually operate, how safe is it to use chars like ¦, §, or ‡ for complex delimiter sequences in strings? Can I reliably trust that I won't run into any issues with a program reading these incorrectly?
I am working in a system, using C# code, in which I have to store a fairly complex set of information within a single string. The string only needs to be readable on the computer side; end users should only ever see the information after it has been parsed by the appropriate methods. Because some of the data in these strings will be collections of variable size, I use different delimiters to identify which parts of the string correspond to which tier of organization. There are enough tiers that the standard set of ;, |, and the like has been exhausted. I considered two-char delimiters, like ;# or ;|, but I felt that would be inefficient. There probably isn't much of a performance difference between storing one char and two, but when I have the option of picking the smaller one, it just feels wrong to pick the larger.
So finally, I considered using the set of characters like the double dagger and section. They only take up one char, and they are definitely not going to show up in the actual text that I'll be storing, so they won't be confused for anything.
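To make the layout concrete, here is a minimal sketch of the kind of tiered format I mean. The class name, the three tier names, and the assignment of each character to a tier are hypothetical, chosen only for illustration; the real strings have more tiers. The delimiters are written as `\u` escapes so that the encoding of the source file itself cannot change them.

```csharp
using System;
using System.Linq;

public static class TieredDelimiters
{
    // Hypothetical tiers: one delimiter per level of nesting.
    public const char RecordSep = '\u2021'; // double dagger
    public const char FieldSep  = '\u00A7'; // section sign
    public const char ItemSep   = '\u00A6'; // broken bar

    // Splits a packed string into records -> fields -> items.
    public static string[][][] Parse(string packed)
    {
        return packed.Split(RecordSep)
            .Select(record => record.Split(FieldSep)
                .Select(field => field.Split(ItemSep))
                .ToArray())
            .ToArray();
    }

    // Inverse of Parse: joins items, then fields, then records.
    public static string Pack(string[][][] records)
    {
        return string.Join(RecordSep.ToString(),
            records.Select(record => string.Join(FieldSep.ToString(),
                record.Select(field => string.Join(ItemSep.ToString(), field)))));
    }
}
```

This assumes, as stated above, that none of the three characters ever appears in the stored text; the sketch does no escaping.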
But character encoding is finicky. While what the end user sees doesn't matter here (since they, in fact, won't see the delimiters), I recently became concerned about how the programs in the system will read them. The string is stored in one database, while a separate program is responsible for both encoding and decoding the string into different object types for the rest of the application to work with. If a delimiter is expected to be written one way but is actually written another, the whole system could fail, and I can't let that happen. So is it safe to use these kinds of chars as background delimiters?
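The failure I am worried about can be sketched as follows. This is only an illustration under assumptions: the helper name `SurvivesRoundTrip` is made up, and the byte array stands in for whatever the database actually stores. It checks whether a string written with one `System.Text.Encoding` comes back unchanged when read with another.

```csharp
using System;
using System.Text;

public static class DelimiterEncodingCheck
{
    // Hypothetical helper: true if text written with 'writer'
    // is recovered exactly when the bytes are read with 'reader'.
    public static bool SurvivesRoundTrip(string text, Encoding writer, Encoding reader)
    {
        byte[] stored = writer.GetBytes(text);   // what would end up in storage
        string restored = reader.GetString(stored);
        return restored == text;
    }

    // Sample packed string using the three delimiters from the question.
    public static string Sample()
    {
        return "a\u00A6b\u00A7c\u2021d";
    }
}
```

With the sample string, `SurvivesRoundTrip(Sample(), Encoding.UTF8, Encoding.UTF8)` is true, while writing with `Encoding.ASCII` or reading UTF-8 bytes back as `Encoding.Unicode` (UTF-16) is not, because the delimiters fall outside ASCII. Each delimiter is a single C# char, so `Sample().Length` is 7, but `Encoding.UTF8.GetByteCount(Sample())` is 11: ¦ and § take two bytes each in UTF-8 and ‡ takes three.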
© Stack Overflow or respective owner