Variable-byte encoding clarification

Posted by Myx on Stack Overflow
Published on 2010-03-28T00:06:06Z Indexed on 2010/03/28 0:13 UTC

Filed under: encoding | multibyte

Hello:

I am very new to byte encoding, so please excuse me (and by all means correct me) if I use or express simple concepts incorrectly.

I am trying to understand variable-byte encoding. I have read the Wikipedia article (http://en.wikipedia.org/wiki/Variable-width_encoding) as well as a book chapter from an Information Retrieval textbook. I think I understand how to encode a decimal integer. For example, the variable-byte encoding of the integer 60 would be:

1 0 1 1 1 1 0 0

(Please let me know if the above is incorrect.) Assuming I understand the scheme, I am still not sure where the compression comes from. Is it because an integer would normally be stored in 32 bits, so that 60 would be 1 1 1 1 0 0 preceded by 26 zeros, wasting that space, whereas variable-byte encoding represents it with just 8 bits?
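
To make the scheme concrete, here is a minimal Python sketch. It assumes the convention used in the Information Retrieval textbook chapter: each byte carries 7 payload bits, and the high bit is 1 only on the last byte of a number (other formats, such as LEB128, use the opposite convention). The names `vb_encode`, `vb_decode`, and `bits` are illustrative and do not come from any library.

```python
def vb_encode(n):
    """Encode one non-negative integer as variable-byte code.

    Assumption: the high bit is 1 on the final byte of the number
    and 0 on every preceding byte (textbook convention).
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    chunks = [n % 128]            # lowest 7 bits become the last byte
    n //= 128
    while n > 0:
        chunks.insert(0, n % 128) # higher 7-bit groups go in front
        n //= 128
    chunks[-1] += 128             # set the high bit on the last byte
    return bytes(chunks)


def vb_decode(data):
    """Decode a byte stream of concatenated variable-byte codes."""
    numbers, n = [], 0
    for byte in data:
        if byte < 128:            # high bit 0: more bytes follow
            n = 128 * n + byte
        else:                     # high bit 1: last byte of this number
            numbers.append(128 * n + (byte - 128))
            n = 0
    return numbers


def bits(data):
    """Render bytes as space-separated 8-bit groups for inspection."""
    return " ".join(format(b, "08b") for b in data)


print(bits(vb_encode(60)))    # 10111100
print(bits(vb_encode(824)))   # 00000110 10111000
print(vb_decode(vb_encode(60) + vb_encode(824)))  # [60, 824]
```

Under this convention 60 takes one byte rather than the four bytes of a fixed 32-bit integer, while a value such as 2**28 takes five bytes, so any saving depends on most of the stored values being small.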

Thank you in advance for the clarifications.

© Stack Overflow or respective owner
