Parsing HTTP - Bytes.length != String.length
Posted by hotzen on Stack Overflow
Published on 2010-06-10T09:19:49Z
Indexed on 2010/06/10 9:22 UTC
Hello,
I consume HTTP via nio.SocketChannel, so I get chunks of data as Array[Byte]. I want to feed these chunks into a parser and continue parsing after each chunk has been added.
HTTP itself seems to use an ISO-8859-1 charset for the status line and headers, but the payload/body may be arbitrarily encoded. Content-Length counts bytes, not characters: if it specifies X bytes, the UTF-8-decoded body may contain far fewer than X characters, since a single character can take 2 or more bytes in UTF-8.
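To make the mismatch concrete, here is a small sketch. The object name and the sample string are made up for illustration; only the standard String and charset APIs are used.

```scala
import java.nio.charset.StandardCharsets

// Illustrative only: the object name and sample text are assumptions.
object ByteVsCharLength {
  // "é" (U+00E9) takes 2 bytes and "€" (U+20AC) takes 3 bytes in UTF-8.
  val text: String = "h\u00e9llo \u20ac"

  // This is what would travel on the wire and what Content-Length counts.
  val utf8Bytes: Array[Byte] = text.getBytes(StandardCharsets.UTF_8)

  // Decoding the bytes again yields fewer characters than there were bytes.
  val decoded: String = new String(utf8Bytes, StandardCharsets.UTF_8)
}
```

Here the byte array is longer than the decoded String, so a Content-Length taken from the header cannot be compared against String.length.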
So what is a good parsing strategy that honors an explicitly specified Content-Length and/or a Transfer-Encoding: chunked, which specifies a chunk length to be honored?
- Append each data chunk to a mutable.ArrayBuffer[Byte], search for CRLF in the bytes, decode everything from 0 until the CRLF to a String, and match it with regular expressions like StatusRegex, HeaderRegex, etc.?
- Decode each data chunk with the proper charset (e.g. ISO-8859-1, UTF-8, etc.) and append it to a StringBuilder. With this solution I am not able to honor any Content-Length or chunk size, but do I have to care about it?
- Any other solution?
© Stack Overflow or respective owner