Parsing HTTP - Bytes.length != String.length

Posted by hotzen on Stack Overflow
Published on 2010-06-10T09:19:49Z Indexed on 2010/06/10 9:22 UTC

Hello,

I consume HTTP via nio.SocketChannel, so I receive the data in chunks as Array[Byte]. I want to feed these chunks into a parser and resume parsing each time a new chunk arrives.

HTTP itself seems to use an ISO-8859 charset for the status line and headers, but the payload/body may be arbitrarily encoded. If the HTTP Content-Length specifies X bytes, the UTF-8-decoded body may contain far fewer characters, since one character may be represented in UTF-8 by 2 or more bytes.
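To make the mismatch concrete, here is a minimal sketch in Scala. The object name `LengthDemo` and the sample string are illustrative assumptions, not part of any HTTP library. It also shows that decoding the same bytes as ISO-8859-1 yields exactly one Char per byte, which is why that charset is safe for locating byte offsets.

```scala
import java.nio.charset.StandardCharsets

object LengthDemo {
  // Sample body "héllo €": 7 characters, two of them non-ASCII.
  val body: String = "h\u00e9llo \u20ac"

  // In UTF-8, 'é' takes 2 bytes and '€' takes 3, so the byte count is larger.
  val utf8Bytes: Array[Byte] = body.getBytes(StandardCharsets.UTF_8)

  // ISO-8859-1 maps every byte to exactly one Char, so lengths line up again
  // (the text is garbled, but offsets match the byte offsets).
  val latin1View: String = new String(utf8Bytes, StandardCharsets.ISO_8859_1)
}
```

A Content-Length header for this body would have to say 10, not 7, because it counts bytes on the wire.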

So what is a good parsing strategy that honors an explicitly specified Content-Length and/or a Transfer-Encoding: chunked, which specifies a chunk length that has to be honored as well?

  • Append each data chunk to a mutable.ArrayBuffer[Byte], search for CRLF in the bytes, decode everything from 0 until the CRLF to a String, and match it with regular expressions like StatusRegex, HeaderRegex, etc.?
  • Decode each data chunk with the proper charset (e.g. ISO-8859-1, UTF-8) and append it to a StringBuilder. With this solution I cannot honor any Content-Length or chunk size, but do I have to care about that?
  • Any other solution?
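The first option can be sketched roughly as follows. This is only an illustration under stated assumptions: `HttpMessageBuffer`, `feed`, and `poll` are hypothetical names, the message is assumed to carry a Content-Length (chunked transfer encoding is not handled), and the header section is assumed to be terminated by CRLF CRLF. The key idea is that all framing is done on bytes, and only the header section is decoded to a String.

```scala
import java.nio.charset.StandardCharsets
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

// Hypothetical helper: accumulates raw bytes and frames one message at a time.
final class HttpMessageBuffer {
  private val buf = ArrayBuffer.empty[Byte]
  private val HeaderEnd: Seq[Byte] =
    "\r\n\r\n".getBytes(StandardCharsets.US_ASCII).toSeq

  /** Append one chunk exactly as it was read from the channel. */
  def feed(chunk: Array[Byte]): Unit = buf ++= chunk

  /** Returns (headers, bodyBytes) once a full message is buffered, else None. */
  def poll(): Option[(Map[String, String], Array[Byte])] = {
    val end = buf.indexOfSlice(HeaderEnd)
    if (end < 0) None // header section still incomplete
    else {
      // Headers are decoded byte-for-byte; the body is not decoded here.
      val head = new String(buf.slice(0, end).toArray, StandardCharsets.ISO_8859_1)
      val headers = head.split("\r\n").drop(1) // skip the status/request line
        .map(_.split(":", 2))
        .collect { case Array(k, v) => k.trim.toLowerCase -> v.trim }
        .toMap
      val bodyStart = end + HeaderEnd.length
      // Assumption: a missing or malformed Content-Length is treated as 0.
      val contentLength =
        headers.get("content-length").flatMap(s => Try(s.toInt).toOption).getOrElse(0)
      if (buf.length - bodyStart < contentLength) None // body still incomplete
      else {
        val body = buf.slice(bodyStart, bodyStart + contentLength).toArray
        buf.remove(0, bodyStart + contentLength) // keep bytes of the next message
        Some((headers, body))
      }
    }
  }
}
```

The caller would decode the returned body bytes with whatever charset the Content-Type header names, and only after the full body has arrived, so a multi-byte character split across two socket reads is never decoded in halves. Chunk sizes in Transfer-Encoding: chunked are byte counts too, so the same buffer-first approach would apply there. The second option (decoding each chunk straight into a StringBuilder) loses the byte offsets that both mechanisms depend on.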

© Stack Overflow or respective owner
