PDF parsing file trailer
Posted by Ralph on Stack Overflow
Published on 2010-05-24T16:55:59Z
Indexed on 2010/05/24 17:01 UTC
It is not clear from the PDF ISO standard document (PDF32000-2008) whether a comment may follow the startxref keyword:
startxref
Byte_offset_of_last_cross-reference_section
%%EOF
The standard does seem to imply that comments may appear anywhere:
7.2.3 Comments
Any occurrence of the PERCENT SIGN (25h) outside a string or stream introduces a comment. The comment consists of all characters after the PERCENT SIGN and up to but not including the end of the line, including regular, delimiter, SPACE (20h), and HORIZONTAL TAB characters (09h). A conforming reader shall ignore comments, and treat them as single white-space characters. That is, a comment separates the token preceding it from the one following it.
EXAMPLE The PDF fragment in this example is syntactically equivalent to just the tokens abc and 123.
abc% comment ( /%) blah blah blah
123
Comments (other than the %PDF–n.m and %%EOF comments described in 7.5, "File Structure") have no semantics. They are not necessarily preserved by applications that edit PDF files.
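To make the quoted rule concrete, here is a rough Python sketch that replaces each comment with a single space and then splits on PDF white-space. The function name `tokens_without_comments` is made up for illustration, and the sketch assumes the fragment contains no strings or streams, where a PERCENT SIGN would not start a comment.

```python
import re

# PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE.
_PDF_WS = rb"[\x00\t\n\x0c\r ]+"


def tokens_without_comments(fragment: bytes) -> list:
    """Split a fragment into tokens, treating each comment as one space.

    Assumption: the fragment has no strings or streams, so every '%'
    really does introduce a comment.
    """
    # A comment runs from '%' up to, but not including, the end of the line.
    no_comments = re.sub(rb"%[^\r\n]*", b" ", fragment)
    # Drop the empty pieces that appear at leading/trailing white-space.
    return [tok for tok in re.split(_PDF_WS, no_comments) if tok]


print(tokens_without_comments(b"abc% comment ( /%) blah blah blah\n123"))
```

Applied to the example from the standard, this yields just the two tokens abc and 123.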
If comments are allowed to appear after the startxref keyword, parsing the file becomes more difficult, because you do not know how far to back up from the %%EOF comment to find the byte offset.
Any ideas?
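One possible workaround, sketched here in Python: instead of backing up a fixed distance from %%EOF, search backwards for the last startxref keyword in the tail of the file, then parse forwards from it, skipping white-space and comments until the digits of the offset appear. The function name `find_startxref_offset` and the 1024-byte tail window are assumptions of this sketch, not something the standard prescribes. It would also be fooled by a comment that itself contains the word startxref.

```python
import re

# PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def find_startxref_offset(data: bytes, tail_size: int = 1024) -> int:
    """Return the byte offset that follows the last 'startxref' keyword.

    Assumption: the keyword lies within the last `tail_size` bytes.
    """
    tail = data[-tail_size:]
    pos = tail.rfind(b"startxref")
    if pos < 0:
        raise ValueError("startxref keyword not found in file tail")

    i = pos + len(b"startxref")
    n = len(tail)
    # Skip white-space and comments between the keyword and the offset.
    while i < n:
        c = tail[i:i + 1]
        if c in PDF_WHITESPACE:
            i += 1
        elif c == b"%":
            # A comment runs up to, but not including, the end of the line.
            while i < n and tail[i:i + 1] not in b"\r\n":
                i += 1
        else:
            break

    match = re.match(rb"\d+", tail[i:])
    if match is None:
        raise ValueError("no byte offset found after startxref")
    return int(match.group())


sample = b"trailer\n<< /Size 3 >>\nstartxref % a comment here\n1234\n%%EOF\n"
print(find_startxref_offset(sample))
```

Because the scan moves forward from the keyword, it does not matter how many comment lines sit between startxref and %%EOF, as long as everything fits in the tail window.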
© Stack Overflow or respective owner