Wrong values reported by pyPDF for various box regions

Posted by romor on Stack Overflow
Published on 2010-12-24T10:14:29Z

Using pyPdf, for most files the dimensions I get for the various boxes match what Acrobat reports. For some files, however, pyPdf and Acrobat report different values, for example:

pyPdf:

artBox:   595.3 x 841.9
bleedBox: 595.3 x 841.9
cropBox:  595.3 x 841.9
trimBox:  517.3 x 754

Acrobat:

artBox:   439.35 x 666.13 pt
bleedBox: 439.35 x 666.13 pt
cropBox:  439.35 x 666.13 pt
trimBox:  439.35 x 666.13 pt

I thought it was a units issue, but the ratios between the widths and heights do not match either, not to mention the trimBox mismatch.

The values reported by Acrobat are of course the correct ones. Does anyone know why this happens, and is there a way to get the correct dimensions using pyPdf?

Thanks


A couple of minutes later...

After reading the question "Are PDF box coordinates relative or absolute?", I realized I had not accounted for a box's first corner (the lower-left one in PDF coordinates) being different from 0 (zero). It turned out that the trimBox starts at 77.95 x 87.87, so subtracting these values from the reported trimBox values gives the correct result: 517.3 - 77.95 = 439.35 and 754 - 87.87 = 666.13. The starting corners of the boxes are:

artBox:   0 x 0
bleedBox: 0 x 0
cropBox:  0 x 0
trimBox:  77.95 x 87.87

The other boxes still seem to have misleading values, or I am misinterpreting them.
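
The subtraction can be checked without pyPdf at all. The following is only a minimal sketch, assuming each box is a four-number sequence [x0, y0, x1, y1], which is how the snippet below indexes it. The helper name `box_size` and the coordinate lists are hypothetical: they are reconstructed from the figures quoted in this post, not read from the actual file.

```python
def box_size(box):
    """Return (width, height) of a box given as [x0, y0, x1, y1]."""
    x0, y0, x1, y1 = box
    return x1 - x0, y1 - y0

# Hypothetical coordinates reconstructed from the numbers in this post:
# the trimBox starts at (77.95, 87.87), the other boxes start at (0, 0).
boxes = {
    "cropBox": [0, 0, 595.3, 841.9],
    "trimBox": [77.95, 87.87, 517.3, 754],
}

for name in sorted(boxes):
    width, height = box_size(boxes[name])
    print("%s: %.2f x %.2f" % (name, width, height))
```

With these inputs the trimBox comes out at 439.35 x 666.13, which matches Acrobat. The boxes that start at the origin stay at 595.3 x 841.9, so the offset alone does not explain their mismatch.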

Snippet:

from pyPdf import PdfFileReader

pdfread = PdfFileReader(open('my.pdf', 'rb'))
page = 1  # getPage() is zero-based, so this is the second page

# The box is indexed as [x0, y0, x1, y1]; its size is the difference of the corners
trimbox = pdfread.getPage(page).trimBox
width = trimbox[2] - trimbox[0]
height = trimbox[3] - trimbox[1]

print width, height
