Wrong values reported by pyPdf for various box regions
- by romor
Using pyPdf, for most files the box dimensions I get match what Acrobat reports. For some files, however, pyPdf and Acrobat report different values, for example:
pyPdf:
artBox: 595.3 x 841.9
bleedBox: 595.3 x 841.9
cropBox: 595.3 x 841.9
trimBox: 517.3 x 754
Acrobat:
artBox: 439.35 x 666.13 pt
bleedBox: 439.35 x 666.13 pt
cropBox: 439.35 x 666.13 pt
trimBox: 439.35 x 666.13 pt
I thought it was a units issue, but the width-to-height ratios don't match either, not to mention the trimBox mismatch.
The correct results are the ones Acrobat reports, of course. Does anyone know why this happens, and is there a way to get the correct dimensions using pyPdf?
Thanks
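A couple of minutes later...
After reading the question "Are PDF box coordinates relative or absolute?", I realized I hadn't considered that a box's lower-left corner can be something other than 0 (zero). The values I listed above are the opposite (upper-right) corner of each box, not its width and height. It turned out that the trimBox starts at 77.95 x 87.87, so subtracting these values from the reported trimBox values gives the correct result: 517.3 - 77.95 = 439.35 and 754 - 87.87 = 666.13. The starting corners pyPdf reports are: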
artBox: 0 x 0
bleedBox: 0 x 0
cropBox: 0 x 0
trimBox: 77.95 x 87.87
The other boxes seem to have misleading values, or I am misinterpreting them.
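To double-check the trimBox arithmetic without pyPdf, here is a minimal sketch in plain Python. The helper name box_size is my own (not part of pyPdf), and the coordinates are the trimBox values quoted in this post; it assumes a box is given as two opposite corners [x0, y0, x1, y1].

```python
def box_size(box):
    """Return (width, height) of a box given as [x0, y0, x1, y1]."""
    x0, y0, x1, y1 = [float(v) for v in box]
    # Rounding to 2 decimals only hides floating-point noise in the output
    return round(x1 - x0, 2), round(y1 - y0, 2)


# trimBox corners from the file in question (hypothetical variable name)
trim_box = [77.95, 87.87, 517.3, 754]
width, height = box_size(trim_box)
print("%s x %s" % (width, height))
```

For the trimBox this should agree with the 439.35 x 666.13 pt that Acrobat shows.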
Snippet:
from pyPdf import PdfFileReader

pdfread = PdfFileReader(open('my.pdf', 'rb'))
page = 1  # page indices are zero-based, so this is the second page
# A box holds two opposite corners [x0, y0, x1, y1], not a width and height
box = pdfread.getPage(page).trimBox
width = box[2] - box[0]
height = box[3] - box[1]
print width, height
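To compare every box at once, something like the sketch below might help. It assumes the page object exposes mediaBox, cropBox, bleedBox, trimBox and artBox attributes (as pyPdf's page object does), each holding four numbers; box_report and BOX_NAMES are names I made up for illustration.

```python
BOX_NAMES = ("mediaBox", "cropBox", "bleedBox", "trimBox", "artBox")


def box_report(page):
    """Map each box name to (x0, y0, width, height) for a page-like object."""
    report = {}
    for name in BOX_NAMES:
        # Each box is assumed to be [x0, y0, x1, y1]
        x0, y0, x1, y1 = [float(v) for v in getattr(page, name)]
        report[name] = (x0, y0, round(x1 - x0, 2), round(y1 - y0, 2))
    return report
```

With the snippet above this would be called as box_report(pdfread.getPage(page)), which makes it easy to see which boxes do not start at 0 x 0.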