pyPDF - Retrieve page numbers from document

Posted by SquidneyPoitier on Stack Overflow See other posts from Stack Overflow or by SquidneyPoitier
Published on 2012-09-10T23:59:52Z Indexed on 2012/09/11 15:38 UTC
Read the original article Hit count: 303

Filed under:

python

|

pypdf

At the moment I'm looking into doing some PDF merging with pyPdf, but sometimes the inputs are not in the right order, so I'm looking into scraping each page for its page number to determine the order it should go in (e.g. if someone split up a book into 20 10-page PDFs and I want to put them back together).

I have two questions - 1.) I know that sometimes the page number is stored in the document data somewhere, as I've seen PDFs that render on Adobe as something like [1243] (10 of 150), but I've read documents of this sort into pyPDF and I can't find any information indicating the page number - where is this stored?

2.) If avenue #1 isn't available, I think I could iterate through the objects on a given page to try to find a page number - likely it would be its own object that has a single number in it. However, I can't seem to find any clear way to determine the contents of objects. If I run:

pdf.getPage(0).getContents()

This usually either returns:

{'/Filter': '/FlateDecode'}

or it returns a list of IndirectObject(num, num) objects. I don't really know what to do with either of these and there's no real documentation on it as far as I can tell. Is anyone familiar with this kind of thing that could point me in the right direction?

© Stack Overflow or respective owner

Related posts about python

unmet dependencies in Ubuntu 12.04

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I tried today to install a dvb-card on my Ubuntu 12.04 (Linux blauhai-linux 3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux ). The installation failed with an error. After that, i tried to install python (it was already installed but i got this error): linux:~$… >>> More
How can I get sikuli-ide to work?

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I installed sikuli-ide with sudo apt-get install sikuli-ide Everything was fine until I tried to start it from the terminal. I typed sikuli-ide But the only response I got was [info] locale: en_US The application was not started, furthermore there is no desktop file and sikuli-ide does not… >>> More
Getting PATH right for python after MacPorts install

as seen on Super User - Search for 'Super User'
I can't import some python libraries (PIL, psycopg2) that I just installed with MacPorts. I looked through these forums, and tried to adjust my PATH variable in $HOME/.bash_profile in order to fix this but it did not work. I added the location of PIL and psycopg2 to PATH. I know that Terminal is… >>> More
call python with system() in R to run a python script emulating the python console

as seen on Stack Overflow - Search for 'Stack Overflow'
I want to pass a chunk of Python code to Python in R with something like system('python ...'), and I'm wondering if there is an easy way to emulate the python console in this case. For example, suppose the code is "print 'hello world'", how can I get the output like this in R? >>> print… >>> More
Python - Calling a non python program from python?

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi, I am currently struggling to call a non python program from a python script. I have a ~1000 files that when passed through this C++ program will generate ~1000 outputs. Each output file must have a distinct name. The command I wish to run is of the form: program_name -input -output -o1 -o2… >>> More

Related posts about pypdf

Wrong values reported by pyPDF for various box regions

as seen on Stack Overflow - Search for 'Stack Overflow'
Using pyPdf, for most files I get matched results concerning various box's dimensions compared to what Acrobat reports. However for some files I get different values reported by pyPdf and Acrobat, like: pyPdf: artBox: 595.3 x 841.9 bleedBox: 595.3 x 841.9 cropBox: 595.3 x 841.9 trimBox: 517… >>> More
pyPDF - Retrieve page numbers from document

as seen on Stack Overflow - Search for 'Stack Overflow'
At the moment I'm looking into doing some PDF merging with pyPdf, but sometimes the inputs are not in the right order, so I'm looking into scraping each page for its page number to determine the order it should go in (e.g. if someone split up a book into 20 10-page PDFs and I want to put them back… >>> More
Change metadata of pdf file with pypdf.

as seen on Stack Overflow - Search for 'Stack Overflow'
Hello ! I'd like to create/modify the title of a pdf document using pypdf. It seems that the title is readonly. Is there a way to access this metadata r/w? If answer positive, a piece of code would be appreciated. Thanks >>> More
Dynamically generated PDF files working in most readers except Adobe Reader

as seen on Stack Overflow - Search for 'Stack Overflow'
I'm trying to dynamically generate PDFs from user input, where I basically print the user input and overlay it on an existing PDF that I did not create. It works, with one major exception. Adobe Reader doesn't read it properly, on Windows or on Linux. QuickOffice on my phone doesn't read it either… >>> More
How to read line by line in pdf file using PyPdf ???

as seen on Stack Overflow - Search for 'Stack Overflow'
hi, i working with pdf files, and i have some code to read from a pdf file,, but is there a way to read line by line from the pdf file( not pages ) ,, using Pypdf,, Python 2.6 ,, on windows ??? >>> More