Converting PDF portfolios to plain text (pdftotext?)

Posted by Andrea on Super User
Published on 2013-11-07T02:07:09Z

I am trying to convert a large number of PDFs (~15000) to plain text using pdftotext. This works well for most of them, but about 600 of the files are, I guess, "PDF portfolios."

When I run these PDFs through pdftotext, it just outputs:

For the best experience, open this PDF portfolio in Acrobat 9 or Adobe Reader 9, or later. Get Adobe Reader Now!
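
To separate the affected files from the rest, one option is to check each pdftotext result for that placeholder message. Below is a minimal sketch of how I could do that, assuming pdftotext is on the PATH and that every affected file produces the same wording as above. The function names and the marker phrase constant are my own and purely illustrative.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumption: every portfolio produces the wording quoted above.
PLACEHOLDER = "open this PDF portfolio in Acrobat"


def is_portfolio_placeholder(text: str) -> bool:
    """Return True if extracted text looks like the portfolio cover message."""
    # Collapse whitespace so line breaks in the output do not hide the phrase.
    normalized = " ".join(text.split())
    return PLACEHOLDER.lower() in normalized.lower()


def find_portfolios(directory: str) -> List[Path]:
    """Run pdftotext on each PDF in a directory and collect the flagged files."""
    flagged = []
    for pdf in sorted(Path(directory).glob("*.pdf")):
        # "-" as the output file makes pdftotext write the text to stdout.
        result = subprocess.run(
            ["pdftotext", str(pdf), "-"],
            capture_output=True,
            encoding="utf-8",
            errors="replace",
        )
        if result.returncode == 0 and is_portfolio_placeholder(result.stdout):
            flagged.append(pdf)
    return flagged
```

Calling `find_portfolios("pdfs")` on a hypothetical `pdfs` directory would return the paths that need separate handling. It only identifies the portfolios and does not extract the documents embedded in them.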

If I do open these PDFs in Adobe Reader, they look like two or more PDFs inside a single file.

Has anyone encountered this issue before? Is there any tool I can use to convert these PDFs automatically? (Either directly to text or at least to regular PDFs that pdftotext can then understand.)
