Converting PDF portfolios to plain text (pdftotext?)
Posted by Andrea on Super User
Published on 2013-11-07T02:07:09Z
Indexed on 2013/11/07 10:00 UTC
I am trying to convert a large number of PDFs (~15000) to plain text using pdftotext. This works well except for about 600 of them, which I believe are "PDF portfolios."
When I run these PDFs through pdftotext, the only output is:
    For the best experience, open this PDF portfolio in Acrobat 9 or Adobe Reader 9, or later. Get Adobe Reader Now!
When I open these PDFs in Adobe Reader, each one appears to contain two or more PDFs inside a single file.
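One way to pick out the affected files in a batch like this is to check each extracted text for the placeholder message quoted above. This is a minimal sketch, assuming the message wording is the same across files; the function name and the marker constant are illustrative and not part of pdftotext.

```python
# Marker taken from the placeholder text quoted above; assumed to be stable.
PORTFOLIO_MARKER = "open this PDF portfolio in Acrobat"


def looks_like_portfolio_placeholder(text: str) -> bool:
    """Return True if extracted text contains the portfolio cover message."""
    # Collapse whitespace so line wrapping in the output does not hide the marker.
    normalized = " ".join(text.split())
    return PORTFOLIO_MARKER.lower() in normalized.lower()
```

Running this over the text files already produced would give the list of PDFs that need separate handling.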
Has anyone encountered this issue before? Is there any tool I can use to convert these PDFs automatically? (Either directly to text or at least to regular PDFs that pdftotext can then understand.)
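One possible approach, untested against these particular files: a portfolio stores its member documents as embedded file attachments, and pdfdetach (shipped with poppler alongside pdftotext) can save attachments to a directory. The sketch below assumes both tools are on the PATH and that the embedded files are themselves ordinary PDFs; the helper names are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def detach_command(portfolio: Path, out_dir: Path) -> list[str]:
    """Build the pdfdetach call that saves every embedded file into out_dir."""
    return ["pdfdetach", "-saveall", "-o", str(out_dir), str(portfolio)]


def portfolio_to_text(portfolio: Path) -> str:
    """Extract embedded PDFs from a portfolio and concatenate their text."""
    parts = []
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        subprocess.run(detach_command(portfolio, tmp_dir), check=True)
        # Only handle attachments that look like PDFs (assumption: others are skipped).
        for member in sorted(tmp_dir.iterdir()):
            if member.suffix.lower() != ".pdf":
                continue
            # "-" as the output name makes pdftotext write to stdout.
            result = subprocess.run(
                ["pdftotext", str(member), "-"],
                check=True,
                capture_output=True,
                text=True,
            )
            parts.append(result.stdout)
    return "\n".join(parts)
```

If the attachments turn out not to be plain PDFs, or pdfdetach lists nothing for a file, this approach would not apply and another tool would be needed.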
© Super User or respective owner