How to unmangle PDF format into a usable text or spreadsheet document?
- by Chuck
Upon requesting some daily/hourly sales data from a coworker who is responsible for such requests, I was given a series of PDF files. The point of sale program that is used, for some reason, answers requests for this type of information in the form of PDF files.
The issue:
The PDF files look to be in a format that should easily be copy and pasted into a spreadsheet. There are three columns that look to be neatly organized across two pages. When copy/pasting the first page, all three columns from the PDF's first page are dumped into a single column consisting of the Date followed by the Hours for the transactions on that day. The end of this Date/Time information is followed by all of the Total Sales values that should be attached a Date and Time of the transaction. (NOTE: There are no duplicated Dates in the Date column, ie, Multiple transactions for a day only have one yyyy/mm/dd listed for the first row but not the following rows.)
While it was a huge pain, it was possible to, in about four or five steps, get the single column of data broken out into three columns that matched the PDF.
The second page of the PDF file, when attempting to copy/paste into a spreadsheet, creates a single column with the first third of the cells being the Dates from the PDF, the second third of the cells being the Hours of the transactions and the final third of the cells being filled with the Total Sales.
After the copy/paste there is no way to figure out which Hours belong to which Dates or Total Sales due to the lack of the duplicated Dates in the Date column as mentioned above.
My PDF-fu is next to non-existent. I've just now started to work with PDF editors and some www.convertmyPDFforfree.com websites, so far, with absolutely nothing remotely coming anywhere near usable output. (Both methods have so far done nothing but product blank documents.)
Before I go back and pester my co-worker into figuring out a way to create a report in some other format than PDF, is there any method by which to take the data that looks to be formatted correctly in a PDF and copy/paste it into a spreadsheet that will look the same?
I appreciate any help that can be made available. The sales data isn't so sensitive that I couldn't part with a bit to let somebody actually see what it is that needs to be dealt with, just let me know. The PDF's are less than 100kb each so sending them shouldn't be a burden to any interested party.