Update a PDF to include an encrypted, hidden, unique identifier?
- by Dave Jarvis
Background
The idea is this:
Person provides contact information for online book purchase
Book, as a PDF, is marked with a unique hash
Person downloads book
PDF passwords are annoying and extremely easy to circumvent.
The ideal process would be something like:
Generate hash based on contact information
Store contact information and hash in database
Acquire book lock
Update an "include" file with hash text
Generate book as PDF (using pdflatex)
Apply hash to book
Release book lock
Send email with book download link
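To make the first few steps concrete, here is a minimal Java sketch of how the hash could be generated and written to the include file while the book lock is held. It is only a sketch under assumptions: HMAC-SHA256 with a server-side secret is my choice of hash, and the class name, method names, lock file, and the \purchaseid macro are all hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class BookMark {
    // Hypothetical helper: derives a per-purchase identifier from the contact
    // information. HMAC-SHA256 is an assumption; a keyed hash means the value
    // cannot be recomputed by someone who only knows the contact details.
    static String purchaseId(byte[] serverKey, String contactInfo)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(serverKey, "HmacSHA256"));
        byte[] digest = mac.doFinal(contactInfo.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Hypothetical helper: takes an exclusive lock on lockFile ("acquire book
    // lock"), then writes a one-line LaTeX include file defining \purchaseid.
    // The lock is released when the try block exits ("release book lock").
    static void writeIncludeLocked(Path lockFile, Path includeFile, String id)
            throws IOException {
        try (FileChannel channel = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            String tex = "\\newcommand{\\purchaseid}{" + id + "}\n";
            Files.write(includeFile, tex.getBytes(StandardCharsets.UTF_8));
            // The pdflatex run would go here, while the lock is still held,
            // so that two purchases cannot overwrite each other's include file.
        }
    }
}
```

The identifier is 64 lowercase hex characters, so it needs no escaping in the LaTeX source. This only covers generating and injecting the value; how to hide it in the resulting PDF is the open part of the question.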
Technologies
The following technologies can be used (other programming languages are possible, but libraries will likely be limited to those supplied by the host):
C, Java, PHP
LaTeX files
PDF files
Linux
Question
What programming techniques (or open source software) should I investigate to:
Embed a unique hash (or other mark) in a PDF
Create a collusion-attack resistant mark
Develop a non-fragile (e.g., PDF -> EPS -> PDF still contains the mark) solution
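To illustrate what the collusion requirement is guarding against, here is a small Java sketch of the simplest form of the attack: two buyers compare their copies and every position that differs is a candidate for the mark. The class and method names are hypothetical, and real PDFs (with compressed streams and timestamps) will differ in more places than this toy example suggests.

```java
import java.util.ArrayList;
import java.util.List;

final class CollusionCheck {
    // Returns every byte offset at which the two copies differ. Offsets past
    // the end of the shorter copy are also reported as differences.
    static List<Integer> differingPositions(byte[] copyA, byte[] copyB) {
        List<Integer> positions = new ArrayList<>();
        int shared = Math.min(copyA.length, copyB.length);
        int longest = Math.max(copyA.length, copyB.length);
        for (int i = 0; i < shared; i++) {
            if (copyA[i] != copyB[i]) {
                positions.add(i);
            }
        }
        for (int i = shared; i < longest; i++) {
            positions.add(i);
        }
        return positions;
    }
}
```

A mark that shows up as one short, contiguous run of differing offsets is easy to locate and overwrite, which is why a mark stored in a single fixed place would not meet this requirement.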
Research
I have looked at the following possibilities:
Steganography
Natural Language Processing (NLP)
Convert blank pages in PDF to images; mark those images; reassemble PDF
LaTeX watermark package
ImageMagick
Steganography requires keeping a master copy of the images, and I'm not sure whether the watermark would survive PDF -> EPS -> PDF or other types of conversion. LaTeX creates an image cache, so any steganographic process would have to intercept that step somehow. NLP introduces grammatical errors. Blank pages inserted as images are immediately suspect and easy to replace. Both the LaTeX watermark package and ImageMagick draw visible marks.
What other solutions are possible?
Related Links
http://www.tcpdf.org/
invisible watermarks in images
Thank you!