Update a PDF to include an encrypted, hidden, unique identifier?

Posted by Dave Jarvis on Stack Overflow
Published on 2010-12-27T02:40:47Z; indexed on 2010-12-27 02:54 UTC

Background

The idea is this:

  • Person provides contact information for online book purchase
  • Book, as a PDF, is marked with a unique hash
  • Person downloads book

PDF passwords are annoying and extremely easy to circumvent.

The ideal process would be something like:

  • Generate hash based on contact information
  • Store contact information and hash in database
  • Acquire book lock
  • Update an "include" file with hash text
  • Generate book as PDF (using pdflatex)
  • Apply hash to book
  • Release book lock
  • Send email with book download link
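The first few steps of that process can be sketched as follows. This is only a minimal illustration under stated assumptions, not a finished design: it uses a keyed hash (HMAC-SHA256) in place of a plain hash so that the mark cannot be recomputed without a server-side secret, and the names `SECRET_KEY`, `purchase`, `book.lock`, `mark.tex`, and the `\buyermark` macro are all hypothetical. It relies only on the Python standard library and on `fcntl`, which is available on Linux.

```python
import fcntl
import hashlib
import hmac
import sqlite3
from pathlib import Path

# Hypothetical server-side secret; in practice load it from protected config.
SECRET_KEY = b"replace-with-a-long-random-secret"


def make_mark(name: str, email: str) -> str:
    """Derive a keyed hash (HMAC-SHA256) from normalized contact details."""
    message = f"{name.strip().lower()}|{email.strip().lower()}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def store_purchase(db: sqlite3.Connection, name: str, email: str, mark: str) -> None:
    """Record who received which mark so a leaked copy can be traced later."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS purchase "
        "(mark TEXT PRIMARY KEY, name TEXT, email TEXT)"
    )
    db.execute("INSERT OR REPLACE INTO purchase VALUES (?, ?, ?)", (mark, name, email))
    db.commit()


def write_include(build_dir: Path, mark: str) -> Path:
    """Write the mark into an include file while holding an exclusive lock."""
    lock_path = build_dir / "book.lock"
    include_path = build_dir / "mark.tex"
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the book is free
        try:
            # Hypothetical macro that the book's LaTeX source would \input.
            include_path.write_text(f"\\newcommand{{\\buyermark}}{{{mark}}}\n")
            # pdflatex would be invoked here, before the lock is released.
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return include_path
```

Holding the lock across both the include-file write and the pdflatex run is what keeps two simultaneous purchases from building the book with each other's mark. How the mark is then hidden in the generated PDF is the open part of the question and is not addressed by this sketch.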

Technologies

The following technologies can be used (other programming languages are possible, but libraries will likely be limited to those supplied by the host):

  • C, Java, PHP
  • LaTeX files
  • PDF files
  • Linux

Question

What programming techniques (or open source software) should I investigate to:

  • Embed a unique hash (or other mark) in a PDF
  • Create a mark that resists collusion attacks
  • Develop a non-fragile solution (e.g., the mark survives PDF -> EPS -> PDF conversion)
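To make the collusion concern concrete, here is a small sketch of the simplest such attack: two buyers compare their copies and look at where the files differ. The byte strings are made-up stand-ins for two marked copies, not real PDF data, and `differing_positions` is a hypothetical helper written for this illustration.

```python
def differing_positions(copy_a: bytes, copy_b: bytes) -> list:
    """Return byte offsets where two equally long marked copies disagree."""
    if len(copy_a) != len(copy_b):
        raise ValueError("this sketch only compares copies of equal length")
    return [i for i, (a, b) in enumerate(zip(copy_a, copy_b)) if a != b]


# Two hypothetical buyers' copies: identical content, different embedded marks.
alice = b"chapter one [mark:3fa9] the end"
bob = b"chapter one [mark:81c2] the end"

print(differing_positions(alice, bob))  # prints [18, 19, 20, 21]
```

Every offset reported is part of the mark, so the colluders know exactly which bytes to overwrite or strip. A collusion-resistant scheme has to survive this kind of comparison.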

Research

I have looked at the following possibilities:

  • Steganography
  • Natural Language Processing (NLP)
  • Convert blank pages in PDF to images; mark those images; reassemble PDF
  • LaTeX watermark package
  • ImageMagick

Steganography requires keeping a master copy of the images, and I'm not sure whether the watermark would survive PDF -> EPS -> PDF or other types of conversion. LaTeX also creates an image cache, so any steganographic process would have to intercept that caching step somehow. NLP introduces grammatical errors. Blank pages inserted as images are immediately suspect and easy to replace. The LaTeX watermark package and ImageMagick both draw visible marks.

What other solutions are possible?

Thank you!

