pdftotext not outputting Hebrew characters
Posted by Ofri Raviv on Server Fault
Published on 2010-05-25T21:12:33Z
I'm using Xpdf's pdftotext to extract the text from some Hebrew PDF files on Ubuntu.
On my local machine this worked fine. When I tried the same thing on another machine, the Hebrew characters did not show up in the output text file. I verified that the Hebrew language package is installed (see below for why I think so). Where else can I look for the problem?
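One way to tell whether the Hebrew text is being dropped entirely, rather than just written in an unexpected encoding, is to extract with an explicit UTF-8 output encoding and count the Hebrew characters in the result. The following is a minimal sketch, assuming a POSIX shell and UTF-8 output; `count_hebrew_chars` is a hypothetical helper, not part of Xpdf. It relies on the fact that every UTF-8 character in U+05C0..U+05FF (which includes all Hebrew letters, U+05D0..U+05EA) starts with the byte 0xD7, and that byte never appears as a UTF-8 continuation byte.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xpdf): print how many characters in the
# range U+05C0..U+05FF a UTF-8 text file contains. That range covers all
# Hebrew letters (U+05D0..U+05EA). Each such character begins with byte 0xD7
# (octal 327), and 0xD7 is never a UTF-8 continuation byte, so counting 0xD7
# bytes counts exactly those characters.
# Assumption: the file is UTF-8 (e.g. produced with "pdftotext -enc UTF-8").
count_hebrew_chars() {
    # tr -cd deletes every byte except 0xD7; wc -c counts what is left.
    # The arithmetic expansion strips any leading spaces that wc may print.
    echo $(( $(LC_ALL=C tr -cd '\327' < "$1" | wc -c) ))
}
```

For example, run `pdftotext -enc UTF-8 input.pdf out.txt` followed by `count_hebrew_chars out.txt`, with `input.pdf` standing in for one of the Hebrew files. A result of 0 suggests the characters are lost during extraction rather than merely displayed wrongly. Text written as ISO-8859-8 or Windows-1255 uses single bytes for Hebrew letters and would not be counted.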
>> tail -2 /etc/xpdf/xpdfrc
include /etc/xpdf/includes
>> cat /etc/xpdf/includes
# This file was automatically generated by /usr/sbin/update-xpdfrc.
# Instead, add or remove files in /etc/xpdf/ then run
# /usr/sbin/update-xpdfrc to regenerate this file.
include /etc/xpdf/xpdfrc-latin2
include /etc/xpdf/xpdfrc-thai
include /etc/xpdf/xpdfrc-greek
include /etc/xpdf/xpdfrc-turkish
include /etc/xpdf/xpdfrc-arabic
include /etc/xpdf/xpdfrc-hebrew
include /etc/xpdf/xpdfrc-cyrillic
>> cat /etc/xpdf/xpdfrc-hebrew
#----- begin Hebrew support package (2003-feb-16)
unicodeMap ISO-8859-8 /usr/share/xpdf/hebrew/ISO-8859-8.unicodeMap
unicodeMap Windows-1255 /usr/share/xpdf/hebrew/Windows-1255.unicodeMap
#----- end Hebrew support package
>> ls /usr/share/xpdf/hebrew/
ISO-8859-8.unicodeMap Windows-1255.unicodeMap
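The manual check above (reading the included config file, then listing the map directory) can be automated so it is easy to repeat on each machine. The following is a minimal sketch, assuming a POSIX shell; `check_unicode_maps` is a hypothetical helper, not part of Xpdf, and it only inspects the single file passed to it (for example `/etc/xpdf/xpdfrc-hebrew`), not the files that file includes.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xpdf): for every line of the form
#   unicodeMap <encoding-name> <map-file>
# in one xpdfrc-style file, report whether <map-file> is readable.
# Returns 0 if all map files are readable, 1 if any is missing,
# and 2 if the config file itself cannot be read.
check_unicode_maps() {
    config=$1
    status=0
    if [ ! -r "$config" ]; then
        printf 'cannot read %s\n' "$config" >&2
        return 2
    fi
    # Split each line into whitespace-separated fields; lines whose first
    # field is not "unicodeMap" (comments, include lines, ...) are skipped.
    # The "|| [ -n ... ]" also handles a last line without a trailing newline.
    while read -r keyword encoding mapfile _rest || [ -n "$keyword" ]; do
        [ "$keyword" = "unicodeMap" ] || continue
        if [ -r "$mapfile" ]; then
            printf '%-8s%s %s\n' "OK" "$encoding" "$mapfile"
        else
            printf '%-8s%s %s\n' "MISSING" "$encoding" "$mapfile"
            status=1
        fi
    done < "$config"
    return "$status"
}
```

For example, `check_unicode_maps /etc/xpdf/xpdfrc-hebrew` should print two `OK` lines on a machine set up as shown above. Note that, according to the xpdfrc(5) documentation, a `unicodeMap` line only registers an encoding name. pdftotext writes text in the encoding named by `textEncoding` (default `Latin1`) unless the `-enc` option overrides it, so readable map files alone do not guarantee Hebrew output.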