PHP explode() with a Unicode character as separator
Posted by Young Roger on Stack Overflow
Published on 2012-09-02T09:36:06Z
XPDF's pdftotext converts a PDF to text and can print the result on the command line. Where needed, it inserts page breaks between pages, as specified in TextOutputDev.cc:
eopLen = uMap->mapUnicode(0x0c, eop, sizeof(eop));
This Unicode character (0x0C, the form feed) does not depend on the output encoding; even -enc ASCII7 wouldn't change it. I want to use PHP to convert the PDF and split it into separate TXT pages for database storage. The following loop works, but it takes twice as long as converting the whole book in one pass:
// One pdftotext call per page ($pages[0] holds the page count)
for ($i = 1; $i <= $pages[0]; $i++) {
    $page[$i] = shell_exec('/usr/bin/pdftotext sample.pdf -f ' . $i . ' -l ' . $i . ' -');
}
How can I explode() $wholePDF using the Unicode character 0x0c as the separator? Currently, $page[$i] doesn't seem to contain those page-break characters from shell_exec(). I tried several encoding headers (UTF-8 in particular), but none has worked so far.
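One likely cause: in PHP, 0x0c is an integer literal, so in the default (non-strict) mode explode(0x0c, ...) converts it to the string "12" and looks for that instead. The form feed is the one-byte string "\f" (equivalently "\x0C" or chr(12)), and because it is a single ASCII byte it is identical in ASCII and UTF-8, so no encoding header is involved. Below is a minimal sketch that runs pdftotext once over the whole file and splits on "\f". It assumes pdftotext lives at /usr/bin/pdftotext, as in the loop above, and that it writes a form feed after each page (possibly also after the last one). The helper names split_pages() and pdf_pages() are hypothetical.

```php
<?php
// Hypothetical helper: split pdftotext output into pages.
// Assumes pages are separated by a form feed (0x0C). If a form feed
// also follows the last page, explode() leaves an empty trailing
// element, which is dropped here.
function split_pages(string $text): array
{
    // "\f" is the one-byte form feed; "\x0C" or chr(12) are equivalent.
    // Note: explode(0x0c, ...) would search for the string "12" instead.
    $parts = explode("\f", $text);
    if (end($parts) === '') {
        array_pop($parts);
    }
    // Re-index from 1 so that $page[1] is the first page.
    $pages = [];
    foreach ($parts as $i => $part) {
        $pages[$i + 1] = $part;
    }
    return $pages;
}

// Hypothetical wrapper: a single pdftotext run over the whole PDF.
// Assumption: the binary is at /usr/bin/pdftotext; "-" sends text to stdout.
function pdf_pages(string $pdfPath): array
{
    $cmd = '/usr/bin/pdftotext ' . escapeshellarg($pdfPath) . ' -';
    $text = shell_exec($cmd);
    if (!is_string($text)) {
        return []; // command failed or produced no output
    }
    return split_pages($text);
}
```

Usage would be $page = pdf_pages('sample.pdf');. Since the form feed is invisible in most terminals and browsers, substr_count($text, "\f") is a quick way to confirm the page breaks are actually present in the shell_exec() output.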
© Stack Overflow or respective owner