How to copy text out of a PDF without losing formatting?
Posted by Colen on Super User
Published on 2010-10-11T21:13:58Z
When I copy text out of a PDF file and paste it into a text editor, it ends up mangled in several ways. Formatting such as bold and italics is lost; soft line breaks within a paragraph are converted to hard line breaks; hyphens used to break a word across two lines are preserved even when they shouldn't be; and single and double quotes are replaced with question marks.
Ideally, I'd like to be able to copy text from a PDF and have the formatting converted to HTML tags, "smart quotes" converted to plain " and ', and line breaks handled properly. Is there any way to do this?
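To make the desired cleanup concrete, here is a minimal Python sketch of the text-only part: straightening smart quotes, rejoining words hyphenated across a line break, and unwrapping soft line breaks. The function name `clean_pdf_text` is hypothetical, and the sketch makes a few assumptions: the pasted text still contains the real Unicode quote characters (if they have already been turned into question marks, they cannot be recovered this way), paragraphs are separated by a blank line, and every hyphen at the end of a line is a word break. That last assumption will wrongly join genuine compounds such as "well-known" when they happen to wrap. It does not attempt to recover bold or italics, since that information is not present in plain copied text.

```python
import re

# Typographic ("smart") quotes mapped to their plain ASCII equivalents.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def clean_pdf_text(text: str) -> str:
    """Tidy text pasted from a PDF (hypothetical helper, not a library API)."""
    # 1. Replace smart quotes with straight quotes.
    text = text.translate(str.maketrans(SMART_QUOTES))
    # 2. Rejoin words split by a hyphen at a line end: "for-\nmatting" -> "formatting".
    #    Assumption: every such hyphen is a line-break hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # 3. Turn a single newline (soft wrap) into a space, but leave
    #    blank lines alone so paragraph breaks survive.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text


if __name__ == "__main__":
    pasted = "He said \u201cthe for-\nmatting is\nlost\u201d.\n\nNew paragraph."
    print(clean_pdf_text(pasted))
```

The order of the steps matters: hyphenated words are rejoined before newlines are converted to spaces, otherwise "for-\nmatting" would become "for- matting".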
© Super User or respective owner