Scraping — character (long dash) error in Nokogiri
Posted by DavidP6 on Stack Overflow
Published on 2010-05-12T18:43:19Z
nokogiri | screen-scraping
I am having trouble scraping a certain long dash that is encoded as ; on the Time magazine site. It looks like this: —. Scraping works fine when the dash is encoded as &mdash;, but when the problem dash is scraped, it comes back as unknown characters. I am using Nokogiri and am wondering whether I have to specify some sort of special encoding. The page declares its encoding as UTF-8.
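One possible cause, offered only as an assumption since the page's actual bytes are not shown here, is that the page declares UTF-8 but delivers the dash as the single Windows-1252 byte 0x97. The sketch below uses only Ruby's standard library to normalize such input to UTF-8 before it is handed to a parser. The helper name `to_utf8` and the sample string are hypothetical.

```ruby
# Hypothetical helper: return UTF-8 text whether the raw bytes are already
# valid UTF-8 or are really Windows-1252 (an assumption about the page).
def to_utf8(raw)
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Reinterpret the same bytes as Windows-1252 and transcode them to UTF-8.
  raw.dup
     .force_encoding(Encoding::Windows_1252)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Byte 0x97 is the em dash in Windows-1252 but is not valid UTF-8 on its own.
raw = "Time \x97 magazine".b
clean = to_utf8(raw)

# The cleaned string could then be parsed with an explicit encoding,
# for example: Nokogiri::HTML(clean, nil, 'UTF-8')
```

If the bytes are already valid UTF-8 the helper returns them unchanged, so it should be safe to apply to every fetched page. It does not help when the dash arrives as a numeric character reference rather than a raw byte; that case would need a separate substitution after parsing.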
© Stack Overflow or respective owner