getURL, parsing web-site with German special characters
Posted by Kay on Stack Overflow
Published on 2012-06-08T22:22:15Z
I am using getURL() and htmlParse(). How can I get web-site content that contains special characters (German umlauts) to display properly?
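library(RCurl); library(XML)
# fetch the page as a character string
script <- getURL("http://www.floraweb.de/pflanzenarten/foto.xsql?suchnr=814")
# parse the HTML, declaring the input as UTF-8
doc <- htmlParse(script, encoding = "UTF-8")
# extract the text of the second paragraph inside <div id="content">
xpathSApply(doc, "//div[@id='content']//p", xmlValue)[2]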
[1] "Bellis perennis L., GÃ¤nseblÃ¼mchen"
# should say:
[1] "Bellis perennis L., Gänseblümchen"
> Sys.getlocale()
[1] "LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252"
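
Garbled umlauts of this kind usually mean that UTF-8 bytes are being interpreted in a single-byte encoding such as the Windows-1252 locale shown above. The sketch below reproduces the effect offline in base R and shows two possible repairs. It is an illustration under assumptions, not a verified fix for the floraweb.de page: the helper names `remark_utf8` and `undo_double_encoding` are hypothetical, and `latin1` stands in for Windows-1252, which agrees with it for ä and ü but not for every character.

```r
# Reference string, written with \u escapes so the script is encoding-safe.
correct <- "Bellis perennis L., G\u00e4nsebl\u00fcmchen"

# The UTF-8 bytes of the reference string (a-umlaut = c3 a4, u-umlaut = c3 bc).
bytes <- charToRaw(enc2utf8(correct))

# Case 1: the bytes are right, but the string is declared as latin1.
# Each two-byte UTF-8 sequence is then displayed as two separate characters.
misread <- rawToChar(bytes)
Encoding(misread) <- "latin1"

# Hypothetical helper: keep the bytes, only correct the declared encoding.
remark_utf8 <- function(x) {
  Encoding(x) <- "UTF-8"
  x
}

# Case 2: the misread text was converted to UTF-8 again ("double encoding"),
# so the garbled characters are now really part of the string.
double_encoded <- enc2utf8(misread)

# Hypothetical helper: convert back to single bytes, then declare them UTF-8.
undo_double_encoding <- function(x) {
  y <- iconv(x, from = "UTF-8", to = "latin1")
  Encoding(y) <- "UTF-8"
  y
}

# Both repairs recover the reference string.
stopifnot(identical(remark_utf8(misread), correct))
stopifnot(identical(undo_double_encoding(double_encoded), correct))
```

Which case applies to the vector returned by xpathSApply() would need checking, for example by inspecting Encoding() and charToRaw() on the result. getURL() also accepts an `.encoding` argument, so passing `.encoding = "UTF-8"` may avoid the problem at the source; that has not been tested against this site.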
© Stack Overflow or respective owner