getURL: parsing a website with German special characters

Posted by Kay on Stack Overflow
Published on 2012-06-08T22:22:15Z Indexed on 2012/06/08 22:40 UTC

I am using getURL() and htmlParse(). How can I get website content that contains special characters (German umlauts) to display properly?

library(RCurl); library(XML)
script <- getURL("http://www.floraweb.de/pflanzenarten/foto.xsql?suchnr=814")
doc <- htmlParse(script, encoding = "UTF-8")
xpathSApply(doc, "//div[@id='content']//p", xmlValue)[2]
[1] "Bellis perennis L., GÃ¤nseblÃ¼mchen"
# should say:
[1] "Bellis perennis L., Gänseblümchen"
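
One common way for umlauts to come out garbled is that UTF-8 bytes get read as Windows-1252 somewhere along the way. The sketch below reproduces that offline in base R, with no network access. It only assumes that this is the mechanism at work here; the variable names are illustrative, and the encoding name "CP1252" is assumed to be known to the platform's iconv.

```r
# UTF-8 bytes for "Gänseblümchen", built from raw values so the example
# does not depend on the encoding of this script file.
utf8_bytes <- rawToChar(as.raw(c(
  0x47, 0xc3, 0xa4, 0x6e, 0x73, 0x65, 0x62, 0x6c,
  0xc3, 0xbc, 0x6d, 0x63, 0x68, 0x65, 0x6e
)))

# Correct reading: declare the bytes as UTF-8.
good <- utf8_bytes
Encoding(good) <- "UTF-8"

# Wrong reading: treat the same bytes as Windows-1252 and convert to UTF-8.
# Each two-byte umlaut turns into two separate characters.
bad <- iconv(utf8_bytes, from = "CP1252", to = "UTF-8")

nchar(good)  # 13
nchar(bad)   # 15

# Undo the damage: convert back to the original bytes, then mark them UTF-8.
repaired <- iconv(bad, from = "UTF-8", to = "CP1252")
Encoding(repaired) <- "UTF-8"
```

If the string returned by the scraper has two characters in place of each umlaut, as bad does here, the bytes themselves are probably fine and only the declared encoding is wrong.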

> Sys.getlocale()
[1] "LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252"
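
Given the Windows-1252 locale shown above, one thing worth trying is to tell getURL() itself that the response is UTF-8, through its .encoding argument, rather than relying only on htmlParse(). This is a sketch under the assumption that the page really is served as UTF-8; read_paragraphs is a hypothetical helper name, not part of RCurl or XML.

```r
# Hypothetical helper: fetch a page as UTF-8 and return the text of the
# paragraphs inside the content div.
read_paragraphs <- function(url) {
  # Assumption: the server sends UTF-8 bytes, so say so when downloading.
  script <- RCurl::getURL(url, .encoding = "UTF-8")
  # asText = TRUE makes it explicit that 'script' is HTML, not a file name.
  doc <- XML::htmlParse(script, asText = TRUE, encoding = "UTF-8")
  XML::xpathSApply(doc, "//div[@id='content']//p", XML::xmlValue)
}

# Usage (needs network access and the RCurl and XML packages):
# read_paragraphs("http://www.floraweb.de/pflanzenarten/foto.xsql?suchnr=814")[2]
```

If the result is still garbled, checking Encoding() on the returned strings can show whether they are marked as UTF-8 or left as "unknown".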

© Stack Overflow or respective owner