Characters in string changed after downloading HTML from the internet.
Posted by Callum Rogers on Stack Overflow
Published on 2010-04-23T17:30:33Z
Indexed on 2010/04/23 17:33 UTC
Using the following code, I can download the HTML of a page from the internet:
WebClient wc = new WebClient();
// ....
string downloadedFile = wc.DownloadString("http://www.myurl.com/");
However, sometimes the file contains "interesting" characters, and these come out changed: é turns into Ã©, ← into â†, and フシギダネ into ãƒ•ã‚·ã‚®ãƒ€ãƒ.
I think it may have something to do with different Unicode encodings, since each character gets changed into two new ones, as if each character were being split in half, but I have very little knowledge in this area. What do you think is wrong?
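To make the "split in half" suspicion concrete, here is a minimal sketch of how one character can turn into two. It assumes the page is served as UTF-8 and then decoded with a single-byte encoding. ISO-8859-1 is used here only as a stand-in for whatever the default encoding on the machine is, and the class and method names (MojibakeDemo, MisreadUtf8AsLatin1, Repair) are hypothetical, made up for this illustration.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1 (Latin-1),
    // which maps every single byte to exactly one character.
    // Assumption: this mimics a UTF-8 page being read with a single-byte default encoding.
    public static string MisreadUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }

    // Reverses the misreading: recovers the raw bytes and decodes them as UTF-8.
    public static string Repair(string garbled)
    {
        byte[] rawBytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(garbled);
        return Encoding.UTF8.GetString(rawBytes);
    }

    static void Main()
    {
        string original = "\u00E9"; // é, stored in UTF-8 as the two bytes C3 A9
        string garbled = MisreadUtf8AsLatin1(original);

        Console.WriteLine(original.Length);             // 1
        Console.WriteLine(garbled.Length);              // 2
        Console.WriteLine(garbled == "\u00C3\u00A9");   // True (the two characters Ã and ©)
        Console.WriteLine(Repair(garbled) == original); // True
    }
}
```

If this is what is happening, one thing to try is setting wc.Encoding = Encoding.UTF8 before calling DownloadString, assuming the page really is served as UTF-8.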
© Stack Overflow or respective owner