Characters in string changed after downloading HTML from the internet.

Posted by Callum Rogers on Stack Overflow
Published on 2010-04-23T17:30:33Z Indexed on 2010/04/23 17:33 UTC

Using the following code, I can download the HTML of a file from the internet:

WebClient wc = new WebClient();

// ....

string downloadedFile = wc.DownloadString("http://www.myurl.com/");

However, sometimes the downloaded string has "interesting" characters changed, like é to Ã©, ← to â† and フシギダネ to ãƒ•ã‚·ã‚®ãƒ€ãƒ.

I think it may have something to do with different Unicode encodings, as each character gets changed into 2 new ones (perhaps each character is being split in half), but I have very little knowledge in this area. What do you think is wrong?
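
One plausible explanation for the symptom, offered as a sketch rather than a confirmed diagnosis: the page is sent as UTF-8, but the bytes are decoded with a single-byte encoding, so each byte of a multi-byte character becomes its own character. The example below reproduces that without any network access. `MojibakeDemo`, `Misdecode` and `DownloadAsUtf8` are hypothetical names, ISO-8859-1 merely stands in for whatever default encoding the client used, and the download helper assumes the server really does send UTF-8.

```csharp
using System;
using System.Net;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper: encodes text as UTF-8, then decodes those bytes
    // with ISO-8859-1, a single-byte encoding used here as a stand-in for
    // the client's default encoding (an assumption).
    public static string Misdecode(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }

    // Hypothetical helper: downloads a page and decodes it as UTF-8.
    // Only correct if the server actually sends UTF-8 (an assumption).
    public static string DownloadAsUtf8(string url)
    {
        using (WebClient wc = new WebClient())
        {
            wc.Encoding = Encoding.UTF8; // set before calling DownloadString
            return wc.DownloadString(url);
        }
    }

    public static void Main()
    {
        string garbled = Misdecode("\u00E9");         // input is "é"

        Console.WriteLine(garbled.Length);            // prints 2
        Console.WriteLine(garbled == "\u00C3\u00A9"); // prints True (the string "Ã©")

        // Possible usage against the URL from the question (not run here):
        // string downloadedFile = DownloadAsUtf8("http://www.myurl.com/");
    }
}
```

The character é takes two bytes in UTF-8 (C3 A9), which is why it turns into two characters. Characters such as ← and the katakana take three bytes each, so under the same assumption they would turn into three.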

© Stack Overflow or respective owner
