mojibake - Developer IT

HTML::TreeBuilder has a mojibake problem: it shows weird chars in the output

- by varun_vijay_r

    use strict;
    use WWW::Curl::Easy;
    use HTML::TreeBuilder;

    my $cookie_file = '/tmp/pcook';
    my $curl = new WWW::Curl::Easy;
    my $response_body;
    my $charset = 'utf-8';
    $DocOffline::charset = undef;

    $curl->setopt(CURLOPT_URL, 'http://www.breitbart.com/article.php?id=D9G7CR5O0&show_article=1');
    $curl->setopt(CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.9 (KHTML, like Gecko) Chrome/6.0.400.0 Safari/533.9');
    $curl->setopt(CURLOPT_HEADER, 0);
    $curl->setopt(CURLOPT_FOLLOWLOCATION, 1);
    $curl->setopt(CURLOPT_AUTOREFERER, 1);
    $curl->setopt(CURLOPT_SSL_VERIFYPEER, 0);
    $curl->setopt(CURLOPT_COOKIEFILE, $cookie_file);
    $curl->setopt(CURLOPT_COOKIEJAR, $cookie_file);
    $curl->setopt(CURLOPT_REFERER, 'http://www.iavian.com/docOff/');
    $curl->setopt(CURLOPT_HEADERFUNCTION, \&headerCallback);

    # Collect the response body in an in-memory filehandle.
    open(my $fileb, ">", \$response_body);
    $curl->setopt(CURLOPT_WRITEDATA, $fileb);

    my $retcode = $curl->perform;
    if ($retcode == 0) {
        my $dom_tree = HTML::TreeBuilder->new();
        $dom_tree->ignore_elements(qw(script style));
        $dom_tree->utf8_mode(1);
        $dom_tree->parse($response_body);
        $dom_tree->eof();
        print $dom_tree->as_HTML('<&', ' ', {});
    } else {
        print("An error happened: " . $curl->strerror($retcode) . " ($retcode)\n");
    }

    # Header callback: remember the charset announced in Content-Type.
    sub headerCallback {
        my ($data, $pointer) = @_;
        $data =~ m/Content-Type:\s*.*;\s*charset=(.*)/;
        if ($1) {
            $charset = $1;
            $charset =~ s/[^a-zA-Z0-9_-]*//g;
        }
        return length($data);
    }
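A symptom like this often appears when bytes and characters get mixed up somewhere between the download and the parser. As a rough illustration of the general pattern only (in Python rather than Perl, and not a confirmed fix for the code above), the sketch below reads the charset from a Content-Type header value and decodes the body with it before anything parses the text. The helper names `charset_from_content_type` and `decode_body` are hypothetical.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or a default."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the lowercased charset, or None if absent.
    return msg.get_content_charset() or default


def decode_body(raw_bytes, header_value):
    """Decode raw response bytes using the charset the server declared."""
    charset = charset_from_content_type(header_value)
    try:
        return raw_bytes.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: fall back without crashing.
        return raw_bytes.decode("utf-8", errors="replace")


if __name__ == "__main__":
    body = "café".encode("latin-1")
    print(decode_body(body, "text/html; charset=ISO-8859-1"))
```

The point of the sketch is the ordering: decide on one encoding, decode once, and only then hand text to the parser.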


search PDFs with non-standard character encodings

- by Hugh Allen

Some PDF files produce garbage ("mojibake") when you copy text. This makes it impossible to search them (whatever you search for will not match the garbage). Does anyone have an easy workaround? An example: TEAC TV manual EU2816STF BTW: I am using Adobe Reader - perhaps an alternative viewer might help?


PHP Strange character before £ sign?

- by tarnfeld

For some reason I get a weird character, as in Â£76756687, when I type a £ into a text field on my form?
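The stray Â is what you would see if the UTF-8 bytes for £ were read back as Latin-1. That this is what happens in the poster's form is an assumption; the short Python sketch below only shows that the mechanism produces exactly this output.

```python
pound = "£"

# UTF-8 encodes U+00A3 as two bytes: 0xC2 0xA3.
utf8_bytes = pound.encode("utf-8")

# Reading those two bytes as Latin-1 turns each byte into its own character.
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes)  # b'\xc2\xa3'
print(misread)     # Â£
```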


Getting â€™ instead of an apostrophe(') in PHP

- by Mint

I've tried converting the text to and from UTF-8, but that didn't seem to help. I'm getting: "Itâ€™s Getting the Best of Me". It should be: "It’s Getting the Best of Me". I'm getting this data from a URL: http://www.tvrage.com/quickinfo.php?show=Surviver&ep=20x02&exact=0
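The sequence â€™ matches the three UTF-8 bytes of the right single quotation mark (U+2019) read as Windows-1252. Assuming that is the cause here, which the post does not confirm, the misreading can be reproduced and undone as in this Python sketch. `repair_cp1252_mojibake` is a hypothetical helper name.

```python
def repair_cp1252_mojibake(text):
    """Undo one round of 'UTF-8 bytes decoded as Windows-1252'.

    Returns the input unchanged if it does not round-trip cleanly.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Reproduce the garbling: UTF-8 bytes E2 80 99 read as cp1252 give â, €, ™.
garbled = "It’s Getting the Best of Me".encode("utf-8").decode("cp1252")

print(garbled)                          # Itâ€™s Getting the Best of Me
print(repair_cp1252_mojibake(garbled))  # It’s Getting the Best of Me
```

Repairing after the fact is a workaround; decoding the fetched bytes as UTF-8 in the first place avoids the problem.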


Unexpected output of std::wcout << L"élève"; in Windows Shell

- by chmike

While testing some functions to convert strings between wchar_t and UTF-8, I got the following weird result with Visual C++ Express 2008: std::wcout << L"élève" << std::endl; prints "ÚlÞve:", which is obviously not what is expected. This is obviously a bug. How can that be? How am I supposed to deal with such a "feature"?
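The output is consistent with single-byte Windows-1252 text being displayed by a console that uses OEM code page 850. Both code pages are assumptions here, since the post does not say which ones were active. A Python sketch of that misreading:

```python
word = "élève"

# In Windows-1252, é is byte 0xE9 and è is byte 0xE8.
ansi_bytes = word.encode("cp1252")

# Code page 850 maps 0xE9 to Ú and 0xE8 to Þ.
shown = ansi_bytes.decode("cp850")

print(ansi_bytes)  # b'\xe9l\xe8ve'
print(shown)       # ÚlÞve
```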


Wrong file encoding after Dist::Zilla

- by xenoterracide

How can I get the mojibake test to pass? This might be a bug in the contributors plugin. The character does not render correctly in perldoc, but it does in my vim and in the extracted git log.

# Failed test 'Mojibake test for blib/lib/Pod/Spell.pm'
# at /home/xenoterracide/perl5/perlbrew/perls/perl-5.18.1/lib/site_perl/5.18.1/Test/Mojibake.pm line 168.
# Non-UTF-8 unexpected in blib/lib/Pod/Spell.pm, line 431 (POD)

Here's a snippet from the source, which should probably be looked at directly, since copy-paste may not preserve the encoding issue:

=item *

Olivier Mengué <[email protected]>

=back

A little more vim exploration shows that fileencoding is being set to latin1. Editing the file in vim seems to fix this, but since the file is generated, how can I get it generated with the correct encoding?
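To see which lines of a generated file are not valid UTF-8, a byte-level check is enough. The Python sketch below is an illustration only and is not how Test::Mojibake is implemented; `non_utf8_lines` is a hypothetical helper, and the sample simulates a contributor name written out as Latin-1.

```python
def non_utf8_lines(data):
    """Return (line_number, raw_line) pairs for lines that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Simulated POD fragment saved as Latin-1: é becomes the lone byte 0xE9.
sample = "=item *\n\nOlivier Mengué\n\n=back\n".encode("latin-1")

print(non_utf8_lines(sample))  # [(3, b'Olivier Mengu\xe9')]
```

Running the same check on the UTF-8 encoding of that text returns an empty list.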


Is there any free host which supports php and mySQL in utf-8? [closed]

- by Maria Konnou

Possible Duplicate: How to find web hosting that meets my requirements? Is there any free host which supports PHP and MySQL queries in UTF-8? I've already tried x10hosting and 000webhosting, but they don't support UTF-8 MySQL queries (I got mojibake). The default MySQL encoding on both sites is latin-1, and you can't change it. Is there any other free host that fully supports UTF-8?


What issues lead people to use Japanese-specific encodings rather than Unicode?

- by Nicolas Raoul

At work I come across a lot of Japanese text files in Shift-JIS and other encodings. It causes many mojibake (unreadable character) problems for all computer users. Unicode was intended to solve this sort of problem by defining a single character set for all languages, and the UTF-8 serialization is recommended for use on the Internet. So why doesn't everybody switch from Japanese-specific encodings to UTF-8? What issues with or disadvantages of UTF-8 are holding people back? EDIT: The W3C lists some known problems with Unicode, could this be a reason too?
