Search Results

Search found 8 results on 1 page for 'mojibake'.


  • HTML::TreeBuilder has mojibake problem, it shows weird chars in the output

    - by varun_vijay_r
    use strict;
    use WWW::Curl::Easy;
    use HTML::TreeBuilder;

    my $cookie_file = '/tmp/pcook';
    my $curl = new WWW::Curl::Easy;
    my $response_body;
    my $charset = 'utf-8';
    $DocOffline::charset = undef;

    $curl->setopt(CURLOPT_URL, 'http://www.breitbart.com/article.php?id=D9G7CR5O0&show_article=1');
    $curl->setopt(CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.9 (KHTML, like Gecko) Chrome/6.0.400.0 Safari/533.9');
    $curl->setopt(CURLOPT_HEADER, 0);
    $curl->setopt(CURLOPT_FOLLOWLOCATION, 1);
    $curl->setopt(CURLOPT_AUTOREFERER, 1);
    $curl->setopt(CURLOPT_SSL_VERIFYPEER, 0);
    $curl->setopt(CURLOPT_COOKIEFILE, $cookie_file);
    $curl->setopt(CURLOPT_COOKIEJAR, $cookie_file);
    $curl->setopt(CURLOPT_REFERER, 'http://www.iavian.com/docOff/');
    $curl->setopt(CURLOPT_HEADERFUNCTION, \&headerCallback);

    # Collect the response body in an in-memory filehandle.
    open(my $fileb, ">", \$response_body);
    $curl->setopt(CURLOPT_WRITEDATA, $fileb);

    my $retcode = $curl->perform;
    if ($retcode == 0) {
        my $dom_tree = HTML::TreeBuilder->new();
        $dom_tree->ignore_elements(qw(script style));
        $dom_tree->utf8_mode(1);
        $dom_tree->parse($response_body);
        $dom_tree->eof();
        print $dom_tree->as_HTML('<&', ' ', {});
    } else {
        print("An error happened: " . $curl->strerror($retcode) . " ($retcode)\n");
    }

    # Remember the charset announced in the Content-Type response header.
    sub headerCallback {
        my ($data, $pointer) = @_;
        $data =~ m/Content-Type:\s*.*;\s*charset=(.*)/;
        if ($1) {
            $charset = $1;
            $charset =~ s/[^a-zA-Z0-9_-]*//g;
        }
        return length($data);
    }

    Read the article

  • search PDFs with non-standard character encodings

    - by Hugh Allen
    Some PDF files produce garbage ("mojibake") when you copy text. This makes it impossible to search them (whatever you search for will not match the garbage). Does anyone have an easy workaround? An example: TEAC TV manual EU2816STF. BTW, I am using Adobe Reader; perhaps an alternative viewer might help?

    Read the article

  • Wrong file encoding after Dist::Zilla

    - by xenoterracide
    How can I get the mojibake test to pass? This might be a bug in the contributors plugin. The character does not render correctly in perldoc, but it does in my vim and in the extracted git log.

    # Failed test 'Mojibake test for blib/lib/Pod/Spell.pm'
    # at /home/xenoterracide/perl5/perlbrew/perls/perl-5.18.1/lib/site_perl/5.18.1/Test/Mojibake.pm line 168.
    # Non-UTF-8 unexpected in blib/lib/Pod/Spell.pm, line 431 (POD)

    Here's a snippet from the source, which should probably be looked at directly, since copy-paste may not preserve the encoding issue:

    =item * Olivier Mengué <[email protected]>

    =back

    A little more vim exploration shows that fileencoding is being set to latin1. Editing the file in vim seems to fix this, but since the file is generated, how can I get it generated with the correct encoding?

    Read the article
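
The failure quoted above reports bytes that are not valid UTF-8 on a specific line. As a rough illustration of that kind of check (a minimal sketch, not the implementation used by Test::Mojibake), the following Python snippet lists the lines of a byte string that fail to decode as UTF-8. The function name and the sample bytes are assumptions made up for this example.

```python
def find_non_utf8_lines(data: bytes) -> list[int]:
    """Return the 1-based numbers of lines whose bytes are not valid UTF-8."""
    bad = []
    # Splitting on b"\n" is safe: byte 0x0A never occurs inside a
    # multi-byte UTF-8 sequence.
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(lineno)
    return bad


# Hypothetical sample: line 2 stores "é" as the single Latin-1 byte 0xE9,
# line 3 stores it as the two-byte UTF-8 sequence 0xC3 0xA9.
sample = b"=item *\nOlivier Mengu\xe9\nOlivier Mengu\xc3\xa9\n"

print(find_non_utf8_lines(sample))  # [2]
```

A file saved as Latin-1 trips this check wherever it contains a non-ASCII character, which is consistent with the symptom described in the question.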

  • Is there any free host which supports php and mySQL in utf-8? [closed]

    - by Maria Konnou
    Possible Duplicate: How to find web hosting that meets my requirements? Is there any free host which supports PHP and MySQL queries in UTF-8? I've already tried x10hosting and 000webhosting, but they don't support UTF-8 MySQL queries (I got mojibake). The default MySQL encoding on both sites is latin-1, and you cannot change it. Is there any other free host that fully supports UTF-8?

    Read the article
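
The mojibake described above typically appears when UTF-8 bytes are interpreted as Latin-1 somewhere between the application and the database. The sketch below reproduces that effect in plain Python, with no database involved. The sample string and function names are assumptions chosen for illustration, and Python's `latin-1` codec may differ in detail from what a given MySQL server calls latin1.

```python
def garble(text: str) -> str:
    """Encode text as UTF-8, then misread the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo garble(): recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "Μαρία"           # hypothetical sample: five Greek letters
garbled = garble(original)    # ten Latin-1 characters, one per UTF-8 byte

print(len(original), len(garbled))
print(repair(garbled) == original)
```

The repair step only works while the misread bytes are still intact. Once a lossy conversion has replaced characters (for example with `?`), the original text cannot be recovered this way.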

  • What issues lead people to use Japanese-specific encodings rather than Unicode?

    - by Nicolas Raoul
    At work I come across a lot of Japanese text files in Shift-JIS and other encodings. It causes many mojibake (unreadable character) problems for all computer users. Unicode was intended to solve this sort of problem by defining a single character set for all languages, and the UTF-8 serialization is recommended for use on the Internet. So why doesn't everybody switch from Japanese-specific encodings to UTF-8? What issues with or disadvantages of UTF-8 are holding people back? EDIT: The W3C lists some known problems with Unicode; could this be a reason too?

    Read the article
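
To make the Shift-JIS versus UTF-8 mismatch concrete, here is a small Python sketch that encodes the same Japanese string both ways and then tries to read the Shift-JIS bytes as UTF-8. The sample string and the helper name are assumptions for this example, and the snippet only demonstrates the mismatch. It does not attempt to answer the question of why the legacy encodings persist.

```python
def decode_strict(data: bytes, encoding: str):
    """Decode data with the given encoding; return None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


text = "文字化け"                  # "mojibake", four characters
sjis = text.encode("shift_jis")    # two bytes per character here
utf8 = text.encode("utf-8")        # three bytes per character here

print(len(sjis), len(utf8))
print(decode_strict(sjis, "shift_jis") == text)   # right codec: round-trips
print(decode_strict(sjis, "utf-8"))               # wrong codec: decoding fails
```

A reader that guesses the wrong encoding either fails outright, as here, or silently shows unreadable characters if it substitutes replacement glyphs instead of raising an error.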
