How can I get Perl to detect the bad UTF-8 sequences?

Posted by gorilla on Stack Overflow
Published on 2010-04-16T22:20:18Z Indexed on 2010/04/17 13:03 UTC

I'm running Perl 5.10.0 and Postgres 8.4.3, and inserting strings into a database through DBIx::Class.

These strings should be in UTF-8, so my database is running in UTF-8. Unfortunately, some of these strings are bad and contain malformed UTF-8, so when I run the update I get an exception:

DBI Exception: DBD::Pg::st execute failed: ERROR: invalid byte sequence for encoding "UTF8": 0xb5

I thought I could simply skip the invalid strings for now and worry about the malformed UTF-8 later. The following code should flag and ignore the bad titles:

if(not utf8::valid($title)){
   $title="Invalid UTF-8";
}
$data->title($title);
$data->update();

However, Perl seems to think that the strings are valid, yet the update still throws the exception.

How can I get Perl to detect the bad UTF-8?
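
One possible direction, offered as a sketch rather than a confirmed fix: according to the utf8 documentation, utf8::valid only checks that a string's internal representation is consistent, and it returns true for any plain byte string, so it would not catch a stray 0xb5. A strict decode with the core Encode module does check the bytes. The helper name is_valid_utf8 is hypothetical, and the sketch assumes $title holds raw bytes that have not been decoded yet.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not from the original post): returns 1 when
# $bytes is a well-formed UTF-8 byte sequence, 0 otherwise.
# Assumes $bytes is an undecoded byte string.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        # FB_CROAK makes decode die on malformed input;
        # LEAVE_SRC keeps $bytes unmodified.
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };
    return $ok ? 1 : 0;
}

my $title = "caf\xb5";    # a lone 0xb5 byte is not valid UTF-8
$title = "Invalid UTF-8" unless is_valid_utf8($title);
print "$title\n";
```

With a check like this in place of utf8::valid, the replacement title would be stored instead of the malformed bytes reaching Postgres.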

© Stack Overflow or respective owner
