Encoding::UndefinedConversionError from email body

Posted by raam86 on Stack Overflow See other posts from Stack Overflow or by raam86
Published on 2012-11-02T21:57:44Z Indexed on 2012/11/02 23:01 UTC
Read the original article Hit count: 528

Filed under:
|
|
|
|

using mail for ruby I am getting this message:

mail.rb:22:in `encode': "\xC7" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
    from mail.rb:22:in `<main>'

If I remove encode I get a message ruby

/var/lib/gems/1.9.1/gems/bson-1.7.0/lib/bson/bson_ruby.rb:63:in `rescue in to_utf8_binary': String not valid utf-8: "<div dir=\"ltr\"><div class=\"gmail_quote\">l<br><br><br><div dir=\"ltr\"><div class=\"gmail_quote\"><br><br><br><div dir=\"ltr\"><div class=\"gmail_quote\"><br><br><br><div dir=\"ltr\"><div dir=\"rtl\">\xC7\xE1\xE4\xD5 \xC8\xC7\xE1\xE1\xDB\xC9 \xC7\xE1\xDA\xD1\xC8\xED\xC9</div></div>\r\n</div><br></div>\r\n</div><br></div>\r\n</div><br></div>" (BSON::InvalidStringEncoding)

This is my code:

require 'mail'
require 'mongo'

connection = Mongo::Connection.new
db = connection.db("DB")
db = Mongo::Connection.new.db("DB")
newsCollection = db["news"]

Mail.defaults do
  retriever_method :pop3, :address    => "pop.gmail.com",
                          :port       => 995,
                          :user_name  => 'my_username',
                          :password   => '*****',
                          :enable_ssl => true
end
emails = Mail.last
#Checks if email is multipart and decods accordingly. Put to extract UTF8 from body
plain_part = emails.multipart? ? (emails.text_part ? emails.text_part.body.decoded : nil) : emails.body.decoded

html_part = emails.html_part ? emails.html_part.body.decoded : nil

mongoMessage = {"date" => emails.date.to_s , "subject" => emails.subject , "body" => plain_part.encode('UTF-8') }
msgID = newsCollection.insert(mongoMessage) #add the document to the database and returns it's ID
puts msgID

For English and Hebrew it works perfectly but it seems gmail is sending arabic with different encoding. Replacing UTF-8 with ASCII-8BIT gives a similar error.

I get the same result when using plain_part for plain email messages. I am handling emails from one specific source so I can put html_part with confidence it's not causing the error. To make it extra weird Subject in Arabic is rendered perfectly. What encoding should I use?

© Stack Overflow or respective owner

Related posts about ruby

Related posts about mongodb