Character Encoding: â??

Posted by akaphenom on Stack Overflow
Published on 2010-12-28T15:33:18Z

I am trying to make sense of the mysterious character string â??, which shows up quite often in our database. I am fairly sure it results from a conversion between character encodings, but I am not positive.

Users can type or paste text into an Ext JS rich text editor. The data is posted to a servlet, which persists it to the database, and when I view it in the database I see those strange characters.

  1. Is there any way to decode these back to their original meaning if I can discover the correct encoding, or have bits or bytes been lost in the conversion?

  2. Users are cutting and pasting from multiple versions of MS Word and from PDFs. Does the encoding depend on the source the user copied from?

Thank you


The website is UTF-8. We are using MS SQL Server 2005.

SELECT serverproperty('Collation') -- Server default collation: Latin1_General_CI_AS

SELECT databasepropertyex('xxxx', 'Collation') -- Database default collation: SQL_Latin1_General_CP1_CI_AS

and the column:

Column_name  Type     Computed  Length  Prec  Scale  Nullable  TrimTrailingBlanks  FixedLenNullInSource  Collation
text         varchar  no        -1                   yes       no                  yes                   SQL_Latin1_General_CP1_CI_AS

The non-Unicode equivalents of the nchar, nvarchar, and ntext data types in SQL Server 2000 are listed below. When Unicode data is inserted into one of these non-Unicode data type columns through a command string (otherwise known as a "language event"), SQL Server converts the data to the data type using the code page associated with the collation of the column. When a character cannot be represented on a code page, it is replaced by a question mark (?), indicating the data has been lost. Appearance of unexpected characters or question marks in your data indicates your data has been converted from Unicode to non-Unicode at some layer, and this conversion resulted in lost characters.

So this may be the root cause of the problem... and not an easy one to solve on our end.
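
To illustrate the kind of lossy conversion the quoted passage describes, here is a minimal Python sketch of one path that would produce â??. It rests on assumptions that are not confirmed for our system: that the UTF-8 bytes sent by the browser are read as ISO-8859-1 somewhere on the server side, and that the result is then converted to code page 1252 for the varchar column, with unrepresentable characters replaced by a question mark. Python's codecs stand in for whatever the servlet, driver, and SQL Server actually do, and the function name garble is purely illustrative.

```python
def garble(text: str) -> str:
    """Simulate one hypothetical lossy path from a UTF-8 page to a varchar column."""
    utf8_bytes = text.encode("utf-8")        # bytes submitted by the UTF-8 page
    # Assumption: the server decodes the request as ISO-8859-1 instead of UTF-8.
    misread = utf8_bytes.decode("latin-1")
    # Assumption: the column's code page is 1252; characters it cannot
    # represent are replaced with '?', as the quoted passage describes.
    stored = misread.encode("cp1252", errors="replace")
    return stored.decode("cp1252")


# Punctuation that Word commonly substitutes for plain quotes and hyphens.
samples = {
    "right single quote": "\u2019",
    "left double quote": "\u201c",
    "right double quote": "\u201d",
    "en dash": "\u2013",
    "em dash": "\u2014",
}

results = {name: garble(ch) for name, ch in samples.items()}

# Every sample above ends up as the same three characters: â??
distinct = set(results.values())
print(len(distinct))  # prints 1
```

If this is what happens, it would answer question 1: the five different characters all collapse to the same stored value, so the stored text alone cannot say which one the user typed. The bytes that distinguished them were replaced by question marks, and knowing the correct encoding afterwards does not bring them back.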

© Stack Overflow or respective owner
