Character Encoding: â??
Posted
by
akaphenom
on Stack Overflow
See other posts from Stack Overflow
or by akaphenom
Published on 2010-12-28T15:33:18Z
Indexed on
2010/12/28
17:54 UTC
Read the original article
Hit count: 477
I am trying to piece together the mysterious string of characters â?? I am seeing quite a bit of in our database - I am fairly sure this is a result of conversion between character encodings, but I am not completely positive.
The users are able to enter text (or cut and paste) into a Ext-Js rich text editor. The data is posted to a severlet which persists it to the database, and when I view it in the database i see those strange characters...
is there any way to decode these back to their original meaning, if I was able to discover the correct encoding - or is there a loss of bits or bytes that has occured through the conversion process?
Users are cutting and pasting from multiple versions of MS Word and PDF. Does the encoding follow where the user copied from?
Thank you
website is UTF-8 We are using ms sql server 2005;
SELECT serverproperty('Collation') -- Server default collation. Latin1_General_CI_AS
SELECT databasepropertyex('xxxx', 'Collation') -- Database default SQL_Latin1_General_CP1_CI_AS
and the column:
Column_name Type Computed Length Prec Scale Nullable TrimTrailingBlanks FixedLenNullInSource Collation
text varchar no -1 yes no yes SQL_Latin1_General_CP1_CI_AS
The non-Unicode equivalents of the nchar, nvarchar, and ntext data types in SQL Server 2000 are listed below. When Unicode data is inserted into one of these non-Unicode data type columns through a command string (otherwise known as a "language event"), SQL Server converts the data to the data type using the code page associated with the collation of the column. When a character cannot be represented on a code page, it is replaced by a question mark (?), indicating the data has been lost. Appearance of unexpected characters or question marks in your data indicates your data has been converted from Unicode to non-Unicode at some layer, and this conversion resulted in lost characters.
So this may be the root cause of the problem... and not an easy one to solve on our end.
© Stack Overflow or respective owner