0

I'm storing some HTML-encoded data in a SQL Server database, and I've written a script that outputs the data in CSV format with the HTML tags stripped. I'm getting a weird issue when HTML-decoding the remaining data. For example, the data contains a right single quote character (’, stored as an HTML entity), but when I HTML-decode it the output shows a series of weird characters (â€™) instead. Does anyone know how to solve this? The output encoding of the page is UTF-8, if that helps.

Any advice would be much appreciated!

Cheers

Tim

3
  • Have you actually specified UTF-8 in the page? Commented Jan 14, 2011 at 14:26
  • Yes I've added the ResponseEncoding="UTF-8" attribute to the page Commented Jan 14, 2011 at 14:33
  • I've actually just found that the characters display fine in notepad but Excel 2000 seems to be replacing the chars with other weird characters! Commented Jan 14, 2011 at 15:17

2 Answers

3

Those 3 weird characters (â€™) are what you get when the UTF-8 encoding of ’ is read as windows-1252. The character is encoded as the octets 0xE2 0x80 0x99, and those bytes render as "â€™" in your computer's default charset, windows-1252. So I don't think there's anything wrong with your encoding.
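To see the byte-level effect for yourself, here is a minimal Python sketch (not the asker's ASP.NET code). It assumes the entity in the data is the numeric form `&#8217;`; the thread does not say which entity form is actually stored.

```python
import html

# Assumption: the stored entity is the numeric reference for U+2019.
quote = html.unescape("&#8217;")

# UTF-8 encodes U+2019 as three bytes.
utf8_bytes = quote.encode("utf-8")

# Reading those same bytes as windows-1252 gives three separate characters.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex(" "))  # e2 80 99
print(misread)              # â€™
```

The bytes on disk are correct in both cases; only the charset the reader assumes is different.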

Excel 2000 evidently has a known problem with .csv files in UTF-8 encoding. The workaround, bizarrely enough, is to change the filename extension to .txt, at which point Excel 2000 will evidently import the file correctly.


0

If the data is read from CSV files, open the CSV file in Notepad, choose Save As from the File menu, and save the file with the Encoding set to UTF-8.
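Notepad of that era typically writes a byte order mark (BOM) at the start of a file saved as UTF-8, which is the hint some spreadsheet programs use to detect the encoding. If the CSV is generated by a script, the same marker can be written directly. The sketch below is in Python rather than ASP.NET, and the helper name `to_csv_bytes` and the sample rows are made up for illustration. Whether Excel 2000 specifically honours the BOM is not confirmed anywhere in this thread, so treat it as something to try.

```python
import csv
import html
import io


def to_csv_bytes(rows):
    """Decode HTML entities in each cell and return CSV bytes with a UTF-8 BOM."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([html.unescape(cell) for cell in row])
    # "utf-8-sig" prepends the BOM bytes EF BB BF to the encoded text.
    return buffer.getvalue().encode("utf-8-sig")


# Hypothetical sample data standing in for the database rows.
data = to_csv_bytes([["name", "note"], ["Tim", "it&#8217;s fine"]])

with open("export.csv", "wb") as f:
    f.write(data)
```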
