1

On my website visitors can do some inline editing. I use ajax for it with a MySQL database and PHP. I expect the Dutch language to be used on the website.

My challenge is to get the character encoding to work well.

I could use advice on:

  • the database (do I use UTF-8? latin1_swedish_ci?)
  • the tables in the database (I'd prefer to have them use the same settings as the database)
  • the escaping to use in the ajax call (x = escape(x);)
  • the webpage character set (UTF-8? ISO-something?)
  • how this all works together.

I use nicEdit as the JavaScript WYSIWYG editor.

I could of course explain what happens when I try to save ë, and if that helps I will, but I figured it would be best to understand the matter instead of just trying to quick-fix it.

[EDIT] To elaborate:

I use these in my PHP
$input = stripslashes($input); //(if magic quotes are 'on')
$input = mysql_real_escape_string($input);
$input = strip_tags($input, '<strong><em><span><ul><ol><p><a><br><li>');

In my HTML page:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Javascript:
x = escape(x);

Database:
MySQL connection collation: utf8_general_ci
Table options: DEFAULT CHARSET=utf8

This is an example of what happens:

I enter (inline) the word Rëg (using 'option+u' then 'e' on my mac).
I save the word. It shows like this: R�g on the webpage.
In the database I find Rëg.

I open the editor, do nothing but save again and it shows: R%uFFFDg in the database as well as on the page. After that it does not change anymore.

Any help is greatly appreciated.
Kim

6
  • Commenting because it's not a complete answer, but: The escaping prior to database entry must be done server-side, not client-side. You can't trust anything coming from the client, even if you've put validation on the client-side. Commented Sep 9, 2010 at 7:37
  • 2
    Yes, UTF-8 is the way to go. The rest can be solved eventually. Commented Sep 9, 2010 at 7:38
  • If you want to understand the matter, do not mix completely different matters together. The database is one matter, HTML is another, and AJAX is yet another. Take each one and figure it out separately. Commented Sep 9, 2010 at 7:41
  • ë is not weird and does not require any escaping. The only issue you may run into is the AJAX response, which can easily be decoded. Commented Sep 9, 2010 at 7:48
  • @T.J. Crowder Thanks for your remark, I do indeed run some verification before putting anything in my DB. Commented Sep 9, 2010 at 10:52

2 Answers

1

It shows like this: R�g on the webpage.

You need to tell the web browser that the page is encoded in UTF-8 so that it interprets it as such. Add the following to the top of your PHP, before emitting any character to the output:

header('Content-Type: text/html; charset=utf-8');

The <meta> tag alone is not enough: when a page is served over HTTP, a charset in the Content-Type response header takes precedence, and the <meta> tag is only consulted when the header does not specify one. By the way, JavaScript's escape() function is deprecated.
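
A minimal sketch of why escape() can produce the symptoms described in the question, assuming the server decodes the request as UTF-8 (the variable names are illustrative and not taken from the original code). escape() encodes ë as the single Latin-1 byte %EB, which is not a valid UTF-8 sequence on its own, and it encodes characters above U+00FF in the non-standard %uXXXX form. encodeURIComponent() always emits the percent-encoded UTF-8 bytes.

```javascript
// Illustrative input: the word from the question, "Rëg".
const word = 'R\u00EBg';

// Deprecated escape(): U+00EB becomes one Latin-1 byte.
const legacy = escape(word); // "R%EBg"

// encodeURIComponent(): U+00EB becomes its two UTF-8 bytes, C3 AB.
const utf8 = encodeURIComponent(word); // "R%C3%ABg"

// Once the page shows the replacement character U+FFFD, saving again
// with escape() yields the non-standard %uXXXX form.
const legacyReplacement = escape('R\uFFFDg'); // "R%uFFFDg"

console.log(legacy, utf8, legacyReplacement);
```

PHP's standard URL decoding does not understand the %uXXXX form, which would explain why the literal text R%uFFFDg ends up in the database on the second save.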


4 Comments

Thanks BalusC. I added the line just below the session_start() call, but the ë is still shown as a diamond with a question mark. And thanks for the hint on the deprecated function, I'll replace it.
And thanks again, BelusC :-) Replacing escape with encodeURIComponent did the trick.
Oh, you're submitting the form using JS? Btw, it's BalusC with an a, not e. And indeed, to mark a problem solved, you don't need to add some yelling to the question title, but just mark the most helpful answer accepted :) See also stackoverflow.com/faq
LOL (I mean 'lol') I wasn't yelling, I was cheering from happiness, BalusC. Check is added, I'll change the title again...
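
The fix mentioned in these comments can be sketched as follows. This is only an illustration: buildFormBody and the field name content are hypothetical, and the asker's actual Ajax code is not shown in the thread. The idea is to pass every name and value through encodeURIComponent() when building the request body, so that the server receives percent-encoded UTF-8.

```javascript
// Hypothetical helper: builds an application/x-www-form-urlencoded body
// in which every name and value is percent-encoded as UTF-8.
function buildFormBody(fields) {
  return Object.keys(fields)
    .map((name) => encodeURIComponent(name) + '=' + encodeURIComponent(fields[name]))
    .join('&');
}

// "content" is an assumed field name; the value is editor HTML containing ë.
const body = buildFormBody({ content: '<p>R\u00EBg</p>' });

console.log(body); // content=%3Cp%3ER%C3%ABg%3C%2Fp%3E
```

A body built this way would be sent with a Content-Type of application/x-www-form-urlencoded, and PHP would then expose the decoded UTF-8 string in $_POST.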
0

Just use UTF-8 for everything, and normally it will just work.

3 Comments

Hey Reinis, I do not know how to use UTF-8 for everything. I believe I read somewhere that JavaScript uses some ISO encoding. How do I change that?
You can add a charset="utf-8" attribute to your external script elements to load them as UTF-8. It's only required if you have unescaped string literals, though. CSS allows you to use @charset 'utf-8';, but this too is only required if you have unescaped string literals in your CSS, and it's rare. As for everything else, just set it to use UTF-8. Use a utf8 collation in MySQL, a Content-Type: text/html;charset=utf-8 header for HTML, etc.
Thank you Reinis, I did all that already, as I described in my post above (the edited-in part). It turned out that I used the wrong escape function in my JavaScript. Bah. But thanks again for your elaborate answer.
