0

I want to take a string and store it in a MySQL database. The string will be HTML, and it can use any character encoding and be written in any language.

How can I safely save this in my MySQL database without altering the HTML string, so that I can later retrieve it exactly as it was?

In addition, the field it will be stored in has the data type TEXT and a collation of latin1_swedish_ci. Will this affect it in any way?

I am currently doing this:

htmlentities($html, ENT_QUOTES, 'UTF-8')

But I don't think the above will work for all character sets. For example, how will German or Japanese characters be affected?

Thanks for any help.

3
  • If storage isn't an issue, you could always base64_encode() it before stuffing it in the DB. You won't have to use any Unicode on the DB end, just standard ASCII. (Posting this as a comment because I wouldn't call it a great answer and don't expect it to be taken as one.) Commented Mar 16, 2011 at 22:33
  • Ok, so it appears using base64 wasn't as dumb an idea as I had originally thought. I'm glad I don't take points too seriously. ;) Commented Mar 16, 2011 at 22:36
  • Hmm, interesting, I never thought of base64 encoding! It makes things safe and I can get it back as it was before! So this won't affect German or Japanese characters etc.? They will just appear as before? I'll probably have to worry about how I output it back into the HTML page and its encoding. Commented Mar 16, 2011 at 22:43

4 Answers

2

Why not base64 encode it for storage, and then decode it after?
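A minimal sketch of that round trip, assuming PHP's built-in `base64_encode()` and `base64_decode()`; the sample string and variable names are only for illustration, and the database calls are left out:

```php
<?php
// Hypothetical sample; any byte string works, whatever its encoding.
$html = '<p class="greeting">Grüße, こんにちは</p>';

// Encode before the INSERT: the result uses only A-Z, a-z, 0-9, "+", "/" and "=".
$stored = base64_encode($html);

// Decode after the SELECT: the original bytes come back unchanged.
// Strict mode (second argument true) returns false on invalid input.
$restored = base64_decode($stored, true);

var_dump($restored === $html); // bool(true)
```

The trade-off is that the stored value is about a third larger than the original, and the column can no longer be searched for text with `LIKE`.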


0

You could store it in a BLOB field, and MySQL will never try to convert it. But that means you have to remember the encoding you used when saving the string.

Another option is to encode the string as base64.


0

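I don't think the collation will have an effect on the storage of values. It would only affect the behaviour when you do things like comparisons (WHERE) and sorting (ORDER BY). Keep in mind, though, that latin1_swedish_ci implies the latin1 character set, and the character set does determine which characters a text column can hold.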

IMHO, the safest way to ensure your data is unaltered would be to store the values as binary. Base64 would also work. In either case, though, you would have to know the character encoding when reading the data back out.
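One way to keep track of the encoding is to store its name next to the payload. This is a rough sketch only: `pack_html()` and `unpack_html_as_utf8()` are hypothetical helpers, the array stands in for a database row with two columns, and it assumes the iconv extension is available:

```php
<?php
// Hypothetical helper: bundle the payload with the name of its encoding.
function pack_html(string $html, string $encoding): array
{
    return ['encoding' => $encoding, 'body' => base64_encode($html)];
}

// Hypothetical helper: restore the bytes and convert them to UTF-8 for output.
function unpack_html_as_utf8(array $row): string
{
    $bytes = base64_decode($row['body'], true);
    if ($bytes === false) {
        throw new InvalidArgumentException('Stored body is not valid base64.');
    }
    $utf8 = iconv($row['encoding'], 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException('Could not convert from ' . $row['encoding']);
    }
    return $utf8;
}

// "Grüße" as ISO-8859-1 bytes: ü is 0xFC and ß is 0xDF.
$row = pack_html("<p>Gr\xFC\xDFe</p>", 'ISO-8859-1');
$utf8 = unpack_html_as_utf8($row);
```

The same idea works with a BLOB column instead of base64; only the `body` handling changes.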


0

Interesting that everyone is suggesting base64; I never thought about doing it that way. I know a lot of CMS databases I've used just use the UTF-8 character encoding. This will support your German and Japanese characters. The HTML shouldn't be affected, and it will render fine in the browser as long as the page also declares UTF-8 (charset=utf-8).
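If you go the UTF-8 route, it may help to check the input before storing it and to declare the charset when printing it. A small sketch, where `is_valid_utf8()` and `render_utf8_page()` are hypothetical helper names and the source file is assumed to be saved as UTF-8:

```php
<?php
// Hypothetical helper: true when $html is well-formed UTF-8.
function is_valid_utf8(string $html): bool
{
    // With the /u modifier, PCRE rejects subjects that are not valid UTF-8.
    return preg_match('//u', $html) === 1;
}

// Hypothetical helper: wraps a stored fragment in a page that declares UTF-8.
function render_utf8_page(string $fragment): string
{
    return "<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\"></head>\n"
         . "<body>" . $fragment . "</body>\n</html>\n";
}

$fragment = '<p>Grüße aus Köln, こんにちは</p>';
$latin1Bytes = "Gr\xFC\xDFe"; // "Grüße" encoded as ISO-8859-1, not valid UTF-8

var_dump(is_valid_utf8($fragment));    // bool(true)
var_dump(is_valid_utf8($latin1Bytes)); // bool(false)
```

This only covers the PHP side; the column and the database connection would also need to use a UTF-8 character set for the bytes to survive unchanged.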

1 Comment

IMO the main idea is to be as unobtrusive to his existing setup as possible without causing major code/performance issues... he might not have control over the collation, or maybe it needs to be latin1 for some random reason. :)
