I have the following UTF-8 file, exported from a Microsoft Access database:

http://www.yousendit.com/download/TTZtT214SU84Q1FLSkE9PQ

I have confirmed with the status; command that my MySQL database uses utf8 for both client and server. I load the above file into my database with the following command:

LOAD DATA LOCAL INFILE 'tblAuction1.txt' INTO TABLE Auctions FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\';

This mostly works: as far as I can tell, Unicode characters are displayed in the HTML as they should be. The raw contents of the database field are here:

http://www.nomorepasting.com/getpaste.php?pasteid=22622

However, the resulting HTML code is this:

http://www.nomorepasting.com/getpaste.php?pasteid=22617

Which displays as

Fee Listing

1.00 
<\/OBJECT>
');\n\t\t<\/SCRIPT>\n\t\t

in the browser

The code I am using to show this is:

http://www.nomorepasting.com/getpaste.php?pasteid=22618

which was working fine before I changed the encoding.

As a side question, I am wondering why changing the export from tab-delimited to semicolon-delimited with enclosed fields would decrease the size of the exported file by half. The tab character is a single character, just like the semicolon, so adding quotes to enclose the fields should have increased the size.

1 Answer

Depending on the web server's configuration, you may need to set the Content-Type explicitly to "text/html; charset=UTF-8" with header():

header('Content-Type: text/html; charset=UTF-8');

This should be enough for your specific problem. However, if you also intend to manipulate the strings, note that many PHP string functions are not safe to use with multi-byte characters; you should at least configure the mbstring extension properly.
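To illustrate the multi-byte issue, here is a minimal sketch, assuming the mbstring extension is loaded. The sample value in $title is hypothetical and is written with explicit byte escapes so that it does not depend on how the source file itself is saved.

```php
<?php
// Assumption: the mbstring extension is available.
mb_internal_encoding('UTF-8');

// Hypothetical sample value: "Café", where "é" is the two bytes C3 A9 in UTF-8.
$title = "Caf\xC3\xA9";

$byteCount = strlen($title);     // byte-oriented: counts bytes
$charCount = mb_strlen($title);  // multi-byte aware: counts characters

$byteCut = substr($title, 0, 4);     // cuts after 4 bytes, splitting "é" in half
$charCut = mb_substr($title, 0, 4);  // cuts after 4 characters, keeping "é" whole

// A string cut mid-character is no longer valid UTF-8.
$byteCutIsValid = mb_check_encoding($byteCut, 'UTF-8');
$charCutIsValid = mb_check_encoding($charCut, 'UTF-8');
```

A string truncated in the middle of a character, like $byteCut here, is the kind of value that tends to show up as a broken character in the browser even when the Content-Type header is correct.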

I also have this cheatsheet in my bookmarks; I think it's still relevant.

2 Comments

That did not seem to fix anything. Could it be a problem with the database? It seems to be a problem with the HTML that is meant to be passed to document.write, with a tag left unclosed somewhere.
The unclosed tag is orthogonal to the UTF-8 encoding... I thought you had issues displaying the non-ASCII characters correctly.
