I have been stuck for days on exporting a UTF-8 CSV with Chinese characters that shows up as garbled text in Excel on Windows. I am using PHP and have already added the byte order mark (BOM) and tried converting the encoding, but with no luck at all.

The files open fine in Notepad++, Google Sheets and even Numbers on Mac, but not in Excel, which is a requirement from the client. When I open a file in Notepad++, the encoding is shown as UTF-8. If I manually change it to UTF-8 with BOM and save, the file opens fine in Excel.

It seems as though the BOM doesn't get saved in the output, as Notepad++ always detects the file as UTF-8 without BOM.

Also, the CSV is not saved on the server. The data is retrieved from the DB and then sent straight to the browser.

Here is my code:

// Setup headers
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Content-Description: File Transfer');
header("Content-type: text/csv");
header("Content-disposition: filename=".$filename.".csv");
header("Pragma: no-cache");

// First Method
$fp = fopen('php://output', 'w');
// Add BOM to fix UTF-8 in Excel, but doesn't work
fputs($fp, chr(0xEF) . chr(0xBB) . chr(0xBF) );

if ($fp) {

    fputcsv($fp, array("Header"), ",");
    fputcsv($fp, array($string_with_chinese_chars), ",");
}

fclose($fp);
exit();

// Second Method
$csv = "";
$sep = ",";
$newline = "\n"; // Also tried with PHP_EOL

$csv .= "Header";
$csv .= $newline;
$csv .= $string_with_chinese_chars;
$csv .= $newline;

// Tried all the below ways but doesn't work.
// Method 2.1
print chr(255) . chr(254) . mb_convert_encoding($csv, 'UTF-16LE', 'UTF-8');

// Method 2.2
print chr(239) . chr(187) . chr(191) . $csv;

// Method 2.3
print chr(0xEF).chr(0xBB).chr(0xBF);
print $newline;
print $csv;
3
  • Can you open the file in a hex editor before and after saving it in Notepad++, and see what is the difference? And maybe even add the hex dump of the file into your question, if it's short enough? Commented Nov 6, 2016 at 8:47
  • Ok, I will give it a try and update again. Commented Nov 6, 2016 at 9:49
  • 1
    Update: The hex dump of the downloaded file starts with 0A EF BB BF, while the file saved from Notepad++ starts with EF BB BF 0A EF BB BF. 0A looks like a newline. Somehow it is being added at the start of the file even though no part of my code does that. This is a shared hosting server and I don't have access to php.ini. Commented Nov 6, 2016 at 10:16
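
The check described in these comments can also be done without a hex editor. Below is a minimal sketch, assuming the downloaded bytes are available as a string; `leadingBytesHex` and `startsWithUtf8Bom` are hypothetical helper names, not part of the original code.

```php
<?php
// Hypothetical helper: the first $count bytes of $data as upper-case hex pairs.
function leadingBytesHex(string $data, int $count = 4): string
{
    $hex = bin2hex(substr($data, 0, $count));
    return strtoupper(implode(' ', str_split($hex, 2)));
}

// True only when the very first three bytes are the UTF-8 BOM (EF BB BF).
function startsWithUtf8Bom(string $data): bool
{
    return strncmp($data, "\xEF\xBB\xBF", 3) === 0;
}

// Simulated downloads; for a real file use file_get_contents($path) instead.
$broken = "\n\xEF\xBB\xBFHeader\n"; // stray newline before the BOM
$fixed  = "\xEF\xBB\xBFHeader\n";   // BOM is the first thing in the file

echo leadingBytesHex($broken), PHP_EOL; // 0A EF BB BF
echo leadingBytesHex($fixed), PHP_EOL;  // EF BB BF 48
var_dump(startsWithUtf8Bom($broken));   // bool(false)
var_dump(startsWithUtf8Bom($fixed));    // bool(true)
```

If the first pair is anything other than EF, something was written to the output before the BOM.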

6 Answers

8

Hope this can help someone. What worked for me was writing the BOM twice, once with echo and once to the output stream:

...
echo chr(0xEF) . chr(0xBB) . chr(0xBF);
$file = fopen('php://output', 'w');
fputs($file, chr(0xEF) . chr(0xBB) . chr(0xBF));
...

I'm not an expert in PHP, so I can't explain why this works, but I hope it helps someone, because I also had a hard time solving this problem.

4 Comments

No idea why this works but it does. Note that it seems like it just needs the bom twice, it can be repeated on the same line.
I've spent a day on this, thank you man! This makes me think it has to do with writing into php output rather than directly in a file, as there it works without duplication. Would be nice if someone could explain it.
It works. I generated the file in a Slim application and added the BOM both in the response body and inside the file.
Strangely, this alleviated my problem as well.
5

Based on your comment above, it looks like your script is accidentally printing out a newline (hex 0A) before the UTF-8 BOM, causing Excel not to recognize the output as UTF-8.

Since you're using PHP, make sure that there's no empty line before the <?php marker in your script, or in any other PHP file that it might include. Also make sure that none of the files you include has any whitespace after the closing ?> marker, if there is one.

In practice, this can be quite hard to do, since many text editors insist on always appending a newline to the end of the last line. Thus, the safest and easiest solution is to simply leave out the ?> marker from your PHP files, unless you intend to print out whatever comes after it. PHP does not require the ?> to be present, and using it in files that are not meant to be mixed PHP and literal template HTML (or other text) is just asking for bugs like this.
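
The check described above can be sketched in PHP itself. This is only a rough scan under stated assumptions: `strayOutputReason` and `scanPhpFiles` are hypothetical names, files are assumed to open with the full `<?php` tag (no shebang line or short tags), and the trailing-whitespace test is deliberately conservative, since PHP itself swallows a single newline directly after the closing tag.

```php
<?php
// Hypothetical helper: explains why a PHP source string might emit stray
// output, or returns null when neither problem is found.
function strayOutputReason(string $source): ?string
{
    // Anything in front of the opening tag (blank line, BOM) is sent as output.
    if (strncmp($source, '<?php', 5) !== 0) {
        return 'bytes before the opening <?php tag';
    }
    // Whitespace after the final closing tag; may over-report a lone newline.
    if (preg_match('/\?>\s+\z/', $source) === 1) {
        return 'whitespace after the final closing tag';
    }
    return null;
}

// Walks $dir recursively and returns [path => reason] for suspicious files.
function scanPhpFiles(string $dir): array
{
    $problems = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile() && strtolower($file->getExtension()) === 'php') {
            $source = (string) file_get_contents($file->getPathname());
            $reason = strayOutputReason($source);
            if ($reason !== null) {
                $problems[$file->getPathname()] = $reason;
            }
        }
    }
    return $problems;
}

// Usage: php scan.php /path/to/project
if (isset($argv[1])) {
    foreach (scanPhpFiles($argv[1]) as $path => $reason) {
        echo $path, ': ', $reason, PHP_EOL;
    }
}
```

Files that mix PHP with template HTML will be reported too, so the list is a starting point for inspection rather than a list of bugs.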

7 Comments

I may have forgotten to add an important point: the code is part of a plugin I developed for WordPress. I am very sure my plugin does not have whitespace or an empty line before the <?php marker. I also tried calling ob_start(), printing what I have above, and then calling ob_flush(). The 0A character is still there. Does this mean I need to investigate every file in WordPress for this?
That may indeed be necessary. The first thing I'd do is grep all the PHP files for ?>, and remove it from any files that don't need it (i.e. which don't have any actual HTML or other content after it). If that doesn't fix it, it also shouldn't be hard to write a script to loop through all the files and check that the first five bytes of each of them are <?php. You might also be able to work around this issue using output buffering (i.e. catching the stray newline in a buffer and throwing it away), if you can get an ob_start() in early enough. But I'd only do that as a last resort.
That sounds like a lot of work. I'll give that a try. Would it also be a possibility that a configuration in php.ini causes this? I'll probably have to write an independent PHP script and see if the same thing happens.
If you happen to have perl installed, perl -0777 -nE 'say $ARGV if /^\s+<\?/ or /\?>\s+$/' *.php should list any .php files in the current directory that have whitespace (and nothing else) before the first <? or after the last ?>. To search subdirectories too, try e.g. find . -name '*.php' -exec perl -0777 -nE 'say $ARGV if /^\s+<\?/ or /\?>\s+$/' '{}' ';'. (That should work with bash, perl and GNU find, which are installed by default on Linux, but are available for Windows too, e.g. via Git for Windows.)
I will try the perl script. Meanwhile, I've used ob_start() but it seems the character is still there. How can I use ob_start to erase anything in the buffer and start fresh?
4

The code below worked for me. It outputs the UTF-8 BOM before the CSV content:

  echo "\xEF\xBB\xBF"; // utf-8 bom 
  echo $csv;

1 Comment

Thanks! I was writing European characters to output without BOM, this solution worked fine, now in Excel I see names like Reneé displayed correctly
3

I usually do it this way:

header('Content-Type: application/csv');
header('Content-Disposition: attachment; filename="filename.csv"');
header('Cache-Control: max-age=0');

// BOM header UTF-8
echo "\xEF\xBB\xBF";

$fh = @fopen('php://output', 'w');

...

I also use ; as the separator, since Excel often doesn't split the columns automatically on ,
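
As a small illustration of the separator choice, here is a sketch that builds one CSV line with `fputcsv` and a `;` separator. `csvLine` is a hypothetical helper, and the in-memory stream is only there so the result can be inspected; in the export itself the target would be `php://output`.

```php
<?php
// Hypothetical helper: formats one row as a CSV line using $separator.
function csvLine(array $fields, string $separator = ';'): string
{
    $mem = fopen('php://memory', 'w+');
    // Enclosure and escape are passed explicitly to keep behaviour predictable.
    fputcsv($mem, $fields, $separator, '"', '\\');
    rewind($mem);
    $line = stream_get_contents($mem);
    fclose($mem);
    return $line;
}

echo csvLine(['Name', 'City']); // Name;City
echo csvLine(['a;b', 'c']);     // "a;b";c
```

Fields that contain the separator are quoted automatically, so switching to `;` does not require escaping by hand.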

1 Comment

I tried that as well but it doesn't work. Also, just to add on, if I save the file to server and download from server directly, it works.
0

Thanks, @Ilmari Karonen, for the explanation. Something does sometimes add empty lines to the output buffer. What worked for me was clearing all output from the buffers and then adding the BOM. I had no idea which file or plugin was adding the empty space or newline.

// Open CSV file for writing
$fp = fopen($file_path, 'w');
// Discard all active output buffers
while (ob_get_level()) {
    ob_end_clean();
}
ob_start();
// Write the UTF-8 byte order mark (BOM) to the CSV file
fwrite($fp, "\xEF\xBB\xBF");
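
For the case in the question, where the CSV is streamed to the browser instead of written to a file, the same idea might look like the sketch below. It assumes the stray bytes are still sitting in an output buffer (nothing has been flushed to the client yet); `discardOutputBuffers` and `writeCsvWithBom` are hypothetical names, and the header values in the usage comment are examples only.

```php
<?php
// Hypothetical helper: throw away everything captured by active output buffers.
function discardOutputBuffers(): void
{
    while (ob_get_level() > 0) {
        // Stop if a buffer cannot be removed, to avoid looping forever.
        if (!@ob_end_clean()) {
            break;
        }
    }
}

// Writes the UTF-8 BOM first, then each row as a CSV line, to an open stream.
function writeCsvWithBom($stream, array $rows): void
{
    fwrite($stream, "\xEF\xBB\xBF");
    foreach ($rows as $row) {
        fputcsv($stream, $row, ',', '"', '\\');
    }
}

// Possible use inside the export handler:
//   discardOutputBuffers();
//   header('Content-Type: text/csv; charset=UTF-8');
//   header('Content-Disposition: attachment; filename="export.csv"');
//   $out = fopen('php://output', 'w');
//   writeCsvWithBom($out, [['Header'], [$string_with_chinese_chars]]);
//   fclose($out);
//   exit;
```

If the stray newline has already been sent to the client before this code runs, discarding the buffers cannot remove it, and the offending file still has to be found.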

0

This is happening because browsers conform to the requirements of the UTF-8 decode algorithm in the Encoding spec:

  1. Let buffer be the result of peeking three bytes from ioQueue, converted to a byte sequence.
  2. If buffer is 0xEF 0xBB 0xBF, then read three bytes from ioQueue. (Do nothing with those bytes.)
  3. Process a queue with an instance of UTF-8’s decoder, ioQueue, output, and "replacement".
  4. Return output.

To work around the problem, you can add the BOM sequence twice at the start of the response, so that one BOM remains after the first is consumed. Alternatively, you can set your Content-Type header to application/octet-stream, which causes the browser to skip decoding the response; however, you then lose the information that the file is text/csv.

In either case the csv will work in Excel.
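
A rough sketch of the first workaround follows. `stripOneLeadingBom` is a hypothetical stand-in for whatever layer consumes a single leading BOM as described above; it only simulates that step and is not a real browser API.

```php
<?php
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical stand-in for a decoder that drops one leading BOM, if present.
function stripOneLeadingBom(string $bytes): string
{
    return strncmp($bytes, UTF8_BOM, 3) === 0 ? substr($bytes, 3) : $bytes;
}

// Prefixes the CSV with two BOMs so that one survives a single strip.
function withDoubleBom(string $csv): string
{
    return UTF8_BOM . UTF8_BOM . $csv;
}

$csv = "Header\n中文\n";
$received = stripOneLeadingBom(withDoubleBom($csv));

var_dump(strncmp($received, UTF8_BOM, 3) === 0); // bool(true)
var_dump($received === UTF8_BOM . $csv);         // bool(true)

// The second workaround would replace the content type instead:
//   header('Content-Type: application/octet-stream');
```

If nothing strips a BOM on the way, the file simply starts with two BOMs, which the comments on the earlier answer report as working in Excel.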
