I am trying to write a CSV file encoded as UTF-16BE from data in a MySQL database encoded in UTF-8.

My code is:

$f = fopen('file.csv', 'w');
$firstLineKeys = false;

// UTF-16BE BOM
fwrite($f, chr(254) . chr(255));

foreach ($lines as $line)
{
    $lineEncoded = [];

    foreach ($line as $key => $value) 
    {
        $key = mb_convert_encoding($key, 'UTF-16BE', "auto");
        $value = mb_convert_encoding($value, 'UTF-16BE', "auto");
        $lineEncoded[$key] = $value;
    }

    if (empty($firstLineKeys))
    {
        $firstLineKeys = array_keys($lineEncoded);

        fputcsv($f, $firstLineKeys);

        $firstLineKeys = array_flip($firstLineKeys);
    }

    fputcsv($f, array_merge($firstLineKeys, $lineEncoded));
}

fclose($f);

When I open the file in OpenOffice, it tries to import it with the Unicode character set, but the fields are a mess. When I switch the import character set to UTF-8, it looks correct.

Any help would be appreciated, thanks.

1 Answer

$key = mb_convert_encoding($key, 'UTF-16BE', "auto");

(Are you sure you want BE? It's a pretty rarely-used encoding. Windows “Unicode” is UTF-16LE.)

I would avoid using "auto" as the from_encoding. It's an unreliable bodge that will often produce the wrong results, especially on short strings. As the input is apparently UTF-8, you should state that explicitly instead.
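As a minimal sketch of stating the source encoding explicitly (the sample string is made up for illustration, and it assumes the database connection really does return UTF-8):

```php
<?php
// Hypothetical sample value: "héllo" written as raw UTF-8 bytes,
// standing in for a string fetched from the UTF-8 database.
$value = "h\xC3\xA9llo";

// Name the source encoding instead of relying on "auto" detection.
$utf16 = mb_convert_encoding($value, 'UTF-16BE', 'UTF-8');

// Five characters, two bytes each in UTF-16BE.
echo bin2hex($utf16), "\n"; // 006800e9006c006c006f
```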

fputcsv($f, array_merge($firstLineKeys, $lineEncoded));

Unfortunately fputcsv can't write to a UTF-16-encoded file. It uses single-byte ASCII commas/quotes/newlines so it only works for encodings that are ASCII supersets. So if you wanted to use it you would have to write the whole file as UTF-8, and then transcode the whole file to UTF-16.
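A rough sketch of that "write UTF-8 first, transcode afterwards" approach follows. The helper name csvToUtf16be and the sample rows are hypothetical, and it assumes every row has the same keys in the same order and that the whole file fits in memory:

```php
<?php
// Hypothetical helper: build the CSV as UTF-8, then transcode it in one step.
function csvToUtf16be(array $lines): string
{
    // 1. Let fputcsv write plain UTF-8 into a temporary stream.
    $tmp = fopen('php://temp', 'w+');
    fputcsv($tmp, array_keys($lines[0]), ',', '"', '\\'); // header row
    foreach ($lines as $line) {
        fputcsv($tmp, $line, ',', '"', '\\');
    }
    rewind($tmp);
    $utf8Csv = stream_get_contents($tmp);
    fclose($tmp);

    // 2. Transcode the finished text and prepend the UTF-16BE BOM (FE FF).
    return "\xFE\xFF" . mb_convert_encoding($utf8Csv, 'UTF-16BE', 'UTF-8');
}

// Made-up rows standing in for $lines from the question (UTF-8 bytes).
$lines = [
    ['name' => "Zo\xC3\xAB", 'city' => 'Paris'],
    ['name' => 'Bob',        'city' => "K\xC3\xB6ln"],
];
$utf16Csv = csvToUtf16be($lines);
```

The result can then be written out unchanged, for example with file_put_contents('file.csv', $utf16Csv). Because the commas, quotes and newlines are transcoded along with the data, every character in the file ends up two bytes wide.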

You might want to consider a different (or your own) CSV writer instead. As well as being annoying to use for non-ASCII encodings, fputcsv with its default settings doesn't comply with RFC 4180, the de facto standard for CSV files, so you can easily generate files that most CSV-consuming software can't load properly.
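If you do go the "your own writer" route, the quoting rules are small enough to sketch. The function name csvLine is hypothetical, and this is only an illustration of RFC 4180 style quoting (double the quotes, wrap fields that contain a comma, quote or line break, end records with CRLF), not a complete library:

```php
<?php
// Hypothetical helper: format one record using RFC 4180 style quoting.
function csvLine(array $fields): string
{
    $out = [];
    foreach ($fields as $field) {
        $field = (string) $field;
        // Quote only when the field contains a comma, a quote or a line break.
        if (strpbrk($field, ",\"\r\n") !== false) {
            $field = '"' . str_replace('"', '""', $field) . '"';
        }
        $out[] = $field;
    }
    return implode(',', $out) . "\r\n";
}

// Build the line as UTF-8 first, then transcode it like any other string.
$line  = csvLine(['a', 'say "hi"', 'x,y']);
$utf16 = mb_convert_encoding($line, 'UTF-16BE', 'UTF-8');
```

Since the line is ordinary UTF-8 text until the final mb_convert_encoding call, the separators are converted together with the field contents.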

PHP's in-built CSV functions are essentially a complete waste of everyone's time.

2 Comments

How does it not conform to the standards?
As long as you use fputcsv($handle, $array, ',', '"', "\0"), it should escape everything properly (after reading the linked bug). fgetcsv seems to be a different story, though.
