1

I have a CSV file whose encoding I need to change, and I want to do this in PHP. I know about the mb_convert_encoding function, but that only works on strings.

Is there a function I can use to change the encoding of an entire CSV file?

Cheers,

Update: It turns out the solution to my problem is to remove the BOM from my file.

I am using @treehouse's code below, modified to strip the BOM, but it just fills the temp file forever. What's wrong?

$sourcePath = 'EstablishmentExport.csv';
$tempPath = $sourcePath . 'temp';
$source = fopen($sourcePath, 'r');
$target = fopen($tempPath, 'w');
while(!feof($source)) {
    $line = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $source);
    fwrite($target, $line);
}
fclose($source);
fclose($target);
unlink($sourcePath);
rename($tempPath, $sourcePath);
  • Ummm, a file is also a string!? Commented Jul 1, 2015 at 13:53
  • use file_get_contents to store the file as a string. Commented Jul 1, 2015 at 13:57
  • Won't I lose the format of the CSV like that? Commented Jul 1, 2015 at 13:57
  • Uhm, no? If you just read the contents of a file you get exactly the contents of the file. If you write it back you write exactly the original back. Commented Jul 1, 2015 at 14:35

4 Answers

1
file_put_contents('the/file/path.csv', mb_convert_encoding(file_get_contents('the/file/path.csv'), 'ENCODING'));

Just fill in the correct file path and the desired type of encoding.

Edit: Since the source file is apparently huge, you'll have to read it line by line, which can be done with fopen and fgets. However, you need to write the newly encoded strings to a temporary file first, then delete the original file and rename the temporary file to the original filename:

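$sourcePath = 'path/to/file.csv';
$tempPath = $sourcePath . 'temp';
$source = fopen($sourcePath, 'r');
$target = fopen($tempPath, 'w');
// fgets() returns false at end of file, which ends the loop
// without passing a non-string value to mb_convert_encoding().
while (($line = fgets($source)) !== false) {
    // 'ENCODING' is a placeholder for the desired target encoding.
    $line = mb_convert_encoding($line, 'ENCODING');
    fwrite($target, $line);
}
fclose($source);
fclose($target);
// Replace the original file with the converted copy.
unlink($sourcePath);
rename($tempPath, $sourcePath);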

5 Comments

The file is 300MB. Would that be too big to load into memory? Also, when I put it into a string, how does it keep the line breaks?
You don't have to worry about line breaks; however, the size is a valid concern. I'll come up with a solution.
I'd use tmpfile or php://temp or such for the temporary file... Also supply the from encoding for mb_convert_encoding... Apart from that, +1.
Hi, I have realised that instead of changing the encoding I can just remove the BOM to fix my file. I have modified the code, but instead it just fills the temp file forever. I have added the amended code to the question.
That happens because you never read from the file: right before the regular expression you have to write $line=fgets($source); in order to move the pointer to the next line. Right now your loop always stays on the same line, so feof never becomes true. Also, preg_replace must have $line as its third argument, not $source.
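As a rough sketch of the BOM-removal idea discussed in these comments: the regular expression in the question would also strip line breaks (\x0A and \x0D fall inside \x00-\x1F) and every non-ASCII byte, so the version below only inspects the first three bytes. It assumes the file is UTF-8 with a leading BOM (EF BB BF); the function name stripUtf8Bom is hypothetical, not something from the thread.

```php
<?php
// Hypothetical helper: copies $sourcePath to $targetPath, dropping a leading
// UTF-8 BOM if one is present. Assumes the file is UTF-8 encoded.
function stripUtf8Bom(string $sourcePath, string $targetPath): void
{
    $source = fopen($sourcePath, 'rb');
    $target = fopen($targetPath, 'wb');
    if ($source === false || $target === false) {
        throw new RuntimeException('Could not open source or target file.');
    }

    // Only the first three bytes can be a UTF-8 BOM (EF BB BF).
    $head = fread($source, 3);
    if ($head !== false && $head !== "\xEF\xBB\xBF") {
        fwrite($target, $head); // not a BOM, so keep these bytes
    }

    // Copy the rest in chunks so a large file is never held in memory at once.
    stream_copy_to_stream($source, $target);

    fclose($source);
    fclose($target);
}
```

It could be called as stripUtf8Bom('EstablishmentExport.csv', 'EstablishmentExport.csvtemp'), followed by the same unlink and rename steps used in the answer.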
0

Since you are dealing with a very big file, I suggest leaving this task to the operating system by means of exec, shell_exec or the backtick operator.

For ways to do just that, see http://mindspill.net/computing/linux-notes/determine-and-change-file-character-encoding/ and the question "Best way to convert text files between character sets?"

Example: shell_exec ( 'iconv -f utf-16le -t utf-8 1.csv > 2.csv' );
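If the file names or encodings come from variables rather than literals, it may be safer to quote them and to check the exit code. The following is only a sketch under the assumption that the iconv command-line tool is installed and on the PATH; buildIconvCommand and runIconv are hypothetical helper names.

```php
<?php
// Hypothetical helper: builds the iconv command line with every argument
// shell-quoted, so spaces or quotes in file names cannot break the command.
function buildIconvCommand(string $from, string $to, string $in, string $out): string
{
    return sprintf(
        'iconv -f %s -t %s %s > %s',
        escapeshellarg($from),
        escapeshellarg($to),
        escapeshellarg($in),
        escapeshellarg($out)
    );
}

// Hypothetical helper: runs the command and fails loudly on a non-zero exit code.
function runIconv(string $from, string $to, string $in, string $out): void
{
    exec(buildIconvCommand($from, $to, $in, $out), $output, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("iconv exited with code $exitCode");
    }
}
```

For the example above, the call would be runIconv('utf-16le', 'utf-8', '1.csv', '2.csv').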

1 Comment

This worked :) Cheers man. I edited your answer to include my code so if anyone sees it they can get an idea.
0

Load the contents of the file into a string with file_get_contents(); then use mb_convert_encoding() on it and then store the converted string with file_put_contents().

0

Just read the entire file into a string with file_get_contents, then run it through the mb_convert_encoding function, and save it again. That is all there is to it.

In case your file is huge, and it isn't practical to load it into memory at once, do it line by line. (look up fopen, fgets, etc)

1 Comment

File is encoded in UTF-16LE so fgets() is a no go as it will break the file. Do you have any suggestions?
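One possible way around the fgets problem, sketched here rather than taken from the thread, is to skip line handling entirely and let PHP's built-in convert.iconv.* stream filter convert the bytes as they are read. This assumes PHP was built with iconv support; convertFileEncoding is a hypothetical helper name.

```php
<?php
// Hypothetical helper: streams $sourcePath into $targetPath, converting the
// bytes from $from to $to with the built-in convert.iconv.* read filter.
function convertFileEncoding(string $sourcePath, string $targetPath, string $from, string $to): void
{
    $source = fopen($sourcePath, 'rb');
    $target = fopen($targetPath, 'wb');
    if ($source === false || $target === false) {
        throw new RuntimeException('Could not open source or target file.');
    }

    // Everything read from $source passes through iconv before we see it.
    $filter = stream_filter_append($source, "convert.iconv.$from/$to", STREAM_FILTER_READ);
    if ($filter === false) {
        throw new RuntimeException("No iconv stream filter available for $from to $to.");
    }

    // Copy in chunks; no line splitting is involved, so the two-byte
    // UTF-16 line endings are never cut in half.
    stream_copy_to_stream($source, $target);

    fclose($source);
    fclose($target);
}
```

For the file in question, the call might look like convertFileEncoding('EstablishmentExport.csv', 'EstablishmentExport.csvtemp', 'UTF-16LE', 'UTF-8'). A BOM at the start of the source may carry over into the output as a UTF-8 BOM, so it could still need removing afterwards.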
