
I'm using php-excel-reader 2.21 to convert XLS files to CSV. I wrote a simple script to do that, but I have some problems with Unicode characters: it does not return values from some cells.

For example, it has no problem with the cell content ceník položek, but it does have problems with nákup, VÝROBCE, PÁS, HRUBÝ, NÁKLADNÍ and some others. For these cells it returns an empty value ("").

Here is the code snippet I use for conversion:

<?php    
set_time_limit(120);    
require_once 'excel_reader2.php';    
$data = new Spreadsheet_Excel_Reader("cenik.xls", false, 'UTF-8');    

$f = fopen('file.csv', 'w');    
for($row = 1; $row <= $data->rowcount(); $row++)    
{    
    $out = '';    
    for($col = 1; $col <= $data->colcount(); $col++)    
    {    
        $val = $data->val($row,$col);

        // escape " and \ characters inside the cell
        $escaped = preg_replace(array('#”#u', '#\\\\#u', '#[”"]#u'), array('"', '\\\\\\\\', '\"'), $val);
        // compare with '' rather than using empty(): empty() also treats the string "0" as empty
        if((string)$val === '')
            $out .= ',';
        else
            $out .= '"' . $escaped . '",';
    }
    // remove last comma (,)    
    fwrite($f, substr($out, 0, -1));    
    fwrite($f, "\n");
}
fclose($f);

?>

Note that the cell and row indexes start from 1. Any suggestions?
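As a side note on the hand-written escaping in the loop above, PHP's built-in fputcsv() can do the quoting. This is a minimal sketch, assuming the cell values have already been collected into an array of rows; rows_to_csv() is a hypothetical helper name. Be aware that it produces RFC 4180-style doubled quotes ("") instead of the backslash escaping used in the question, so it only fits if whatever reads the CSV expects that style.

```php
<?php
// Sketch: write rows as CSV with PHP's built-in fputcsv() instead of
// hand-written escaping. rows_to_csv() is a hypothetical helper; $rows is an
// array of arrays of strings, e.g. collected from $data->val($row, $col).
function rows_to_csv(array $rows): string
{
    // An in-memory stream lets us return the CSV as a string; for a real
    // export, pass the handle from fopen('file.csv', 'w') instead.
    $fh = fopen('php://memory', 'w+');
    foreach ($rows as $fields) {
        // Fields containing the delimiter, a quote, a space, a tab or a line
        // break are enclosed in quotes, and embedded quotes are doubled ("").
        // The empty escape argument ('') turns off backslash escaping (PHP 7.4+).
        fputcsv($fh, $fields, ',', '"', '');
    }
    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);
    return $csv;
}

echo rows_to_csv([['nákup', 'say "hi"', '0', '']]);
// nákup,"say ""hi""",0,
```

Note that '0' and '' are kept distinct here, and an empty field is written as nothing between the commas, matching what the question's loop does for empty cells.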

Can anybody tell which answer is more applicable, i.e. the more proper way? Which one is ideal: cypher's or @thuclh's? Commented Jun 1, 2016 at 5:45

2 Answers


I hope it's the same problem as the one I had: in excel_reader2.php, on line 1120, replace

$retstr = ($asciiEncoding) ? $retstr : $this->_encodeUTF16($retstr);

with

$retstr = ($asciiEncoding) ? iconv('cp1250', 'utf-8', $retstr) : $this->_encodeUTF16($retstr);

That should fix it; however, I suggest using a different Excel reader, such as PHPExcel, to avoid problems like these.
Note that you need the iconv extension enabled on the server.
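A minimal sketch of the same iconv() conversion with the source encoding made explicit and a check for failure; to_utf8() is a hypothetical helper name. 'CP1250' is this answer's guess, and the comments below report needing 'ISO-8859-1' instead. The example also shows why the choice matters: the same byte, 0xE8, decodes to a different letter in each code page.

```php
<?php
// Sketch of the iconv() conversion from the answer above, with the source
// encoding as a parameter. to_utf8() is a hypothetical helper name.
function to_utf8(string $raw, string $sourceEncoding = 'CP1250'): string
{
    $converted = iconv($sourceEncoding, 'UTF-8', $raw);
    if ($converted === false) {
        // iconv() returns false if $raw is not valid in $sourceEncoding.
        throw new RuntimeException("Cannot convert from $sourceEncoding to UTF-8");
    }
    return $converted;
}

// The same byte means different letters in different code pages:
echo to_utf8("\xE8", 'CP1250'), "\n";      // č
echo to_utf8("\xE8", 'ISO-8859-1'), "\n";  // è
```

Like the one-line fix, this needs the iconv extension.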


3 Comments

(This is a different question and should be posted separately.)
Worked, but I had to replace 'cp1250' with 'iso-8859-1'.
What value should I replace 'cp1250' with if I want to read Chinese characters? I tried 'cp950', and 'iso-8859-1' as @AndréMorales suggested, but still no luck.

I have an answer for this problem that lets you keep using php_excel_reader as usual. Add this function to the Spreadsheet_Excel_Reader class:

function seems_utf8($str) {
    for ($i = 0; $i < strlen($str); $i++) {
        if (ord($str[$i]) < 0x80) continue;               # 0bbbbbbb
        elseif ((ord($str[$i]) & 0xE0) == 0xC0) $n = 1;   # 110bbbbb
        elseif ((ord($str[$i]) & 0xF0) == 0xE0) $n = 2;   # 1110bbbb
        elseif ((ord($str[$i]) & 0xF8) == 0xF0) $n = 3;   # 11110bbb
        elseif ((ord($str[$i]) & 0xFC) == 0xF8) $n = 4;   # 111110bb
        elseif ((ord($str[$i]) & 0xFE) == 0xFC) $n = 5;   # 1111110b
        else return false;                                # does not match any pattern
        for ($j = 0; $j < $n; $j++) {                     # do n bytes matching 10bbbbbb follow?
            if ((++$i == strlen($str)) || ((ord($str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}
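For reference, PHP's PCRE functions already validate UTF-8 when the /u modifier is used, so a one-line check can stand in for seems_utf8(); is_valid_utf8() is a hypothetical name. This is a sketch, and it is stricter than the function above: it rejects the 5- and 6-byte sequences that seems_utf8() accepts, as well as overlong encodings.

```php
<?php
// Sketch: validate UTF-8 via PCRE instead of a hand-written byte loop.
// is_valid_utf8() is a hypothetical helper name.
function is_valid_utf8(string $str): bool
{
    // With /u, preg_match() returns false (not 0) when $str is not valid
    // UTF-8; the empty pattern otherwise always matches, returning 1.
    return preg_match('//u', $str) === 1;
}

var_dump(is_valid_utf8("n\u{E1}kup"));  // bool(true): "nákup" encoded as UTF-8
var_dump(is_valid_utf8("n\xE1kup"));    // bool(false): "nákup" as single-byte Latin-1
```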

Then add this line below line 1120:

$retstr = $this->seems_utf8($retstr) ? $retstr : utf8_encode($retstr);

That's it!
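utf8_encode() is deprecated as of PHP 8.2. Below is a sketch of the same fallback without it; latin1_to_utf8() and fix_cell_encoding() are hypothetical names. Like utf8_encode(), it assumes any string that is not valid UTF-8 is ISO-8859-1. My reading of the .xls (BIFF8, Excel 97 and later) format is that an 8-bit "compressed" string stores only the low byte of each UTF-16 character, i.e. characters U+0000 to U+00FF, which would fit that assumption and the 'iso-8859-1' comment above; the reader's code does not confirm this. Either way, this fallback cannot produce characters above U+00FF, and a Latin-1 string that happens to be valid UTF-8 is left unchanged.

```php
<?php
// Sketch: convert ISO-8859-1 (Latin-1) bytes to UTF-8 without utf8_encode().
// Each Latin-1 byte equals the Unicode code point of the same value, so bytes
// 0x80-0xFF become two-byte UTF-8 sequences. Hypothetical helper name.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $cp = ord($bytes[$i]);
        if ($cp < 0x80) {
            $out .= $bytes[$i];                  // ASCII is unchanged
        } else {
            $out .= chr(0xC0 | ($cp >> 6))       // 110xxxxx
                  . chr(0x80 | ($cp & 0x3F));    // 10xxxxxx
        }
    }
    return $out;
}

// Hypothetical helper mirroring the one-line fix above: keep strings that are
// already valid UTF-8, otherwise assume they are Latin-1.
function fix_cell_encoding(string $retstr): string
{
    return preg_match('//u', $retstr) === 1 ? $retstr : latin1_to_utf8($retstr);
}
```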

You can also use the excel_reader2.php file that I modified. Download it here: excel_reader2.php. Use it the same way as the original excel reader.

3 Comments

You are a champ! Thanks for this. The solution provided above is buggy: it converts è to č. I guess the incoming encoding may not be 'cp1250'; mine is already UTF-8.
utf8_encode() accepts a string in ISO-8859-1 and converts it to UTF-8. But how can you guarantee that $retstr will always be either ISO-8859-1 or UTF-8?
I tried it, but it does not work with Chinese characters :(
