I'll try to explain this as well as I can; please excuse my English.

I have a script that dumps an entire database into an SQL file, and another script that splits the file into statements and executes them to drop, create and insert the data. The problem is that some strings are "trimmed": only the part of the string before the first special character is inserted. For example:

For the string:

"Pantalon azul marino de Poliéster con cinta blanca bordada con el nombre de la institución en uno de sus costados."

only this is inserted:

 "Pantalon azul marino de Poli"

No error is thrown. This only happens when I use the script; when I run the queries manually or import the SQL file in phpMyAdmin, everything works. Everything is set to utf8, by the way.

I'm out of ideas; any help would be much appreciated.

    include('../core/connection.inc.php');
    $conn = dbConnect('admin');
    $conn->query("SET NAMES 'utf8'");
    $conn->set_charset("utf8");
    $type = 0;

    // Temporary variable, used to store the current query
    $templine = '';
    // Read in the entire file (each element keeps its trailing newline)
    $lines = file('db-backup.sql');
    $correct = 0;
    $failed = 0;
    // Loop through each line
    foreach ($lines as $line) {
        // Skip comments and blank lines
        if (substr($line, 0, 2) == '--' || trim($line) == '')
            continue;
        // Add this line to the current segment
        $templine .= $line;
        // If it has a semicolon at the end, it's the end of the query
        if (substr(trim($line), -1, 1) == ';') {
            $templine = str_replace("latin1", "utf8", $templine);
            $templine = trim($templine);

            // Perform the query; query() returns false on failure.
            // affected_rows is 0 for DROP/CREATE, so it cannot signal success.
            $result = $conn->query($templine);
            if ($result !== false) {
                echo "OK: (".$templine.")<br/>";
                $correct++;
            } else {
                echo "Failed: (".$templine.")<br/>";
                echo "&nbsp;&nbsp; Errno: ".$conn->errno." <br/>";
                echo "&nbsp;&nbsp; Error: ".$conn->error." <br/>";
                $failed++;
            }
            $templine = '';
        }
    }
  • Check the field size of your table. I think you should increase the field size to varchar(256) or more. Commented Feb 6, 2014 at 5:02
  • @SatishSharma the field is a Text type :/ Commented Feb 6, 2014 at 5:04
  • Obviously it has to do with the transliteration contained within the string being truncated. Can you insert the string manually from the mysql command line without it being truncated? Commented Feb 6, 2014 at 5:33
  • Truncated, that was the word. Well, there is no easy way to say this but... what is the mysql command line? x) If it helps, it works when I copy and paste the content of the SQL file into the SQL input in phpMyAdmin, and it also works when I import the file directly. And when I output the $templine variable after the insert query is done, it isn't truncated. Commented Feb 6, 2014 at 5:44
  • Find the right character set or encoding to use in your PHP script. I don't know which one, but it's obvious from where the truncation happens: it does not accept a character like the é in the word Poliéster. Or how about removing your utf8 settings? I forget the details, but you might be encoding those special characters twice in utf8. Commented Feb 6, 2014 at 5:47

2 Answers

I'm guessing that the dump file you're importing isn't UTF-8.

PHP is piping the bytes from the file to MySQL without any conversion. The é character in your file is probably in latin1 based on the change you're making, likely represented by a single byte with a value > 127. This isn't UTF-8. You've promised MySQL that you'll send valid UTF-8, and it stops reading the string when it gets to an invalid byte.
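To check that guess, one option is to test the raw bytes before sending anything to MySQL. The following is only a sketch: it assumes the mbstring extension is available, and `findInvalidUtf8Lines` is a hypothetical helper name, not part of the original script.

```php
<?php
// "é" as the single Latin-1 byte 0xE9, as it would appear in a non-UTF-8 dump
$latin1 = "Poli\xE9ster";
// The same word in UTF-8, where "é" is the two bytes 0xC3 0xA9
$utf8 = "Poli\xC3\xA9ster";

var_dump(mb_check_encoding($latin1, 'UTF-8')); // is the Latin-1 form valid UTF-8?
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // is the UTF-8 form valid UTF-8?

// Hypothetical helper: return the 1-based numbers of lines that are not valid UTF-8
function findInvalidUtf8Lines(array $lines): array
{
    $bad = [];
    foreach ($lines as $index => $line) {
        if (!mb_check_encoding($line, 'UTF-8')) {
            $bad[] = $index + 1;
        }
    }
    return $bad;
}
```

Calling `findInvalidUtf8Lines(file('db-backup.sql'))` would list the lines of the dump that need attention; an empty result would suggest the problem lies elsewhere.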

You might consider:

  • re-encoding the dump file as UTF-8
  • figuring out what encoding the dump file is in, and loading it into MySQL using that encoding
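The first option could be sketched roughly as follows. This assumes the mbstring extension is available and that the dump really is ISO-8859-1 (latin1), which is a guess that should be confirmed first; `toUtf8` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return $bytes as UTF-8, converting only when needed.
// $assumedSource is an assumption; confirm the dump's real encoding first.
function toUtf8(string $bytes, string $assumedSource = 'ISO-8859-1'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8, so avoid double-encoding
    }
    return mb_convert_encoding($bytes, 'UTF-8', $assumedSource);
}

// Show the bytes of the converted word in hexadecimal
echo bin2hex(toUtf8("Poli\xE9ster")), "\n";
```

Applied to the whole dump, this might look like `file_put_contents('db-backup.utf8.sql', toUtf8(file_get_contents('db-backup.sql')))`, where the output file name is made up for illustration. The validity check is done on the whole input, so a file that mixes encodings would not be handled correctly by this sketch.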

Personally I think I'd approach the problem a different way:

  1. Load the dump file into MySQL using the command-line client, or something similar. You know this works.
  2. Alter the character set of each column after importing - you can use the data in information_schema to assemble ALTER TABLE statements and get MySQL to do the conversion properly.
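
Step 2 could be sketched along these lines. Note that this swaps in table-level `ALTER TABLE ... CONVERT TO CHARACTER SET` statements rather than per-column ones, and it only converts correctly if each column's declared character set matches the bytes actually stored in it. `buildConvertStatements` and the table names are hypothetical.

```php
<?php
// Hypothetical helper: build one conversion statement per table name
function buildConvertStatements(array $tables, string $charset = 'utf8', string $collation = 'utf8_general_ci'): array
{
    $statements = [];
    foreach ($tables as $table) {
        // Double any backtick so the name stays a single quoted identifier
        $quoted = '`' . str_replace('`', '``', $table) . '`';
        $statements[] = "ALTER TABLE $quoted CONVERT TO CHARACTER SET $charset COLLATE $collation;";
    }
    return $statements;
}

// With a live mysqli connection, the names could come from information_schema, e.g.:
//   $res = $conn->query("SELECT TABLE_NAME FROM information_schema.TABLES
//                        WHERE TABLE_SCHEMA = DATABASE() AND TABLE_TYPE = 'BASE TABLE'");
//   $tables = array_column($res->fetch_all(MYSQLI_ASSOC), 'TABLE_NAME');
$tables = ['products', 'institutions']; // hypothetical table names

foreach (buildConvertStatements($tables) as $sql) {
    echo $sql, "\n"; // review each statement before running it
}
```

Printing the statements first, instead of executing them directly, leaves room to inspect them and to back up the data before any conversion is run.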

2 Comments

Quite definitely you have an invalid character for UTF-8 -- MySQL reads it and chokes, and so stops the transfer there. It is possible to IMPORT a non-UTF-8 .sql file, but you have to tell phpMyAdmin (or whatever) the exact encoding of that file, so it can translate it to UTF-8 on the fly. And woe to you if you have a mixture of encodings within the file.
If the dump file isn't encoded as UTF-8, shouldn't it also fail when I import the file in phpMyAdmin as a UTF-8 encoded file? Anyway, I don't know exactly why, but removing the "Set Names" and "Set Charset" lines worked. So for now I'll keep it that way, but I will check the dumping script; maybe the encoding problem is there and not in the importing script.

I don't know what your table looks like, but here is a bit of extra advice just in case. Make sure your DB columns are set to UTF8:

    ALTER TABLE {table_name} CHANGE COLUMN {col_name} {col_name} TEXT CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL ;
