I seem to have some problem with encoding, but I can't pinpoint it.

PHPMyAdmin says:

Server type: MariaDB
Server version: 10.3.39-MariaDB-log - MariaDB Server
Server charset: ISO 8859-2 Central European (latin2)
Server connection collation: utf8mb4_unicode_ci

I can't change the SQL server in any way; it is provided by my website hosting provider.

All my databases, tables and columns use utf8mb4_unicode_ci. All files are encoded as UTF-8. The values display properly both in PHPMyAdmin and in MySQL Workbench. Other scripts on my website work fine, displaying English, Russian, Chinese, etc. Only this one misbehaves for some reason. I tried inserting the data through PHPMyAdmin, Workbench, and even from the very same script.

I connect using PDO, with charset specified, via an included file:

<?php
if (!isset($pdo))
{
    $DBHOST = 'localhost';
    $DBNAME = '***';
    $DBUSER = '***';
    $DBPASS = '***';
    $DBCHRS = 'utf8mb4';

    $dsn = 'mysql:host='.$DBHOST.';dbname='.$DBNAME.';charset='.$DBCHRS;
    $options = [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC, PDO::ATTR_EMULATE_PREPARES => false, PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false];
    try
    {
        $pdo = new PDO($dsn, $DBUSER, $DBPASS, $options);
    }
    catch (\PDOException $e)
    {
        echo 'Could not connect to the database!<br>Message: ', $e->getMessage(), '<br>Code: ', $e->getCode();
        exit();
    }
}
?>

and then

$json = [];
$json['people'] = [];
$json['relations'] = [];

$stmt = $pdo->prepare('SELECT * FROM `tree_people`;');
$stmt->execute();
while($row = $stmt->fetch(PDO::FETCH_ASSOC))
    $json['people'][] = $row;

$stmt = $pdo->prepare('SELECT * FROM `tree_relations`;');
$stmt->execute();
while($row = $stmt->fetch(PDO::FETCH_ASSOC))
    $json['relations'][] = $row;

/*/
header('Access-Control-Allow-Origin: *');
header('Content-Type: application/json; charset=utf-8');
/*/
header('Content-Type: text/html; charset=utf-8');
//*/

//*/
echo '<pre>';
print_r($json);
echo '</pre>';
//*/

echo '<pre>';
echo json_encode($json, JSON_UNESCAPED_UNICODE);
echo '</pre>';

json_encode() produces no output (an empty string?), and the content displayed with print_r shows every special character as �.

I'm going crazy, what is going on?

Edit: I can properly read data from other tables. However, that only works when I write the data to the DB through my own editor, not through PMA. See https://herhor.net/news/?id=1: when I first inserted it via PMA, the page was full of unknown characters. Now, however, when I read it in PMA or Workbench, it is full of scrambled characters.

It seems that there is some mismatch between the encoding used by PMA/Workbench and the one used by all my scripts. Shouldn't the DB deal with it automatically?

Edit2: As requested, SHOW VARIABLES LIKE 'char%'; for Workbench:

character_set_client    utf8mb4
character_set_connection    utf8mb4
character_set_database  utf8mb4
character_set_filesystem    binary
character_set_results   utf8mb4
character_set_server    latin2
character_set_system    utf8
character_sets_dir  /usr/share/mysql/charsets/

For PMA:

character_set_client    utf8mb4
character_set_connection    utf8mb4
character_set_database  utf8mb4
character_set_filesystem    binary
character_set_results   utf8mb4
character_set_server    latin2
character_set_system    utf8
character_sets_dir  /usr/share/mysql/charsets/

For PHP PDO:

Array
(
    [0] => stdClass Object
        (
            [Variable_name] => character_set_client
            [Value] => latin2
        )

    [1] => stdClass Object
        (
            [Variable_name] => character_set_connection
            [Value] => latin2
        )

    [2] => stdClass Object
        (
            [Variable_name] => character_set_database
            [Value] => utf8mb4
        )

    [3] => stdClass Object
        (
            [Variable_name] => character_set_filesystem
            [Value] => binary
        )

    [4] => stdClass Object
        (
            [Variable_name] => character_set_results
            [Value] => latin2
        )

    [5] => stdClass Object
        (
            [Variable_name] => character_set_server
            [Value] => latin2
        )

    [6] => stdClass Object
        (
            [Variable_name] => character_set_system
            [Value] => utf8
        )

    [7] => stdClass Object
        (
            [Variable_name] => character_sets_dir
            [Value] => /usr/share/mysql/charsets/
        )

)

Also here is the SQL import/export file for both tables: https://pastebin.com/33Ap4Vje

  • It seems you did quite a thorough job. I don't get what you say: no JSON output from json_encode(), yet you can see output with print_r(). That doesn't make sense to me. Could you have a look at what json_last_error_msg() reports? Also, and this is not really that important, calling a variable $json when it clearly isn't JSON is weird; variable names should be semantic, something like $trees. Commented Nov 18, 2023 at 20:45
  • Well, that's not what I expected, but it is useful information. Commented Nov 18, 2023 at 20:48
  • @user3783243 It's the server, not the database, and that's not a problem. Mine is using cp1252 West European (latin1). Note that the "charset" in the database connection here is UTF-8. Commented Nov 18, 2023 at 20:56
  • My problem is that I don't have your database, so I like to get rid of it by reducing the problem to a simple input string of bytes. The U+FFFD or � is your browser covering something up it cannot decode given the character encoding. Are you sure the PHP file you're running is UTF-8 encoded? Commented Nov 18, 2023 at 21:42
  • There are some ideas here that might help Commented Nov 18, 2023 at 22:37

2 Answers

1

As your edit shows, the problem is definitely with the PDO connection, as it reports latin2 for character_set_client, character_set_connection and character_set_results.

Does adding SET NAMES utf8mb4 as an init command in your $options array help?

$options = [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
    PDO::ATTR_EMULATE_PREPARES => false,
    PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false,
    PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8mb4'
];
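
One way to confirm that the init command took effect is to read the session variables back over the same connection and compare them with the expected charset. The following is only a minimal sketch: findCharsetMismatches() is a hypothetical helper, not part of PDO, and the sample array simply mirrors the PDO output posted in the question.

```php
<?php
/**
 * Hypothetical helper: returns the session charset variables that differ
 * from the expected charset. Only the three variables that SET NAMES
 * changes are checked.
 */
function findCharsetMismatches(array $vars, string $expected = 'utf8mb4'): array
{
    $mismatches = [];
    $names = ['character_set_client', 'character_set_connection', 'character_set_results'];
    foreach ($names as $name) {
        $actual = $vars[$name] ?? null;
        if ($actual !== $expected) {
            $mismatches[$name] = $actual;
        }
    }
    return $mismatches;
}

// Sample values copied from the PDO output in the question.
// With a live connection the array could instead be fetched with:
//   $vars = $pdo->query("SHOW VARIABLES LIKE 'character_set_%'")
//               ->fetchAll(PDO::FETCH_KEY_PAIR);
$vars = [
    'character_set_client'     => 'latin2',
    'character_set_connection' => 'latin2',
    'character_set_database'   => 'utf8mb4',
    'character_set_results'    => 'latin2',
    'character_set_server'     => 'latin2',
];

// Lists the variables that are not utf8mb4; an empty array means the session is consistent.
print_r(findCharsetMismatches($vars));
```

character_set_server staying at latin2 is expected on this host and is deliberately not checked, since it only supplies defaults and Workbench and PMA report the same value.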

4 Comments

Thanks! The connection charset finally changed to utf8mb4 and the contents are now the same in Workbench, PMA and in the scripts. For some reason, the charset specified during connection was ignored...
I am glad this is working for you, but it still doesn't feel like an answer, as the code you posted should work. I cannot figure out why the charset in the dsn is not working 😞
@herhor67 I have not been able to replicate this issue. Please would you check the Client API version shown for pdo_mysql in phpinfo?
PHPInfo shows mysqlnd 8.1.25 under pdo_mysql
1

It would be much better if you provided the exact string you're having trouble with, but I'm guessing you're dealing with non-UTF-8 data. json_encode() in PHP returns false (which echo prints as an empty string) when it is given bytes that are not valid UTF-8. Try re-encoding your string before passing it to json_encode().
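
A quick way to check whether this is what is happening is to inspect the return value and the JSON error state. This is a minimal sketch with a made-up sample value (the name and its latin2 byte are assumptions for illustration, not data from the question):

```php
<?php
// "ł" in ISO-8859-2 (latin2) is the single byte 0xB3, which is not valid UTF-8.
$latin2Name = "Pawe\xB3";

$encoded = json_encode(['name' => $latin2Name], JSON_UNESCAPED_UNICODE);

$isUtf8Error = false;
if ($encoded === false) {
    // JSON_ERROR_UTF8 means the input contained malformed UTF-8.
    $isUtf8Error = json_last_error() === JSON_ERROR_UTF8;
    echo 'json_encode failed: ', json_last_error_msg(), "\n";
}

// mb_check_encoding() can flag the offending value before encoding.
$isValidUtf8 = mb_check_encoding($latin2Name, 'UTF-8');
var_dump($isValidUtf8); // bool(false)
```

If the error is JSON_ERROR_UTF8, the bytes coming out of the database are not UTF-8, which points at the connection charset rather than at json_encode() itself.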

You can use utf8_encode() on your string (note that it assumes ISO-8859-1 input and is deprecated as of PHP 8.2), or, if your string is encoded with Windows-1252/ANSI, you can use the following approach:

$str = mb_convert_encoding($str, "UTF-8", "Windows-1252");

Take a look at this answer, which specifically discusses this problem on PDO:

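while($row = $stmt->fetch(PDO::FETCH_ASSOC))
{
  foreach($row as &$value)
  {
    // Convert only strings; NULLs and numeric columns are left untouched.
    if (is_string($value))
      $value = mb_convert_encoding($value, "UTF-8", "Windows-1252");
  }
  unset($value); // break the reference left over from the foreach
  $json['relations'][] = $row;
}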
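
Since the PDO session in the question reports latin2, the bytes returned are more likely ISO-8859-2 than Windows-1252. Below is a sketch of the same idea with that source encoding. rowToUtf8() is a hypothetical helper, and the column names in the sample row are made up for illustration. Treat it as a workaround only; fixing the connection charset is the cleaner route.

```php
<?php
/**
 * Hypothetical helper: converts every string in a fetched row from
 * ISO-8859-2 (latin2) to UTF-8, leaving NULLs and numbers untouched.
 */
function rowToUtf8(array $row, string $from = 'ISO-8859-2'): array
{
    foreach ($row as $key => $value) {
        if (is_string($value)) {
            $row[$key] = mb_convert_encoding($value, 'UTF-8', $from);
        }
    }
    return $row;
}

// Sample row with latin2 bytes: 0xB3 = ł, 0xB1 = ą, 0xB6 = ś, 0xE6 = ć.
$row = ['id' => 1, 'name' => "\xB3\xB1\xB6\xE6", 'note' => null];
$clean = rowToUtf8($row);

echo json_encode($clean, JSON_UNESCAPED_UNICODE), "\n";
// {"id":1,"name":"łąść","note":null}
```

Inside the fetch loop this would be used as $json['relations'][] = rowToUtf8($row);.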

Update: According to your update, it seems your data was scrambled in the first place (when you inserted it using PHPMyAdmin). If it works properly through the website editor (which uses PDO, right?), the problem might be caused by the charset/collation set in your DB. One possible solution is to:

  • Export your DB as a .sql file.
  • Search the file for the charset and collation declarations and make sure they are UTF-8 (for example utf8mb4_unicode_ci, which your tables already use; utf8_general_ci is limited to 3-byte characters). Check that the data itself is not mangled, then save the file, import it as a new DB, and use that one.
  • Make sure the connection charset is set every time you interact with your DB, for example with a SET NAMES utf8mb4 query; in PDO you can use something like this.

This good answer might help you too.

8 Comments

My strings are names/surnames, which contain polish characters, like ł, ą, ś, ć etc
Converting from UTF-8 to UTF-8 just replaces U+FFFD with ?. Converting from Windows-1252 fixes like literally one of the characters, but the rest are still wrong.
var_dump( json_encode("łąść") ); works fine for me, with the result string(26) ""\u0142\u0105\u015b\u0107"". Can you send a base64-encoded dump of your string? Use something like base64_encode(implode(' ',$json)); and add the result to your question, then we can test that string for encoding. @herhor67
Please see edit. It seems that there is some mismatch between the encoding used by PHPMA/Workbench and the one used by all my scripts.
Please see the update on answer @herhor67