It's 2022, and most of what I find when searching for this problem is older answers about decoding. I have been looking for a solution for a couple of days now and am still not sure what the problem is.

I am running Python 3.10 in PyCharm and also Python 3.10 in a Flask container. The Spanish characters appear correctly when viewed from the MySQL CLI, but not when they are fetched and printed with Python. I am trying to display Spanish words on a web page.

FINAL UPDATE: Everything is working

When importing into MySQL, I took the suggestion to add the encoding when opening the csv file:

    with open("spanish_words.csv",  encoding="utf-8") as file:

This changed the way it displayed in the database:


mysql> SELECT * from spanish_beginner;
+----+-------------------------+--------------+
| id | spanish                 | english      |
+----+-------------------------+--------------+
| 28 | a ▒ e ▒ i ▒ o ▒ u ▒ n ▒ |  a e i o u n |
+----+-------------------------+--------------+
1 row in set (0.00 sec)

mysql> 

This fixed the issue and the output is now displaying as intended.

END FINAL UPDATE
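As a rough illustration of why the explicit encoding matters: when open() is called without encoding=, Python 3.10 falls back to the locale's preferred encoding, which on many Windows setups is cp1252 rather than UTF-8. The sketch below assumes the CSV was saved as UTF-8 and uses cp1252 as a stand-in for the Windows default; both are assumptions, since the post does not state either.

```python
import locale
import tempfile
from pathlib import Path

# The line from the CSV, written to disk as UTF-8 bytes (assumed file encoding).
text = "a á e é i í o ó u ú n ñ"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "spanish_words.csv"
    path.write_bytes(text.encode("utf-8"))

    # Explicit encoding: the bytes are decoded the same way on every platform.
    with open(path, encoding="utf-8") as file:
        read_as_utf8 = file.read()

    # cp1252 stands in for a Windows locale default (an assumption).
    with open(path, encoding="cp1252") as file:
        read_as_cp1252 = file.read()

# The encoding open() falls back to when encoding= is omitted.
print(locale.getpreferredencoding(False))
print(read_as_utf8 == text)    # the UTF-8 read round-trips
print(read_as_cp1252 == text)  # the cp1252 read does not
```

If the first print shows something other than a UTF-8 name, an open() call without encoding= will misread a UTF-8 file in the same way.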

UPDATE: This is what I have found is happening, but I don't know how to fix it.

The interpreter seems to be interpreting this:

test = b'\xc3\x83\xc2\xa1'
print(test)
print(test.decode('utf-8'))

as

test = b'\xc3\x83 \xc2\xa1'
print(test)
print(test.decode('utf-8'))

So instead of treating all 4 bytes as one character, it interprets them as 2 separate characters. NOTE: I had to paste the 4-byte sequence into the console. If I run the program in Flask or PyCharm, it interprets the one sequence as two different characters.

test = b'\xc3\x83\xc2\xa1'
print(test)
print(test.decode('utf-8'))

b'\xc3\x83\xc2\xa1'
á

test = b'\xc3\x83 \xc2\xa1'
print(test)
print(test.decode('utf-8'))

b'\xc3\x83 \xc2\xa1'
à ¡

END UPDATE
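One plausible reading of the four bytes above is double encoding: the UTF-8 bytes of "á" were decoded with a single-byte code page and the result was encoded as UTF-8 again. The following minimal sketch reproduces that byte sequence; cp1252 is an assumption about which code page was involved (latin-1 gives the same result for these characters).

```python
original = "á"

# Correct UTF-8 encoding of the character: two bytes.
utf8_bytes = original.encode("utf-8")

# Misreading those two bytes with a single-byte code page yields two characters.
misread = utf8_bytes.decode("cp1252")

# Encoding the two misread characters as UTF-8 gives four bytes.
double_encoded = misread.encode("utf-8")

print(utf8_bytes)       # b'\xc3\xa1'
print(ascii(misread))   # '\xc3\xa1' (two code points, U+00C3 and U+00A1)
print(double_encoded)   # b'\xc3\x83\xc2\xa1'
```

Under that reading, decoding the four bytes as UTF-8 correctly yields two characters, because two characters are what was actually stored.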

Here I have a Python script using mysql-connector-python to read a CSV file (spanish_words.csv):

a á e é i í o ó u ú n ñ, a e i o u n

It is a simple one-line file, so it shouldn't be a problem. I will now insert this data into a DB.

import csv
from mysql.connector import connect, errorcode, Error


def db_connect():
    try:
        cnx = connect(
            user='username',
            password='password',
            host='127.0.0.1',
            database='flash',
        )

    except Error as err:
        if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
            print("Something is wrong with your username or password")
        elif err.errno == errorcode.ER_BAD_DB_ERROR:
            print("Database does not exist")
        else:
            print(err)
    return cnx

cnx = db_connect()

with open("spanish_words.csv") as file:
    data = csv.reader(file)
    for row in data:
        cursor = cnx.cursor()
        query = (f"""
                  INSERT INTO spanish_beginner
                  (spanish, english)
                  VALUES
                  ("{row[0]}", "{row[1]}")
                 """)
        print(query)
        cursor.execute(query)
        cnx.commit()

Now, the output on the PyCharm Console:

D:\Dropbox\Technology\Python\PycharmProjects\day-32-start\venv\Scripts\python.exe D:/Dropbox/Technology/Python/PycharmProjects/day-32-start/test.py

                  INSERT INTO spanish_beginner
                  (spanish, english)
                  VALUES
                  ("a á e é i í o ó u ú n ñ", " a e i o u n")
                 

Process finished with exit code 0

Those are definitely not the values that I wanted to enter. But when I look at the database:

mysql> use flash
Database changed
mysql> SELECT * from spanish_beginner;
+----+-------------------------------+--------------+
| id | spanish                       | english      |
+----+-------------------------------+--------------+
| 26 | a á e é i í o ó u ú n ñ |  a e i o u n |
+----+-------------------------------+--------------+
1 row in set (0.00 sec)

mysql>

Everything does look correct in the database.

Here is the table create:

mysql> show create table spanish_beginner;
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table            | Create Table                                                                                                                                                                                                                                          |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| spanish_beginner | CREATE TABLE `spanish_beginner` (
  `id` int NOT NULL AUTO_INCREMENT,
  `spanish` varchar(255) NOT NULL,
  `english` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=28 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Now, when I try to pull a value from the database:

# Import mysql-connector-python
from mysql.connector import connect, errorcode, Error


def db_connect():
    try:
        cnx = connect(
                user='username', 
                password='password', 
                host='127.0.0.1', 
                database='flash',
        )
    except Error as err:
        if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
            print("Something is wrong with your username or password")
        elif err.errno == errorcode.ER_BAD_DB_ERROR:
            print("Database does not exist")
        else:
            print(err)
    return cnx
    
cnx = db_connect()


cursor = cnx.cursor()

query = (f"""
         SELECT id, spanish, english
         FROM spanish_beginner
""")
print(query)

cursor.execute(query)
result = cursor.fetchall()

spanish = result[0][1] 
print(spanish)

This returns garbage:

D:\Dropbox\Technology\Python\PycharmProjects\day-32-start\venv\Scripts\python.exe D:/Dropbox/Technology/Python/PycharmProjects/day-32-start/test2.py

         SELECT id, spanish, english
         FROM spanish_beginner

a á e é i í o ó u ú n ñ

Process finished with exit code 0

But if I run on console:

cursor.execute(query)
result = cursor.fetchall()
spanish = result[0][1] 
spanish 
'a á e é i �\xad o ó u ú n ñ'
print(spanish) 
a á e é i í o ó u ú n ñ

Printing the variable here prints the value correctly, but when running the script, it prints the incorrect translation characters. I am not sure why the value is different when the print is run from the console vs running a script to do the same thing.

Using the same script and passing the variable to a Jinja template displays the same garbage.

Has anyone run across this problem, or does anyone have an idea how to fix it?

EDIT: More Information

I tried adding each of these after the connect statement: cnx.set_charset_collation("utf8"), cnx.set_charset_collation("iso-8859-1"), and cnx.set_charset_collation("latin1").

None of those additions changed anything.

I then added a parameter to the connection, use_unicode=False:

cnx = connect( 
        user='username',  
        password='password',  
        host='127.0.0.1',  
        database='flash', 
        use_unicode=False,
)

Now when I run this:

spanish = result[0][1]
print(spanish)
decoded = spanish.decode('utf-8')
print(decoded)

This is the output:

bytearray(b'a \xc3\x83\xc2\xa1 e \xc3\x83\xc2\xa9 i \xc3\x83\xc2\xad o \xc3\x83\xc2\xb3 u \xc3\x83\xc2\xba n \xc3\x83\xc2\xb1')
a á e é i í o ó u ú n ñ
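The bytearray above is consistent with text that was stored double-encoded, which would explain why changing the connection charset had no effect. As a hedged sketch, rows already stored this way can be reversed by undoing one layer of encoding. The helper name undo_double_encoding is hypothetical, and latin-1 is used as the intermediate code page on the assumption that it matches the original misread for these characters.

```python
# Bytes exactly as returned with use_unicode=False.
raw = bytearray(
    b'a \xc3\x83\xc2\xa1 e \xc3\x83\xc2\xa9 i \xc3\x83\xc2\xad '
    b'o \xc3\x83\xc2\xb3 u \xc3\x83\xc2\xba n \xc3\x83\xc2\xb1'
)


def undo_double_encoding(data):
    """Reverse one round of 'UTF-8 bytes misread as a single-byte code page'."""
    misread_text = bytes(data).decode("utf-8")     # the text as stored
    original_bytes = misread_text.encode("latin-1")  # back to the original UTF-8 bytes
    return original_bytes.decode("utf-8")          # decode them properly this time


repaired = undo_double_encoding(raw)
print(ascii(repaired))
```

This only works for rows damaged in exactly this way. Re-importing the source file with the correct encoding avoids the need for a repair step.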

More interesting information:

If I run this script as a program in PyCharm:

test = bytearray(b'a \xc3\x83\xc2\xa1 e \xc3\x83\xc2\xa9 i \xc3\x83\xc2\xad o \xc3\x83\xc2\xb3 u \xc3\x83\xc2\xba n \xc3\x83\xc2\xb1')
test_decode = test.decode('utf-8')
print(test_decode)

This is the result:

a á e é i í o ó u ú n ñ

If I run it in the PyCharm console, this is the result:

>>> test = bytearray(b'a \xc3\x83\xc2\xa1 e \xc3\x83\xc2\xa9 i \xc3\x83\xc2\xad o \xc3\x83\xc2\xb3 u \xc3\x83\xc2\xba n \xc3\x83\xc2\xb1')
>>> test_decode = test.decode('utf-8')
>>> print(test_decode)
a á e é i í o ó u ú n ñ

1 Answer

Try adding the encoding argument when you open the CSV file:

with open("spanish_words.csv", encoding="utf-8") as file:

So the code accessing the csv file should look like this:

with open("spanish_words.csv", encoding="utf-8") as file:
    data = csv.reader(file)
    for row in data:
        cursor = cnx.cursor()
        query = (f"""
                  INSERT INTO spanish_beginner
                  (spanish, english)
                  VALUES
                  ("{row[0]}", "{row[1]}")
                 """)
        print(query)
        cursor.execute(query)
        cnx.commit()

This should solve the problem as the values get inserted into the database correctly.
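A possible variation on the import, offered as a sketch rather than as the code the asker ended up with: it keeps the explicit encoding but passes the values as query parameters instead of formatting them into the SQL string, so quotes inside a word cannot break the statement. The function names read_word_pairs and insert_word_pairs are hypothetical. skipinitialspace=True is an optional choice that drops the space after the comma in the CSV, which changes the stored English value slightly.

```python
import csv

# %s is the placeholder style used by mysql-connector-python.
INSERT_SQL = "INSERT INTO spanish_beginner (spanish, english) VALUES (%s, %s)"


def read_word_pairs(path):
    """Read (spanish, english) pairs from a UTF-8 encoded CSV file."""
    with open(path, encoding="utf-8", newline="") as file:
        reader = csv.reader(file, skipinitialspace=True)
        # Skip blank or short lines instead of raising IndexError.
        return [(row[0], row[1]) for row in reader if len(row) >= 2]


def insert_word_pairs(cnx, pairs):
    """Insert all pairs through one cursor and commit once at the end."""
    cursor = cnx.cursor()
    cursor.executemany(INSERT_SQL, pairs)
    cnx.commit()
    cursor.close()
```

With a connection from db_connect(), the call would be insert_word_pairs(cnx, read_word_pairs("spanish_words.csv")).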

5 Comments

Thank you for looking into this! I tried adding that and I got an AttributeError ``` File "D:\Dropbox\Technology\Python\PycharmProjects\day-32-start\test2.py", line 12, in db_connect cnx.setencoding(encoding="utf8") AttributeError: 'CMySQLConnection' object has no attribute 'setencoding' ```
I also tried cnx.set_charset_collation("utf8") (and "iso-8859-1") which didn't change the output: a á e é i í o ó u ú n ñ
I updated my answer with a different solution, I added the character set to a CSV and read the file in Python and was able to replicate the bad characters you were seeing. I changed the encoding to "utf-8" and was able to read the correct characters.
Thank you again Richard. I tried using that to import into the MySQL DB and it displayed differently in the DB. That fixed the issue since the data was put into the DB the way it should have been.
No problem man, glad I could help.
