It's 2022, and everything I find when searching for this is an older answer that has something to do with decoding. I have been looking for a solution for a couple of days now and am still not sure what the problem is.
I am running Python 3.10 in PyCharm and also Python 3.10 in a Flask container. The Spanish characters appear correctly when viewed from the mysql CLI, but not when pulled and printed with Python. I am trying to display Spanish words on a web page.
FINAL UPDATE: Everything is working
When importing into MySQL, I took the suggestion to add the encoding when opening the csv file:
with open("spanish_words.csv", encoding="utf-8") as file:
This changed the way it displayed in the database:
mysql> SELECT * from spanish_beginner;
+----+-------------------------+--------------+
| id | spanish | english |
+----+-------------------------+--------------+
| 28 | a ▒ e ▒ i ▒ o ▒ u ▒ n ▒ | a e i o u n |
+----+-------------------------+--------------+
1 row in set (0.00 sec)
mysql>
This fixed the issue and the output is now displaying as intended.
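A minimal sketch of that read step, for reference. The helper name `read_word_pairs` and the throwaway temporary file are assumptions made for illustration; only the `encoding="utf-8"` argument comes from the fix described here.

```python
import csv
import os
import tempfile


def read_word_pairs(path):
    """Read (spanish, english) rows from a CSV file that is known to be UTF-8."""
    # Passing encoding explicitly avoids depending on the platform default.
    # newline="" is what the csv module documentation recommends for csv.reader.
    with open(path, encoding="utf-8", newline="") as file:
        return [tuple(row) for row in csv.reader(file)]


# Demonstration with a temporary file holding the same single line as the CSV.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "spanish_words.csv")
    with open(sample, "w", encoding="utf-8", newline="") as f:
        f.write("a á e é i í o ó u ú n ñ, a e i o u n\n")
    rows = read_word_pairs(sample)

# ascii() keeps the output readable even on a console with a mismatched encoding.
print(ascii(rows[0][0]))
```

With the encoding stated explicitly, the string that reaches the INSERT statement holds the real accented characters instead of two characters per accent.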
END FINAL UPDATE
UPDATE: This is what I have found is happening, but I don't know how to fix it
The interpreter seems to be interpreting this:
test = b'\xc3\x83\xc2\xa1'
print(test)
print(test.decode('utf-8'))
as
test = b'\xc3\x83 \xc2\xa1'
print(test)
print(test.decode('utf-8'))
So instead of treating all four bytes as one character, it interprets them as two separate two-byte sequences. NOTE: I had to paste the four-byte version into the console to get the first result below. If I run the program in Flask or PyCharm, the single character comes out as two different characters.
test = b'\xc3\x83\xc2\xa1'
print(test)
print(test.decode('utf-8'))
b'\xc3\x83\xc2\xa1'
á
test = b'\xc3\x83 \xc2\xa1'
print(test)
print(test.decode('utf-8'))
b'\xc3\x83 \xc2\xa1'
à ¡
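One plausible way those four bytes can arise is that the UTF-8 bytes for á were read with a single-byte Windows codec and the result was then encoded as UTF-8 a second time. The sketch below assumes cp1252 was the wrong codec; that choice is an assumption for illustration.

```python
original = "á"

# Step 1: the correct UTF-8 encoding of the character (two bytes).
utf8_bytes = original.encode("utf-8")

# Step 2: those two bytes misread as cp1252 become two separate characters.
misread = utf8_bytes.decode("cp1252")

# Step 3: encoding the misread text as UTF-8 again yields four bytes.
double_encoded = misread.encode("utf-8")

print(utf8_bytes)        # b'\xc3\xa1'
print(len(misread))      # 2
print(double_encoded)    # b'\xc3\x83\xc2\xa1'
```

Under that assumption, decoding the four bytes as UTF-8 always gives two characters, and whether they show up as one accented letter depends on how the console re-encodes its output.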
END UPDATE
I have a Python script that uses mysql-connector-python to load a CSV file (spanish_words.csv) with this content:
a á e é i í o ó u ú n ñ, a e i o u n
It is a single simple line, so it shouldn't be a problem. I will now insert this data into a DB.
import csv
from mysql.connector import connect, errorcode, Error
def db_connect():
    try:
        cnx = connect(
            user='username',
            password='password',
            host='127.0.0.1',
            database='flash',
        )
    except Error as err:
        if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
            print("Something is wrong with your username or password")
        elif err.errno == errorcode.ER_BAD_DB_ERROR:
            print("Database does not exist")
        else:
            print(err)
    return cnx
cnx = db_connect()
with open("spanish_words.csv") as file:
    data = csv.reader(file)
    for row in data:
        cursor = cnx.cursor()
        query = (f"""
        INSERT INTO spanish_beginner
        (spanish, english)
        VALUES
        ("{row[0]}", "{row[1]}")
        """)
        print(query)
        cursor.execute(query)
        cnx.commit()
Now, the output on the PyCharm Console:
D:\Dropbox\Technology\Python\PycharmProjects\day-32-start\venv\Scripts\python.exe D:/Dropbox/Technology/Python/PycharmProjects/day-32-start/test.py
INSERT INTO spanish_beginner
(spanish, english)
VALUES
("a á e é i à o ó u ú n ñ", " a e i o u n")
Process finished with exit code 0
Those are definitely not the values that I wanted to enter. But when I look at the Database:
mysql> use flash
Database changed
mysql> SELECT * from spanish_beginner;
+----+-------------------------------+--------------+
| id | spanish | english |
+----+-------------------------------+--------------+
| 26 | a á e é i í o ó u ú n ñ | a e i o u n |
+----+-------------------------------+--------------+
1 row in set (0.00 sec)
mysql>
Everything does look correct in the database.
Here is the table definition:
mysql> show create table spanish_beginner;
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| spanish_beginner | CREATE TABLE `spanish_beginner` (
`id` int NOT NULL AUTO_INCREMENT,
`spanish` varchar(255) NOT NULL,
`english` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=28 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Now, when I try to pull a value from the database:
# Import mysql-connector-python
from mysql.connector import connect, errorcode, Error
def db_connect():
    try:
        cnx = connect(
            user='username',
            password='password',
            host='127.0.0.1',
            database='flash',
        )
    except Error as err:
        if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
            print("Something is wrong with your username or password")
        elif err.errno == errorcode.ER_BAD_DB_ERROR:
            print("Database does not exist")
        else:
            print(err)
    return cnx
cnx = db_connect()
cursor = cnx.cursor()
query = (f"""
SELECT id, spanish, english
FROM spanish_beginner
""")
print(query)
cursor.execute(query)
result = cursor.fetchall()
spanish = result[0][1]
print(spanish)
This returns garbage:
D:\Dropbox\Technology\Python\PycharmProjects\day-32-start\venv\Scripts\python.exe D:/Dropbox/Technology/Python/PycharmProjects/day-32-start/test2.py
SELECT id, spanish, english
FROM spanish_beginner
a á e é i à o ó u ú n ñ
Process finished with exit code 0
But if I run on console:
cursor.execute(query)
result = cursor.fetchall()
spanish = result[0][1]
spanish
'a á e é i �\xad o ó u ú n ñ'
print(spanish)
a á e é i í o ó u ú n ñ
Printing the variable here prints the value correctly, but when running the script, it prints the incorrect translation characters. I am not sure why the value is different when the print is run from the console vs running a script to do the same thing.
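One way to check whether the two environments really differ is to print the encodings Python is using in each of them. This is only a diagnostic sketch; the helper name `describe_encodings` is made up for illustration.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings that affect how text is read and printed."""
    return {
        # Encoding used when print() writes to standard output.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Locale encoding, which open() has traditionally used as its default.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used by str.encode() / bytes.decode() with no argument.
        "default": sys.getdefaultencoding(),
    }


for name, value in describe_encodings().items():
    print(f"{name}: {value}")
```

Running the same lines as a script and in the console should show whether `stdout` or `preferred` differs between the two.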
Using the same script and assigning the variable to a jinja template displays the same garbage.
Has anyone run across this problem, or does anyone have an idea of how to fix it?
EDIT: More Information
I tried adding each of these after the connect statement: cnx.set_charset_collation("utf8"), cnx.set_charset_collation("iso-8859-1"), and cnx.set_charset_collation("latin1").
None of those additions changed anything.
I then added a parameter to the connection, use_unicode=False:
cnx = connect(
    user='username',
    password='password',
    host='127.0.0.1',
    database='flash',
    use_unicode=False,
)
Now when I run this:
spanish = result[0][1]
print(spanish)
decoded = spanish.decode('utf-8')
print(decoded)
This is the output
bytearray(b'a \xc3\x83\xc2\xa1 e \xc3\x83\xc2\xa9 i \xc3\x83\xc2\xad o \xc3\x83\xc2\xb3 u \xc3\x83\xc2\xba n \xc3\x83\xc2\xb1')
a á e é i à o ó u ú n ñ
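If the stored bytes are UTF-8 that was encoded twice, one round of the damage can in principle be undone in Python. The sketch below assumes the intermediate codec behaved like Latin-1 for these characters (cp1252 and Latin-1 agree on all the bytes in this sample); the helper name `repair_double_encoded` is hypothetical, and this is a workaround for already-stored rows rather than a fix for the import.

```python
def repair_double_encoded(raw):
    """Undo one round of UTF-8 -> single-byte codec -> UTF-8 double encoding."""
    # First decode gives the mojibake text (two characters per accent).
    text = bytes(raw).decode("utf-8")
    # Mapping each character back to one byte recovers the original UTF-8 bytes.
    return text.encode("latin-1").decode("utf-8")


raw = bytearray(
    b'a \xc3\x83\xc2\xa1 e \xc3\x83\xc2\xa9 i \xc3\x83\xc2\xad '
    b'o \xc3\x83\xc2\xb3 u \xc3\x83\xc2\xba n \xc3\x83\xc2\xb1'
)
repaired = repair_double_encoded(raw)

# ascii() shows the code points without relying on the console encoding.
print(ascii(repaired))
```

The `encode("latin-1")` step raises `UnicodeEncodeError` on text that was never double encoded and contains characters above U+00FF, which is a useful guard against repairing rows that are already correct.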
More interesting information:
If I run this script as a program in PyCharm:
test = bytearray(b'a \xc3\x83\xc2\xa1 e \xc3\x83\xc2\xa9 i \xc3\x83\xc2\xad o \xc3\x83\xc2\xb3 u \xc3\x83\xc2\xba n \xc3\x83\xc2\xb1')
test_decode = test.decode('utf-8')
print(test_decode)
This is the result:
a á e é i à o ó u ú n ñ
If I run it in the PyCharm Console, this is the result:
>>> test = bytearray(b'a \xc3\x83\xc2\xa1 e \xc3\x83\xc2\xa9 i \xc3\x83\xc2\xad o \xc3\x83\xc2\xb3 u \xc3\x83\xc2\xba n \xc3\x83\xc2\xb1')
>>> test_decode = test.decode('utf-8')
>>> print(test_decode)
a á e é i í o ó u ú n ñ