5

I am trying to write a Pandas dataframe into MySQL database, and have used the following code:

engine = sqlalchemy.create_engine("mysql+pymysql://root:password@localhost/skills?charset=utf8mb4")
connection = engine.connect
dataframe.head().to_sql('indeed_resumes', engine, flavor='mysql', if_exists='replace',index=True)

However, I get the following error:

InternalError: (1366, "Incorrect string value: '\\xE1\\xBB\\x99i\\x0AO...' for column 'work' at row 5")

The datatype of the MySQL table is as follows:

 mysql> desc indeed_resumes;
    +-----------+------------+------+-----+---------+-------+
    | Field     | Type       | Null | Key | Default | Extra |
    +-----------+------------+------+-----+---------+-------+
    | index     | bigint(20) | YES  | MUL | NULL    |       |
    | certs     | text       | YES  |     | NULL    |       |
    | contact   | text       | YES  |     | NULL    |       |
    | education | text       | YES  |     | NULL    |       |
    | headline  | text       | YES  |     | NULL    |       |
    | info      | text       | YES  |     | NULL    |       |
    | skills    | text       | YES  |     | NULL    |       |
    | summary   | text       | YES  |     | NULL    |       |
    | updated   | text       | YES  |     | NULL    |       |
    | work      | text       | YES  |     | NULL    |       |
    +-----------+------------+------+-----+---------+-------+
    10 rows in set (0.00 sec)

My data consists of very long strings (sometimes around 3000 characters), so this could be causing the error. Any suggestions?

1 Answer 1

11

I seem to have resolved this issue. It looks like I also needed to change the database encoding using the following commands.

ALTER DATABASE skills CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE indeed_resumes CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

From https://mathiasbynens.be/notes/mysql-utf8mb4:

"Turns out MySQL’s utf8 charset only partially implements proper UTF-8 encoding. It can only store UTF-8-encoded symbols that consist of one to three bytes; encoded symbols that take up four bytes aren’t supported.

Luckily, MySQL 5.5.3 (released in early 2010) introduced a new encoding called utf8mb4 which maps to proper UTF-8 and thus fully supports Unicode, including astral symbols."

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.