
I'm using Python to export a large matrix (shape around 3000 * 3000) into MySQL.

Right now I'm using MySQLdb to insert those values, but it's too troublesome and too inefficient. Here is my code:

# -*- coding:utf-8 -*-


import MySQLdb
import numpy as np
import pandas as pd
import time


def feature_to_sql_format(df):
    # Fill missing cells with the literal string 'NULL' so the replace() below
    # turns them into SQL NULLs in the generated VALUES list.
    df = df.fillna(value='NULL')
    columns = list(df.columns)
    index = list(df.index)
    # Flatten the matrix into parallel (date, code, value) lists.
    index_sort = np.reshape([[int(i)] * len(columns) for i in index], (-1)).tolist()
    columns_sort = (columns * len(index))
    values_sort = df.values.reshape(-1).tolist()
    # Build the "(date, code, value), (date, code, value), ..." string for the INSERT.
    return str(zip(index_sort, columns_sort, values_sort))[1: -1].replace("'NULL'", 'NULL')


if __name__ == '__main__':
    t1 = time.clock()
    df = pd.read_csv('C:\\test.csv', header=0, index_col=0)
    output_string = feature_to_sql_format(df)
    sql_CreateTable = 'USE derivative_pool;DROP TABLE IF exists test1;' \
                      'CREATE TABLE test1(date INT NOT NULL, code VARCHAR(12) NOT NULL, value FLOAT NULL);'
    sql_Insert = 'INSERT INTO test1 (date,code,value) VALUES ' + output_string + ';'
    con = MySQLdb.connect(......)
    cur = con.cursor()
    cur.execute(sql_CreateTable)
    cur.close()
    cur = con.cursor()
    cur.execute(sql_Insert)
    cur.close()
    con.commit()
    con.close()
    t2 = time.clock()
    print t2 - t1

And it takes around 274 seconds in total.

I was wondering if there is a simpler way to do this. I thought of exporting the matrix to CSV and then using LOAD DATA INFILE to import it, but that also seems too complicated.
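
For reference, that CSV + LOAD DATA LOCAL INFILE route would look roughly like this (a sketch only; the connection details, file path, and local_infile flag are placeholders, and I have not tested it):

# Reshape to long (date, code, value) rows, dump to CSV, then bulk-load it.
long_df = df.stack().reset_index()
long_df.columns = ['date', 'code', 'value']
long_df.to_csv('C:\\test_long.csv', index=False, header=False)

con = MySQLdb.connect(host='localhost', user='root', passwd='',
                      db='derivative_pool', local_infile=1)
cur = con.cursor()
cur.execute("LOAD DATA LOCAL INFILE 'C:/test_long.csv' INTO TABLE test1 "
            "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
            "(date, code, value)")
con.commit()
cur.close()
con.close()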

I noticed in the pandas documentation that DataFrame has a to_sql function, and that in version 0.14 you could set the 'flavor' to 'mysql', that is:

df.to_sql(con=con, name=name, flavor='mysql')

But my pandas version is now 0.19.2, and the flavor option has been reduced to only 'sqlite'. I still tried to use

df.to_sql(con=con, name=name, flavor='sqlite')

and it gives me an error.

Is there any convenient way to do this?


1 Answer


Later pandas versions support SQLAlchemy connectors instead of flavor='mysql'.

First, install dependencies:

pip install mysql-connector-python-rf==2.2.2
pip install MySQL-python==1.2.5
pip install SQLAlchemy==1.1.1

Then create the engine:

from sqlalchemy import create_engine
connection_string = "mysql+mysqlconnector://root:@localhost/MyDatabase"
engine = create_engine(connection_string)

Then you can use df.to_sql(...):

df.to_sql('MyTable', engine)
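
For a matrix like yours you would first melt it into long (date, code, value) form before calling to_sql. A minimal sketch, assuming the DataFrame's index holds the dates and its columns the codes (the table name and chunksize are just examples):

# Reshape the wide matrix into (date, code, value) rows, then push it with
# to_sql; chunksize bounds the size of each INSERT sent to the server.
long_df = df.stack().reset_index()
long_df.columns = ['date', 'code', 'value']
long_df.to_sql('test1', engine, if_exists='replace', index=False, chunksize=10000)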

Here are some things you can do in MySQL to speed up your data load:

SET FOREIGN_KEY_CHECKS = 0;
SET UNIQUE_CHECKS = 0;
SET SESSION tx_isolation='READ-UNCOMMITTED';
SET sql_log_bin = 0;
#LOAD DATA LOCAL INFILE....
SET UNIQUE_CHECKS = 1;
SET FOREIGN_KEY_CHECKS = 1;
SET SESSION tx_isolation='REPEATABLE-READ';
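
If you drive everything from Python, you can apply the same session settings on one connection and run to_sql on that same connection. A rough sketch (engine and long_df as above; sql_log_bin is left out because it usually requires SUPER privileges):

# Relax the checks on a single connection, load the data, then restore defaults.
with engine.begin() as conn:
    conn.execute("SET FOREIGN_KEY_CHECKS = 0")
    conn.execute("SET UNIQUE_CHECKS = 0")
    conn.execute("SET SESSION tx_isolation = 'READ-UNCOMMITTED'")
    long_df.to_sql('test1', conn, if_exists='append', index=False, chunksize=10000)
    conn.execute("SET UNIQUE_CHECKS = 1")
    conn.execute("SET FOREIGN_KEY_CHECKS = 1")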

5 Comments

Thanks, it works, but the problem is I tried it and it takes even longer! Is there a way to shorten this process? My CSV is about 3000 * 3000 and it's only about 100 MB as a CSV file. I don't understand why it takes so long...
@DirkPaul your CSV has 3000 rows and 3000 columns?
Yes, but it was too wide for a SQL table, so I had to turn it into one-dimensional (row_number, column_number, value) rows.
That should not take too long... there are some things you can do to speed up MySQL loading though. See edit.
Thank you so much~ Let me try that
