
I have a problem parsing a huge CSV file into a MySQL database.

The CSV file looks like this:

ref1  data1  data2  data3...
ref1  data4  data5  data6...
ref2  data1  data2  data3 data4 data5..
ref2  data12 data13 data14
ref2  data21 data22...
.
.
.

The CSV file has about 1 million lines: about 7MB zipped, or about 150MB unzipped.

My job is to parse the data from the CSV into MySQL, but only the lines whose reference matches. Another problem is that I must combine multiple CSV lines into a single MySQL row per reference.

I tried to do this with csv.reader and a for loop over each reference, but it is ultra slow.

with con:
    cur.execute("SELECT ref FROM users")
    user = cur.fetchall()
    for i in range(len(user)):
        with open('hugecsv.csv', mode='rb') as f:
            reader = csv.reader(f, delimiter=';')
            for row in reader:
                if str(user[i][0]) == row[0]:
                    writer.writerow(row)

So I have all the references I would like to match in my list user. What is the fastest way to parse?

Please help!

  • Please clarify "from multiple lines in csv i must parse it in only one line". Commented Dec 13, 2013 at 7:40

3 Answers


The first obvious bottleneck is that you are reopening and scanning the whole CSV file for each user in your database. Doing a single pass over the CSV would be faster:

import csv

# faster lookup on users: set membership is O(1)
cur.execute("SELECT ref FROM users")
users = set(row[0] for row in cur.fetchall())

with open("your/file.csv") as f:
    r = csv.reader(f, delimiter=';')
    for row in r:
        if row[0] in users:
            do_something_with(row)
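The question also asks to collapse multiple CSV lines into one row per reference. A minimal sketch of doing that in the same single pass (the flattening of `row[1:]` and the `;` delimiter are assumptions about your data layout):

```python
import csv
from collections import defaultdict

def collect_rows(path, refs):
    """Group all CSV lines whose first field is a wanted reference.

    One pass over the file; `refs` should be a set for O(1) lookup.
    Returns {reference: [all data fields from every matching line]}.
    """
    grouped = defaultdict(list)
    with open(path, newline='') as f:
        for row in csv.reader(f, delimiter=';'):
            if row[0] in refs:
                # flatten the data fields of each matching line
                grouped[row[0]].extend(row[1:])
    return grouped
```

Each reference then maps to one flat list, which you can write as a single database row, e.g. with `cur.executemany`.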

2 Comments

Sorry for my ignorance, but what exactly does set() do? Because Python does not return any errors, but the variable users doesn't exist when I run the code.
set is a builtin type, a collection of unique elements with fast (O(1)) lookup. But there was an error in my code (sorry, I answered from my phone), which I just fixed.

Use:

LOAD DATA INFILE 'EF_PerechenSkollekciyami.csv' INTO TABLE `TABLE_NAME` FIELDS TERMINATED BY ';';

This is a built-in query command in MySQL.

I don't recommend using tabs to separate columns; I recommend changing them with sed to ; or some other character. But you can try with tabs too.
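If you'd rather do the tab-to-semicolon conversion in Python instead of sed, a simple streaming rewrite works. This is a sketch with placeholder file names, and it assumes no data field itself contains a tab or a semicolon:

```python
def retab_to_semicolons(src_path, dst_path):
    """Rewrite a tab-separated file as semicolon-separated.

    Streams line by line, so the 150MB file is never
    held in memory all at once.
    """
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            dst.write(line.replace('\t', ';'))
```

If fields can contain the delimiter, use csv.reader/csv.writer with quoting instead of a bare string replace.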

2 Comments

Why don't you recommend tab-separated columns? MySQL uses this by default. And why terminate with ;?
I get a csv file every month from multiple companies and I would like to use Python to parse it, because I need control over the parsing (a time stamp showing the program runs automatically, log files with errors, other py programs to control resources...)

You haven't included all your logic. If you just want to import everything into a single table,

cur.execute("LOAD DATA INFILE 'path_to_file.csv' INTO TABLE my_table;")

MySQL does it directly. You can't get any faster than that.

Documentation

4 Comments

Basically I must filter my csv file and write into MySQL just the lines whose reference matches.
What about importing the CSV, and then running a SQL query to do the filtering?
This is an option, but I don't know how to do this because I have a very dynamic csv file. For example, I don't know how many lines I have for one user/reference.
@djpiky you can load your file into a temporary table with no indexes and then extract only the relevant records into the actual table.
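To make the staging-table idea concrete, here is a sketch using sqlite3 so it runs self-contained; with MySQL you would fill the staging table with LOAD DATA INFILE instead of executemany, but the JOIN extraction step is the same idea. Table and column names are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# staging table: raw CSV rows, no indexes, so the bulk load is fast
cur.execute("CREATE TABLE staging (ref TEXT, data TEXT)")
cur.execute("CREATE TABLE users (ref TEXT)")
cur.execute("CREATE TABLE filtered (ref TEXT, data TEXT)")

cur.executemany("INSERT INTO users VALUES (?)", [("ref1",), ("ref2",)])
rows = [("ref1", "a"), ("ref1", "b"), ("ref2", "x"), ("ref9", "junk")]
cur.executemany("INSERT INTO staging VALUES (?, ?)", rows)

# extract only the records whose reference exists in users
cur.execute("""
    INSERT INTO filtered
    SELECT s.ref, s.data
    FROM staging s
    JOIN users u ON u.ref = s.ref
""")
con.commit()
```

The filtering then happens inside the database engine rather than in a Python loop, which is typically much faster for a million rows.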
