
I have a CSV file with ~3 million records that I want to migrate to SQL Server from my laptop (4 GB RAM).

pandas successfully reads the file into a DataFrame (pd.read_csv()), but when I try to migrate it (.to_sql()) I receive a MemoryError:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-12-94c146c2b7b7> in <module>()
----> 1 csv.to_sql(name='stats', con=engine, if_exists='append')

C:\Python27\lib\site-packages\pandas\core\generic.pyc in to_sql(self, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
    964             self, name, con, flavor=flavor, schema=schema, if_exists=if_exists,
    965             index=index, index_label=index_label, chunksize=chunksize,
--> 966             dtype=dtype)
    967 
    968     def to_pickle(self, path):

C:\Python27\lib\site-packages\pandas\io\sql.pyc in to_sql(frame, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
    536     pandas_sql.to_sql(frame, name, if_exists=if_exists, index=index,
    537                       index_label=index_label, schema=schema,
--> 538                       chunksize=chunksize, dtype=dtype)
    539 
    540 

C:\Python27\lib\site-packages\pandas\io\sql.pyc in to_sql(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype)
   1170                          schema=schema, dtype=dtype)
   1171         table.create()
-> 1172         table.insert(chunksize)
   1173         # check for potentially case sensitivity issues (GH7815)
   1174         if name not in self.engine.table_names(schema=schema or self.meta.schema):

C:\Python27\lib\site-packages\pandas\io\sql.pyc in insert(self, chunksize)
    715 
    716                 chunk_iter = zip(*[arr[start_i:end_i] for arr in data_list])
--> 717                 self._execute_insert(conn, keys, chunk_iter)
    718 
    719     def _query_iterator(self, result, chunksize, columns, coerce_float=True,

C:\Python27\lib\site-packages\pandas\io\sql.pyc in _execute_insert(self, conn, keys, data_iter)
    689 
    690     def _execute_insert(self, conn, keys, data_iter):
--> 691         data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
    692         conn.execute(self.insert_statement(), data)
    693 

MemoryError:

Is there some other way that would let me complete the migration successfully?

  • You could chunk it: read 50k rows at a time, write to SQL, and repeat. There is a chunksize param for read_csv. Commented Jan 16, 2015 at 15:34
  • Actually, there is a chunksize param for to_sql. By default it is None, which means all the rows are written at once. Could you try setting it to some value and see how you go? Commented Jan 16, 2015 at 15:49

1 Answer


I think you have two approaches:

  1. Read the CSV in chunks, write each chunk to the SQL DB, and repeat
  2. Or read it all at once and write to the DB in chunks

So for read_csv there is a chunksize param.
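
A minimal sketch of that first approach, assuming a SQLAlchemy engine pointed at your SQL Server instance (the connection string and the stats.csv filename below are placeholders):

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connection string -- replace user, password, server, database
    # and driver with whatever your SQL Server setup actually uses.
    engine = create_engine(
        "mssql+pyodbc://user:password@server/database?driver=SQL+Server"
    )

    # Stream the CSV in 50,000-row chunks so only one chunk is held in memory
    # at a time, appending each chunk to the target table as it is read.
    for chunk in pd.read_csv("stats.csv", chunksize=50000):
        chunk.to_sql(name="stats", con=engine, if_exists="append", index=False)

Each iteration only materialises 50,000 rows, so the memory footprint stays small regardless of the total file size.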

Equally, there is also a chunksize param for to_sql.
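
And a sketch of the second approach, reusing the hypothetical engine from above: read the whole file once (which already works for you), then let to_sql batch the inserts instead of building one giant statement:

    # Full read, but write in batches of 1,000 rows per INSERT.
    df = pd.read_csv("stats.csv")
    df.to_sql(name="stats", con=engine, if_exists="append", index=False, chunksize=1000)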


2 Comments

Excellent, it works: for data in pd.read_csv(..., chunksize=1000): data.to_sql(...). Just to mention, I didn't have any luck with .to_sql(chunksize=...).
@theta Glad one of the approaches worked for you. I don't use SQL, so I don't know why that one didn't work; it is supposed to.
