9

I see plenty of examples of importing a CSV into a PostgreSQL db, but what I need is an efficient way to import 500,000 CSVs into a single PostgreSQL db. Each CSV is a bit over 500KB (so a grand total of approximately 272GB of data).

The CSVs are identically formatted and there are no duplicate records (the data was generated programmatically from a raw data source). I have been searching and will continue to search online for options, but I would appreciate any direction on getting this done in the most efficient manner possible. I do have some experience with Python, but will dig into any other solution that seems appropriate.

Thanks!

1 Comment
    The fastest way to import data into PostgreSQL is to use the COPY command. Commented Sep 28, 2012 at 20:10

3 Answers

11

If you start by reading the PostgreSQL guide "Populating a Database" you'll see several pieces of advice:

  1. Load the data in a single transaction.
  2. Use COPY if at all possible.
  3. Remove indexes, foreign key constraints, etc. before loading the data, and restore them afterwards.

PostgreSQL's COPY statement already supports the CSV format:

COPY table (column1, column2, ...) FROM '/path/to/data.csv' WITH (FORMAT CSV)

so it looks as if you are best off not using Python at all, or using Python only to generate the required sequence of COPY statements.
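
For example, a minimal Python sketch of that second approach (the table name my_table and columns column1, column2 are placeholders, and the CSVs are assumed to have no header row) could generate one SQL script that wraps every COPY in a single transaction:

    # Rough sketch: generate one SQL script that COPYs every CSV inside a
    # single transaction. my_table, column1 and column2 are placeholders.
    import glob

    with open("import-commands.sql", "w") as out:
        out.write("BEGIN;\n")
        for path in sorted(glob.glob("/path/to/*.csv")):
            # COPY runs server-side, so these paths must be readable by the
            # PostgreSQL server process.
            out.write("COPY my_table (column1, column2) "
                      "FROM '%s' WITH (FORMAT CSV);\n" % path)
        out.write("COMMIT;\n")

Feeding the result to psql -f import-commands.sql then loads everything in a single transaction, in line with the advice above.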


5 Comments

Gareth - I assume from your response that you cannot use wildcards in the COPY statement? From what I can tell, I need some type of programming method to create a list of the files and then issue individual COPY commands. Please correct me if I am wrong. I hope I am :). It would be much easier if I could just issue a COPY statement with a wildcard *.csv.
Surely generating the sequence of COPY commands is easy? You could do it shell, for example: (for FILE in /path/to/*.csv; do echo "COPY table (column1, column2, ...) FROM '$FILE' WITH (FORMAT CSV);"; done) > import-commands.sql
Isn't that a bit inefficient? Generating a file with 500,000 Copy commands?
In the context of loading 272 GB of data, a few hundred thousand SQL commands are small change.
For others coming across this: you could also use Python or R (or whatever your preference is) to group some of those CSV files into bigger batches (as much as your RAM can handle) and then use the COPY that Postgres provides.
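
A rough sketch of that batching idea in Python with psycopg2 (the connection string, table and column names are placeholders; the CSVs are assumed to have identical columns and no header rows):

    # Sketch: concatenate CSVs into batches and stream each batch through a
    # single client-side COPY ... FROM STDIN.
    import glob
    import io
    import psycopg2

    BATCH_SIZE = 1000  # files per COPY; tune to available RAM

    conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder DSN
    files = sorted(glob.glob("/path/to/*.csv"))

    with conn, conn.cursor() as cur:  # one transaction for the whole load
        for i in range(0, len(files), BATCH_SIZE):
            buf = io.StringIO()
            for path in files[i:i + BATCH_SIZE]:
                with open(path) as f:
                    data = f.read()
                # Guard against files missing a trailing newline.
                buf.write(data if data.endswith("\n") else data + "\n")
            buf.seek(0)
            cur.copy_expert(
                "COPY my_table (column1, column2) FROM STDIN WITH (FORMAT CSV)",
                buf,
            )
    conn.close()
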
0

Nice chunk of data you have there. I'm not 100% sure about Postgres, but MySQL at least provides SQL commands to feed a CSV directly into a table. This bypasses any insert checks and so on, and is therefore more than an order of magnitude faster than ordinary insert operations.

So probably the fastest way to go is to create a simple Python script telling your Postgres server which CSV files, in which order, to hungrily devour into its endless tables.
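
As a hedged sketch of that idea with psycopg2 (the connection string, table and column names are placeholders): loop over the files and let the server COPY each one directly. Note that server-side COPY from a file requires superuser rights or, on PostgreSQL 11+, membership in the pg_read_server_files role.

    # Sketch: issue one server-side COPY per file, in whatever order sorted()
    # yields. my_table, column1, column2 and the DSN are placeholders.
    import glob
    import psycopg2
    from psycopg2 import sql

    conn = psycopg2.connect("dbname=mydb user=myuser")

    with conn, conn.cursor() as cur:
        for path in sorted(glob.glob("/path/to/*.csv")):
            # The server process reads the file itself, so no CSV data is
            # shipped through this client.
            cur.execute(
                sql.SQL("COPY my_table (column1, column2) FROM {} WITH (FORMAT CSV)")
                .format(sql.Literal(path))
            )
    conn.close()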

Comments

0

I use PHP and Postgres. I read the CSV file with PHP and build a string in the following format:

{{line1 column1, line1 column2, line1 column3}, {line2 column1, line2 column2, line2 column3}}

Take care to do it in a single transaction by passing the string as a parameter to a PostgreSQL function.

This lets me check all records, formatting, amount of data, etc., and I end up importing 500,000 records in about 3 minutes.

To read the data in the PostgreSQL function:

    -- The function name and signature below are placeholders; the original
    -- post only showed the body.
    CREATE OR REPLACE FUNCTION schema.import_rows(p_dados varchar[])
    RETURNS void AS $$
    DECLARE
        d varchar[];
    BEGIN
        -- Each SLICE of the 2-D array is one CSV row.
        FOREACH d SLICE 1 IN ARRAY p_dados
        LOOP
            INSERT INTO schema.table (
                column1,
                column2,
                column3
            )
            VALUES (
                d[1],
                d[2]::INTEGER, -- explicit conversion to INTEGER
                d[3]::BIGINT   -- explicit conversion to BIGINT
            );
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;
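
For anyone not using PHP, the same call can be sketched in Python with psycopg2 (the function name schema.import_rows, the connection string and the CSV path are placeholders, since the original post only shows the function body):

    # Sketch: read one CSV and hand all rows to the PL/pgSQL function above
    # as a single 2-D text array, i.e. one call and one transaction.
    import csv
    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=myuser")

    with open("/path/to/data.csv", newline="") as f:
        rows = list(csv.reader(f))  # rows must all have the same column count

    with conn, conn.cursor() as cur:
        # psycopg2 adapts a list of lists of strings to the {{...},{...}}
        # array literal described above.
        cur.execute("SELECT schema.import_rows(%s::varchar[])", (rows,))
    conn.close()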

Comments
