-1

I have multiple thousands of CSVs, and each CSV has over 10,000 records. I'm looking for the most efficient way to dump this data into tables in a Postgres DB with minimal time and effort.

10
  • 4
    What methods have you found on your searches? How did they work out? Commented Aug 23, 2018 at 14:23
  • Similar questions have been answered already. Please look at - stackoverflow.com/questions/30050097/… and also look at stackoverflow.com/questions/12646305/… Commented Aug 23, 2018 at 14:32
  • It's not the same as the previously opened questions because the old questions give you a way to import a SINGLE csv file into Postgres. But in my case, I want to automate the import of a very large number of files, where there are 2 manual processes involved: 1. Create a new table. 2. Import the csv into this new table. I want to accomplish these 2 steps in one procedure for multiple thousands of files through automation. Commented Aug 23, 2018 at 15:12
  • In other words, I want to be able to create the tables on the fly, assign table names from the source file names, and then import data from each source file into its table, in bulk for all the files (see the sketch after these comments). Commented Aug 23, 2018 at 15:16
  • @sulabhchaturvedi the links you have provided have answers that tackle a single-file import into a new table with manual table creation. But my question is different. Commented Aug 23, 2018 at 15:17
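
For reference, here is a minimal sketch of the two-step automation described in the comments above, using psycopg2's COPY support. The csv_dir directory, the connection details, and the decision to load every column as TEXT are assumptions for illustration, not details from the question.

import csv
import glob
import os

import psycopg2

# Placeholder connection string; adjust to your environment.
conn = psycopg2.connect("dbname=database user=user password=password host=host")

with conn, conn.cursor() as cur:
    for path in glob.glob("csv_dir/*.csv"):
        # Name the table after the source file, as described in the comments.
        table = os.path.splitext(os.path.basename(path))[0]
        with open(path, newline="") as f:
            header = next(csv.reader(f))  # column names from the CSV header row
        columns = ", ".join(f'"{col}" TEXT' for col in header)
        cur.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        # Stream the file into the new table; HEADER skips the first row.
        with open(path, newline="") as f:
            cur.copy_expert(f'COPY "{table}" FROM STDIN WITH (FORMAT csv, HEADER)', f)

conn.close()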

2 Answers

1

COPY is usually the best solution. It depends on your constraints.

COPY table_name FROM 'path_readable_by_postgres/file.csv';

You can also cat your files into one big file to import the data quickly.

Look at https://www.postgresql.org/docs/current/static/sql-copy.html for more details.
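
For example, here is a hedged sketch of driving COPY from Python with psycopg2 for many files, streaming them into one pre-created table much as if they had been concatenated first. The table name, directory, and connection details are placeholders rather than details from this answer, and the files are assumed to share the table's column layout with no header row.

import glob

import psycopg2

# Placeholder connection string; adjust to your environment.
conn = psycopg2.connect("dbname=database user=user password=password host=host")

with conn, conn.cursor() as cur:
    for path in glob.glob("csv_dir/*.csv"):
        # Stream each file into the same pre-created table via COPY FROM STDIN.
        with open(path, newline="") as f:
            cur.copy_expert("COPY table_name FROM STDIN WITH (FORMAT csv)", f)

conn.close()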


3 Comments

As @zeevb pointed out, this is the same answer as the one provided by a different user in the link.
I have multiple thousands of files. I don't want to manually add all the file names. Also, for this command to work, I already need to have the table created, which I don't have. If I were to create them manually, I'd have to create multiple thousands of tables and it would be extremely time consuming.
1

You can use the pandas library to read and transform the data (if needed), sqlalchemy to create a Postgres engine, and psycopg2 to load the data into PostgreSQL. I assume that you've already created the tables in your Postgres DB. Try something like the code below.

import pandas as pd
from sqlalchemy import create_engine
import psycopg2

# Read the CSV; drop "Unnamed: 0" if present, as it often causes problems when writing to a table
pd_table = pd.read_csv('path/to/file.csv', index_col='index_column')
pd_table = pd_table.drop(columns=['Unnamed: 0'], errors='ignore')

# Now simply load your data into the database
engine = create_engine('postgresql://user:password@host:port/database')
try:
    pd_table.to_sql('name_of_table_in_postgres_db', engine, if_exists='append')
except (Exception, psycopg2.DatabaseError) as error:
    print(error)
finally:
    engine.dispose()
    print('Closed connection to the database')
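
If the tables do not exist yet, a hedged variation of the same approach can create them on the fly by looping over the files and letting to_sql build each table from the DataFrame. The csv_dir directory and the file-name-to-table-name scheme below are assumptions, not part of the answer above.

import glob
import os

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://user:password@host:port/database')

for path in glob.glob('csv_dir/*.csv'):
    # Name each table after its source file.
    table = os.path.splitext(os.path.basename(path))[0]
    df = pd.read_csv(path)
    # if_exists='replace' creates (or recreates) the table from the DataFrame's columns.
    df.to_sql(table, engine, if_exists='replace', index=False)

engine.dispose()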

5 Comments

So, no, I haven't created the tables yet. The code above reads one CSV and dumps it into a new table, and I can loop it for all the files?
The above code throws an error at the line engine = try:
engine = try: why are you using assignment here?
engine = try: --- sorry, I lost some code; now it should be okay.
The code above adds one CSV into a PostgreSQL table created earlier. You can loop it to add all CSV files into the table.
