I have many thousands of CSVs, and each CSV has over 10,000 records. I'm looking for the most efficient way to dump this data into tables in a Postgres DB with minimal time and effort.
-
What methods have you found on your searches? How did they work out? – roganjosh, Aug 23, 2018 at 14:23
-
Similar questions have been answered already. Please look at stackoverflow.com/questions/30050097/… and also at stackoverflow.com/questions/12646305/… – sulabh chaturvedi, Aug 23, 2018 at 14:32
-
It's not the same as the previously linked questions: those show how to import a SINGLE csv file into Postgres. In my case, I want to automate the import of a very large number of files, where there are 2 manual steps involved: 1. Create a new table. 2. Import the csv into this new table. I want to accomplish these 2 steps in one procedure for many thousands of files, through automation. – Priya Sreetharan, Aug 23, 2018 at 15:12
-
In other words, I want to create the tables on the fly, name each table after its source file, and then import the data from each source file into its newly created table, for a whole batch of files. – Priya Sreetharan, Aug 23, 2018 at 15:16
-
@sulabhchaturvedi the links you have provided answer how to import a single file into a new table created manually. My question is different. – Priya Sreetharan, Aug 23, 2018 at 15:17
2 Answers
COPY is usually the best solution; it depends on your constraints.
COPY table_name FROM 'path_readable_by_postgres/file.csv';
You can also cat your files into one big file to import the data quickly.
Look at https://www.postgresql.org/docs/current/static/sql-copy.html for more details.
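To avoid naming thousands of files by hand, both manual steps (create the table, then import the file) can be wrapped in a small driver script. Here is a rough sketch, not taken from this answer, using psycopg2: it derives each table name from the file name, creates the table from the CSV header with plain text columns, and bulk-loads the rest of the file with COPY ... FROM STDIN. The connection string, the directory and the all-text column types are assumptions to adapt.
import csv
import glob
import os
import psycopg2

# Placeholder connection details -- adapt to your setup
conn = psycopg2.connect("dbname=mydb user=myuser password=mypassword host=localhost")

for path in glob.glob("/data/csvs/*.csv"):  # hypothetical directory
    table = os.path.splitext(os.path.basename(path))[0]
    with open(path, newline="") as f:
        header = next(csv.reader(f))  # first line = column names
        columns = ", ".join('"{}" text'.format(col) for col in header)
        with conn.cursor() as cur:
            cur.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, columns))
            # The header has already been consumed, so COPY only sees data rows
            cur.copy_expert('COPY "{}" FROM STDIN WITH (FORMAT csv)'.format(table), f)
    conn.commit()

conn.close()
Creating every column as text sidesteps type inference; you can alter or cast the columns afterwards if you need proper types.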
3 Comments
zeevb
Answered also here: stackoverflow.com/questions/2987433/…
Inder
As @zeevb pointed out, this is the same answer as the one provided by a different user in that link.
Priya Sreetharan
I have many thousands of files, so I don't want to add all the file names manually. Also, for this command to work I already need to have the table created, which I don't. If I were to create them manually, I'd have to create many thousands of tables, which would be extremely time consuming.
You can use the pandas library to read and transform the data (if needed), SQLAlchemy to create a Postgres engine, and psycopg2 to load the data into PostgreSQL. I assume that you've already created the tables in the Postgres DB. Try something like the code below:
import pandas as pd
import psycopg2
from sqlalchemy import create_engine

# Read the csv; drop "Unnamed: 0", as it often causes problems when writing to a table
pd_table = pd.read_csv('path/to/file.csv', index_col='index_column').drop(["Unnamed: 0"], axis=1)

# Now simply load your data into the database
engine = create_engine('postgresql://user:password@host:port/database')
try:
    pd_table.to_sql('name_of_table_in_postgres_db', engine, if_exists='append')
except (Exception, psycopg2.DatabaseError) as error:
    print(error)
finally:
    print('Closed connection to the database')
5 Comments
Priya Sreetharan
So, no, I haven't created the tables yet. The code above reads one csv and dumps it into a new table, and I can loop it over all the files?
Priya Sreetharan
The above code throws an error in line engine = try:
roganjosh
engine = try: why are you using assignment here?
enoted
engine = try: --- sorry, I have lost some code, now it should be okay
enoted
The code above adds one csv into a postgresql table created earlier. You can loop it to add all the csv files.
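For illustration, that loop could look like the rough sketch below. The directory, connection string and chunksize are placeholders; to_sql creates each table (named after its file) if it doesn't already exist and appends otherwise.
import glob
import os
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string and directory -- adapt to your setup
engine = create_engine('postgresql://user:password@host:port/database')

for path in glob.glob('/data/csvs/*.csv'):
    # Use the file name (without extension) as the table name
    table_name = os.path.splitext(os.path.basename(path))[0]
    df = pd.read_csv(path)
    # to_sql creates the table if it does not exist, appends if it does
    df.to_sql(table_name, engine, if_exists='append', index=False, chunksize=10000)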