I have a 20 GB CSV file with 50 columns and 50 million records, stored in an S3 bucket. I would like to automate loading this file into my RDS PostgreSQL instance using Python 3. Any help is appreciated. Thanks.

2 Answers

First install psycopg2:

pip install psycopg2

Create your table (modify the sql to your needs):

import psycopg2

conn = psycopg2.connect("dbname=dbname user=user")
cur = conn.cursor()
# Create the target table; adjust the columns to match the CSV.
cur.execute("""CREATE TABLE sometablename(
    some_col integer PRIMARY KEY,
    some_col1 text,
    some_col2 text,
    some_col3 text)""")
conn.commit()

Load the data:

import psycopg2

conn = psycopg2.connect("host=localhost dbname=postgres user=postgres")
cur = conn.cursor()
with open('your_file.csv', 'r') as f:
    next(f)  # Skip the header row.
    # copy_from uses PostgreSQL's text format, so it does not handle quoted CSV fields.
    cur.copy_from(f, 'sometablename', sep=',')
conn.commit()

An alternative way would be through a subprocess:

import subprocess

host = "YOUR_HOST"
username = "YOUR_USERNAME"
dbname = "YOUR_DBNAME"

table_name = "my_table"
file_name = "my_10gb_file.csv"
# Raw string so the backslash in psql's \copy meta-command is kept literally.
command = r"\copy {} FROM '{}' DELIMITER ',' CSV HEADER".format(table_name, file_name)

psql_template = 'psql -p 5432 --host {} --username {} --dbname {} --command "{}"'

bash_command = psql_template.format(host, username, dbname, command.strip())

# Capture stderr as well; otherwise `error` is always None.
process = subprocess.Popen(bash_command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)

output, error = process.communicate()

2 Comments

This link is also useful: mydatahack.com/…
Will the above methods be faster than using pandas' to_sql with chunksize and method="multi"?

RDS has a special extension to PostgreSQL for importing data from S3. You can use Python's psycopg2 to invoke the aws_s3.table_import_from_s3() SQL function, but there is nothing particularly "pythonic" about doing so; any other way of issuing commands to the database would work just as well.
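
As a rough sketch of that approach, the call could be issued through psycopg2 along the lines below. It assumes the aws_s3 extension is already installed in the database (CREATE EXTENSION aws_s3 CASCADE) and that the instance has an IAM role allowing it to read the bucket. The helper names, bucket, key, and region are placeholders for illustration, not anything from the question.

```python
def build_import_call(table_name, bucket, key, region):
    """Return (sql, params) for a call to aws_s3.table_import_from_s3."""
    sql = (
        "SELECT aws_s3.table_import_from_s3("
        "%s, %s, %s, "
        "aws_commons.create_s3_uri(%s, %s, %s))"
    )
    # An empty column list means "all columns"; the options string is passed
    # through to COPY, here describing a CSV file with a header row.
    params = (table_name, "", "(FORMAT csv, HEADER true)", bucket, key, region)
    return sql, params


def import_from_s3(conn, table_name, bucket, key, region):
    """Run the import on an open psycopg2 connection and return the result value."""
    sql, params = build_import_call(table_name, bucket, key, region)
    with conn.cursor() as cur:
        cur.execute(sql, params)
        result = cur.fetchone()[0]
    conn.commit()
    return result
```

With a connection opened by psycopg2.connect(...), this would be called as import_from_s3(conn, "sometablename", "my-bucket", "path/to/file.csv", "us-east-1"). The data goes straight from S3 to the database, so the 20 GB file never passes through the machine running Python.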

If you don't like that for some reason, you can use a Python library to open a stream from S3, then pass that file-like object to psycopg2's copy_from or copy_expert.
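
A minimal sketch of the streaming variant, assuming boto3 as the S3 library (the answer does not name one) and a CSV file with a header row. The function names are hypothetical, and boto3 is imported lazily so the COPY helpers can be used without it.

```python
import re


def copy_statement(table_name):
    """Build a COPY ... FROM STDIN statement for a CSV file with a header row."""
    # The table name is interpolated into SQL, so accept plain identifiers only.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError("unexpected table name: %r" % (table_name,))
    return "COPY {} FROM STDIN WITH (FORMAT csv, HEADER true)".format(table_name)


def load_stream(conn, stream, table_name):
    """Feed a file-like object to COPY through psycopg2's copy_expert."""
    with conn.cursor() as cur:
        cur.copy_expert(copy_statement(table_name), stream)
    conn.commit()


def open_s3_stream(bucket, key):
    """Return the streaming body of an S3 object (needs boto3 and AWS credentials)."""
    import boto3  # assumption: boto3 is the S3 client library in use
    return boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
```

With a psycopg2 connection, load_stream(conn, open_s3_stream("my-bucket", "path/to/file.csv"), "sometablename") would stream the object into the table without saving it to disk first. Unlike copy_from, the CSV format used here handles quoted fields, and HEADER true makes the server skip the first line.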
