How to write DataFrame to postgres table

Question

There is a DataFrame.to_sql method, but it works only for MySQL, SQLite, and Oracle databases. I can't pass a Postgres connection or a SQLAlchemy engine to this method.

fpersyn · Accepted Answer · 2021-01-27 04:48:01Z

248

Starting from pandas 0.14 (released end of May 2014), postgresql is supported. The sql module now uses sqlalchemy to support different database flavors. You can pass a sqlalchemy engine for a postgresql database (see docs). E.g.:

from sqlalchemy import create_engine
engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
df.to_sql('table_name', engine)
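
One practical wrinkle with a connection URL like the one above: if the password contains characters such as @ or /, it has to be percent-encoded before it goes into the URL. A minimal sketch using only the standard library; build_postgres_url is a hypothetical helper name and the credentials are placeholders:

```python
from urllib.parse import quote_plus


def build_postgres_url(user, password, host, port, database):
    """Return a postgresql:// URL with the user and password percent-encoded."""
    # build_postgres_url is an illustrative helper, not part of SQLAlchemy.
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


url = build_postgres_url("username", "p@ss/word", "localhost", 5432, "mydatabase")
print(url)
```

The resulting string can be passed to create_engine in place of the literal URL.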

You are correct that in pandas up to version 0.13.1 postgresql was not supported. If you need to use an older version of pandas, here is a patched version of pandas.io.sql: https://gist.github.com/jorisvandenbossche/10841234.
I wrote this a while ago, so I cannot fully guarantee that it always works, but the basis should be there. If you put that file in your working directory and import it, then you should be able to do the following (where con is a postgresql connection):

import sql  # the patched version (file is named sql.py)
sql.write_frame(df, 'table_name', con, flavor='postgresql')

edited Jan 27, 2021 at 4:48

fpersyn


answered Apr 16, 2014 at 8:52

joris



6 Comments

Quant Over a year ago

Did this make it to 0.14?

srodriguex Over a year ago

This post solved the problem for me: stackoverflow.com/questions/24189150/…

Saurabh Saha Over a year ago

Note: to_sql does not export array types in postgres.

Underoos Over a year ago

Instead of creating a new Sqlalchemy engine, can I use an existing Postgres connection created using psycopg2.connect()?

joris Over a year ago

For writing tables, that is not possible. It needs to be a sqlalchemy engine or connection.


Michael B. Currie · Accepted Answer · 2023-01-21 10:33:07Z

149

Faster option:

The following code copies your pandas DataFrame to a Postgres database much faster than the df.to_sql method, and it needs no intermediate CSV file on disk to store the df.

Create an engine based on your DB specifications.

Create a table in your Postgres DB that has the same number of columns as the DataFrame (df).

The data in the df is then inserted into your Postgres table.

from sqlalchemy import create_engine
import psycopg2 
import io

If you want to replace the table, create it first with the normal to_sql method using only the header of the df (df.head(0)), and then load the entire, time-consuming df into the DB.

engine = create_engine(
    'postgresql+psycopg2://username:password@host:port/database')

# Drop old table and create new empty table
df.head(0).to_sql('table_name', engine, if_exists='replace',index=False)

conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()  # not used by copy_from; useful only for inspecting the buffer
cur.copy_from(output, 'table_name', null="")  # empty fields are loaded as NULL
conn.commit()
cur.close()
conn.close()
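
On the output.seek(0) call above: writing to a StringIO leaves the position at the end of the buffer, so anything that reads from it afterwards would see no data unless the buffer is rewound first. A small stand-alone illustration with made-up rows, using the csv module in place of df.to_csv:

```python
import csv
import io

rows = [(1, "alice"), (2, "bob")]

buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)

# After writing, the position is at the end, so a reader sees nothing.
at_end = buffer.read()

# Rewind so the next reader (for example copy_from) starts at the first character.
buffer.seek(0)
after_rewind = buffer.read()

print(repr(at_end))
print(repr(after_rewind))
```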

edited Jan 21, 2023 at 10:33

Michael B. Currie


answered Dec 26, 2017 at 22:05

Aseem


22 Comments

n1000 Over a year ago

What does the variable contents do? Should this be the one that is written in copy_from()?

moshevi Over a year ago

why do you do output.seek(0) ?

Shadi Over a year ago

This is so fast that it's funny :D

Jonas Palačionis Over a year ago

If you want to use schema, you can add schema=your_schema parameter in the to_sql part of the code.

Shadi Over a year ago

3 years later and I land here again ... ¯\_(ツ)_/¯


mgoldwasser · Accepted Answer · 2019-04-03 12:20:40Z

54

Pandas 0.24.0+ solution

In pandas 0.24.0, to_sql gained a method parameter that accepts a callable controlling how rows are inserted. The pandas documentation gives an example callable that uses PostgreSQL's COPY for fast writes: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method

import csv
from io import StringIO

from sqlalchemy import create_engine

def psql_insert_copy(table, conn, keys, data_iter):
    # gets a DBAPI connection that can provide a cursor
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
            table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)

engine = create_engine('postgresql://myusername:mypassword@myhost:5432/mydatabase')
df.to_sql('table_name', engine, method=psql_insert_copy)
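
In the callable above, pandas supplies keys as the list of column names and data_iter as an iterable of row tuples. A rough stand-alone sketch of the two pieces the function assembles, the CSV buffer and the COPY statement, using made-up column names and rows; build_copy_payload is a hypothetical helper, not part of pandas or psycopg2:

```python
import csv
from io import StringIO


def build_copy_payload(table_name, keys, data_iter, schema=None):
    """Return (sql, buffer) assembled the same way as in psql_insert_copy."""
    buf = StringIO()
    csv.writer(buf).writerows(data_iter)  # one CSV line per row tuple
    buf.seek(0)

    # Quote each column name so mixed-case or reserved names survive.
    columns = ", ".join('"{}"'.format(k) for k in keys)
    full_name = "{}.{}".format(schema, table_name) if schema else table_name
    sql = "COPY {} ({}) FROM STDIN WITH CSV".format(full_name, columns)
    return sql, buf


sql, buf = build_copy_payload("table_name", ["id", "name"], [(1, "a,b"), (2, None)])
payload = buf.getvalue()
print(sql)
print(repr(payload))
```

Note that None is written as an empty field, which COPY in CSV mode reads as NULL by default.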

answered Apr 3, 2019 at 12:20

mgoldwasser


10 Comments

ssword Over a year ago

For most of the time, add method='multi' option is fast enough. But yes, this COPY method is the fastest way right now.

DudeWah Over a year ago

Is this for csv's only? Can it be used with .xlsx as well? Some notes on what each part of this is doing would be helpful. The first part after the with is writing to an in memory buffer. The last part of the with is using an SQL statement and taking advantage of copy_expert's speed to bulk load the data. What is the middle part that starts with columns = doing?

Bowen Liu Over a year ago

This worked very well for me. And could you explain the keys arguments in the psql_insert_copy function please? How does it get any keys and are the keys just the column names?

mgoldwasser Over a year ago

@E.Epstein - you can modify the last line to df.to_sql('table_name', engine, if_exists='replace', method=psql_insert_copy) - this does create a table in your database.

cglacet Over a year ago

That's indeed very fast, I still had to use a chunk size for my large dataset (~200k rows) to make this work (it took 6.5s with a chunksize=10000). Any idea why they haven't added that function in pandas and only provided it as an example?


Underoos · Accepted Answer · 2019-11-04 10:11:23Z

44

This is how I did it.

It may be faster because it is using execute_batch:

import psycopg2.extras  # must be imported explicitly; "import psycopg2" alone does not load it

# df is the dataframe; conn is an open psycopg2 connection and table is the
# target table name (both assumed to be defined already)
if len(df) > 0:
    df_columns = list(df)
    # create (col1,col2,...)
    columns = ",".join(df_columns)

    # create VALUES(%s,%s,...) with one %s per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))

    # create INSERT INTO table (columns) VALUES(%s,...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)

    cur = conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()
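
A common pitfall with this approach is that missing values held as float NaN are sent to the database as NaN rather than NULL. One possible workaround, sketched here with plain tuples instead of a DataFrame, is to swap NaN for None before handing the rows to execute_batch. The helper names nan_to_none and build_insert are made up for illustration:

```python
import math


def nan_to_none(rows):
    """Yield each row as a tuple with float NaN values replaced by None."""
    for row in rows:
        yield tuple(
            None if isinstance(value, float) and math.isnan(value) else value
            for value in row
        )


def build_insert(table, columns):
    """Build a parameterized INSERT with one %s placeholder per column."""
    placeholders = ",".join(["%s"] * len(columns))
    return "INSERT INTO {} ({}) VALUES({})".format(
        table, ",".join(columns), placeholders
    )


rows = [(1, float("nan"), "x"), (2, 3.5, "y")]
cleaned = list(nan_to_none(rows))
insert_stmt = build_insert("table_name", ["id", "score", "label"])
print(cleaned)
print(insert_stmt)
```

The cleaned rows could then be passed as the third argument to execute_batch in place of df.values.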

edited Nov 4, 2019 at 10:11

Underoos


answered Sep 1, 2018 at 3:43

Behdad Forghani


3 Comments

GeorgeLPerkins Over a year ago

I get AttributeError: module 'psycopg2' has no attribute 'extras'. Ah, this needs to be explicitly imported. import psycopg2.extras

Saurabh Saha Over a year ago

this function is much faster than the sqlalchemy solution

Jeong Kim Over a year ago

This doesn't seem to handle np.nan properly. If you use the code above, you will be likely to see 'NaN' strings instead of Nulls in the database.

Aseem · Accepted Answer · 2022-08-16 16:38:55Z

Faster way to write a df to a table in a custom schema with/without index:

"""
Faster way to write df to table.
Slower way is to use df.to_sql()
"""

from io import StringIO

from pandas import DataFrame
from sqlalchemy.engine.base import Engine


class WriteDfToTableWithIndexMixin:
    @classmethod
    def write_df_to_table_with_index(
            cls,
            df: DataFrame,
            table_name: str,
            schema_name: str,
            engine: Engine
    ):
        """
        Truncate existing table and load df into table.
        Keep each column as string to avoid datatype conflicts.
        """
        df.head(0).to_sql(table_name, engine, if_exists='replace',
                          schema=schema_name, index=True, index_label='id')

        conn = engine.raw_connection()
        cur = conn.cursor()
        output = StringIO()
        df.to_csv(output, sep='\t', header=False,
                  index=True, index_label='id')
        output.seek(0)
        cur.copy_expert(f"COPY {schema_name}.{table_name} FROM STDIN", output)
        conn.commit()
        cur.close()
        conn.close()


class WriteDfToTableWithoutIndexMixin:
    @classmethod
    def write_df_to_table_without_index(
            cls,
            df: DataFrame,
            table_name: str,
            schema_name: str,
            engine: Engine
    ):
        """
        Truncate existing table and load df into table.
        Keep each column as string to avoid datatype conflicts.
        """
        df.head(0).to_sql(table_name, engine, if_exists='replace',
                          schema=schema_name, index=False)

        conn = engine.raw_connection()
        cur = conn.cursor()
        output = StringIO()
        df.to_csv(output, sep='\t', header=False, index=False)
        output.seek(0)
        cur.copy_expert(f"COPY {schema_name}.{table_name} FROM STDIN", output)
        conn.commit()
        cur.close()
        conn.close()

If you have JSON values in a column of your df, the method above will still load all the data, but the JSON column will end up in an odd format, so casting that column with ::json may raise an error. In that case you have to use to_sql(). Add method='multi' to speed things up and chunksize to keep your machine from freezing:

df.to_sql(table_name, engine, if_exists='replace', schema=schema_name, index=False, method='multi', chunksize=1000)
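
Two plausible reasons for the odd JSON formatting in the COPY route, offered as assumptions rather than a diagnosis: a dict cell is written using its Python repr, and a JSON string cell picks up CSV quoting. The sketch below shows both effects with the standard csv module, on the assumption that to_csv follows the same minimal-quoting default:

```python
import csv
import io
import json

value = {"a": 1}

# A dict cell is written using its Python repr, which is not valid JSON.
as_repr = str(value)

# A JSON string cell contains double quotes, so the CSV writer wraps the
# field in quotes and doubles the inner ones.
buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerow([1, json.dumps(value)])
as_csv_line = buf.getvalue()

print(as_repr)
print(repr(as_csv_line))
```

A plain COPY ... FROM STDIN uses PostgreSQL's text format, which does not interpret CSV quoting, so those extra quotes would end up in the stored value.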

Aadesh Baral · Accepted Answer · 2021-12-20 05:07:15Z

1

Using psycopg2, you can use native SQL commands to write data into a Postgres table.

import psycopg2
import pandas as pd

conn = psycopg2.connect("dbname='{db}' user='{user}' host='{host}' port='{port}' password='{passwd}'".format(
            user=pg_user,
            passwd=pg_pass,
            host=pg_host,
            port=pg_port,
            db=pg_db))
cur = conn.cursor()


def insertIntoTable(df, table):
    """
    Using cursor.executemany() to insert the dataframe
    """
    # Create a list of tuples from the dataframe rows
    # (no set() here: it would drop duplicate rows and scramble the order)
    tuples = list(df.itertuples(index=False, name=None))

    # Comma-separated dataframe columns
    cols = ','.join(list(df.columns))
    # One %s placeholder per column instead of a hardcoded four
    placeholders = ','.join(['%s'] * len(df.columns))
    # SQL query to execute
    query = "INSERT INTO %s(%s) VALUES(%s)" % (table, cols, placeholders)

    try:
        cur.executemany(query, tuples)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()
        return 1

edited Dec 20, 2021 at 5:07

answered Dec 19, 2021 at 8:31

Aadesh Baral


1 Comment

Tyler2P Over a year ago

A good answer will always include an explanation why this would solve the issue, so that the OP and any future readers can learn from it.

Mayukh Ghosh · Accepted Answer · 2019-11-04 06:44:43Z

For Python 2.7 and Pandas 0.24.2 and using Psycopg2

Psycopg2 Connection Module

import io

import psycopg2
from psycopg2.extras import RealDictCursor


def dbConnect(db_parm, username_parm, host_parm, pw_parm):
    # Pass in the connection information
    credentials = {'host': host_parm, 'database': db_parm, 'user': username_parm, 'password': pw_parm}
    conn = psycopg2.connect(**credentials)
    conn.autocommit = True  # auto-commit each entry to the database
    conn.cursor_factory = RealDictCursor
    cur = conn.cursor()
    print("Connected Successfully to DB: " + str(db_parm) + "@" + str(host_parm))
    return conn, cur

Connect to the database

conn, cur = dbConnect(databaseName, dbUser, dbHost, dbPwd)

Assuming dataframe to be present already as df

output = io.BytesIO() # For Python3 use StringIO
df.to_csv(output, sep='\t', header=True, index=False)
output.seek(0) # Required for rewinding the String object
copy_query = "COPY mem_info FROM STDIN csv DELIMITER '\t' NULL ''  ESCAPE '\\' HEADER "  # Replace mem_info with your table name
cur.copy_expert(copy_query, output)
conn.commit()

Romeo Kienzler · Accepted Answer · 2021-09-15 09:34:59Z

-1

Create an engine (where dialect is 'postgresql', 'mysql', etc.; SQLAlchemy 1.4 and later no longer accept the 'postgres' name):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(f'{dialect}://{user_name}@{host}:{port}/{db_name}')

# to_sql writes through the engine directly, so no ORM session is needed
df = pd.read_csv(path + f'/{file}')
df.to_sql('table_name', con=engine, if_exists='append', index=False)

edited Sep 15, 2021 at 9:34

Romeo Kienzler


answered Jun 14, 2021 at 13:48

David


1 Comment

David Over a year ago

It works on most database including postgres. You have to specify the dialect in the engine = create_engine(dialect='postgres', etc....)
