18

I'm trying to use the COPY command to insert data from a file into PGSQL via Python. This works incredibly well when the target table is empty or I ensure ahead of time there will be no unique key collisions:

cmd = ("COPY %s (%s) FROM STDIN WITH (FORMAT CSV, NULL '_|NULL|_')" %
               (tableName, colStr))
cursor.copy_expert(cmd, io)

I'd prefer however to be able to perform this COPY command without first emptying the table. Is there any way to do an 'INSERT or UPDATE' type operation with SQL COPY?

1
  • Two options: 1) copy to a temporary table and do the upsert from there; 2) use the file_fdw extension. (Commented Oct 25, 2017 at 15:13)

3 Answers

25

Not directly through the copy command.

What you can do however is create a temporary table, populate that table with the copy command, and then do your insert and update from that.

-- Clone table structure of target table
create temporary table __copy as (select * from my_schema.my_table limit 0);


-- Populate cloned table
copy __copy (
    column_1,
    column_2
) from STDIN with (
    format csv,
    null '_|NULL|_'
);


-- Update existing records
update
    my_schema.my_table
set
    column_2 = __copy.column_2
from
    __copy
where
    my_table.column_1 = __copy.column_1;


-- Insert new records
insert into my_schema.my_table (
    column_1,
    column_2
) (
    select
        __copy.column_1,
        __copy.column_2
    from
        __copy
        left join my_schema.my_table using(column_1)
    where
        my_table.column_1 is null
);

You might consider creating an index on __copy after populating it with data to speed the update query up.
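Since the question drives COPY from Python, the four steps above can be sketched as a small helper. This is a minimal sketch, not a definitive implementation: `quote_ident`, `staging_statements` and `run_staging_upsert` are hypothetical names, `conn` is assumed to be a psycopg2 connection, and the `ON COMMIT DROP` clause is an addition (not in the SQL above) so the staging table disappears when the transaction commits. It also assumes a single key column and at least one non-key column.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def staging_statements(target, key_col, value_cols, staging="__copy"):
    """Build the CREATE, COPY, UPDATE and INSERT statements, in that order."""
    tgt = ".".join(quote_ident(part) for part in target.split("."))
    stg = quote_ident(staging)
    key = quote_ident(key_col)
    vals = [quote_ident(c) for c in value_cols]
    all_cols = ", ".join([key] + vals)
    staged_cols = ", ".join(stg + "." + c for c in [key] + vals)
    set_clause = ", ".join(c + " = " + stg + "." + c for c in vals)

    # Clone the target's structure into an empty temporary table
    create = ("CREATE TEMPORARY TABLE " + stg + " ON COMMIT DROP AS "
              "SELECT * FROM " + tgt + " LIMIT 0")
    # Bulk-load the file into the staging table
    copy = ("COPY " + stg + " (" + all_cols + ") FROM STDIN "
            "WITH (FORMAT CSV, NULL '_|NULL|_')")
    # Update rows whose key already exists in the target
    update = ("UPDATE " + tgt + " AS t SET " + set_clause + " FROM " + stg +
              " WHERE t." + key + " = " + stg + "." + key)
    # Insert rows whose key is not yet in the target
    insert = ("INSERT INTO " + tgt + " (" + all_cols + ") SELECT " +
              staged_cols + " FROM " + stg + " LEFT JOIN " + tgt +
              " AS t USING (" + key + ") WHERE t." + key + " IS NULL")
    return create, copy, update, insert


def run_staging_upsert(conn, csv_file, target, key_col, value_cols):
    """Run all four statements in one transaction on a psycopg2 connection."""
    create, copy, update, insert = staging_statements(
        target, key_col, value_cols)
    with conn.cursor() as cur:
        cur.execute(create)
        cur.copy_expert(copy, csv_file)
        cur.execute(update)
        cur.execute(insert)
    conn.commit()
```

A call would look like `run_staging_upsert(conn, open("data.csv"), "my_schema.my_table", "column_1", ["column_2"])`, where `data.csv` is a placeholder file name. Note that quoting makes the identifiers case-sensitive, so the names must match the table definition exactly.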


3 Comments

Thanks, that's exactly what I ended up doing. (In principle... the detail is a bit different because I'm playing inside of Django as well so I used some of their mechanics, but same concept)
Do note that Postgres (9.5+) has ON CONFLICT, which lets you combine the append and update queries into one statement, similar to MySQL's ON DUPLICATE KEY UPDATE or SQL Server's/Oracle's MERGE.
@Parfait I always forget that thing exists :)
1

Consider using a temp table as a staging table that receives the csv file data. Then, run an append into the final table using Postgres' ON CONFLICT (colname) DO UPDATE .... Available in version 9.5+. See docs. Do note that the special excluded table is used to reference values originally proposed for insertion.

Also, assuming you use psycopg2, consider using sql.Identifier() to safely bind identifiers like table or column names. However, you would need to decompose colStr to wrap individual items:

from psycopg2 import sql
...
cursor.execute("DELETE FROM tempTable")
conn.commit()

cmd = sql.SQL("COPY {0} ({1}) FROM STDIN WITH (FORMAT CSV, NULL '_|NULL|_')")\
              .format(sql.Identifier(temptableName),
                      sql.SQL(', ').join([sql.Identifier('col1'), 
                                          sql.Identifier('col2'), 
                                          sql.Identifier('col3')]))
cursor.copy_expert(cmd, io)

# Named upsert_sql so it does not shadow the imported psycopg2 sql module
upsert_sql = "INSERT INTO finalTable (id_column, Col1, Col2, Col3)" + \
      " SELECT id_column, Col1, Col2, Col3 FROM tempTable t" + \
      " ON CONFLICT (id_column) DO UPDATE SET Col1 = EXCLUDED.Col1," + \
      "                                       Col2 = EXCLUDED.Col2," + \
      "                                       Col3 = EXCLUDED.Col3 ...;"

cursor.execute(upsert_sql)
conn.commit()
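The "decompose colStr" step can be sketched as follows. This is only an illustration under stated assumptions: `split_columns` and `build_commands` are hypothetical helpers, `col_str` is assumed to be a plain comma-separated list of unquoted column names that includes the key column, and the psycopg2 import sits inside the function so the splitting helper works on its own. Keep in mind that sql.Identifier() quotes names, which makes them case-sensitive.

```python
def split_columns(col_str):
    """Split a comma-separated column string such as 'col1, col2' into names."""
    return [name.strip() for name in col_str.split(",") if name.strip()]


def build_commands(staging_table, final_table, key_col, col_str):
    """Compose the COPY and upsert statements as psycopg2.sql objects."""
    # Imported here so split_columns stays usable without psycopg2 installed
    from psycopg2 import sql

    cols = split_columns(col_str)
    col_idents = sql.SQL(", ").join([sql.Identifier(c) for c in cols])

    copy_cmd = sql.SQL(
        "COPY {} ({}) FROM STDIN WITH (FORMAT CSV, NULL '_|NULL|_')"
    ).format(sql.Identifier(staging_table), col_idents)

    # One "col = EXCLUDED.col" assignment per non-key column
    assignments = sql.SQL(", ").join([
        sql.SQL("{} = EXCLUDED.{}").format(sql.Identifier(c), sql.Identifier(c))
        for c in cols if c != key_col
    ])

    upsert_cmd = sql.SQL(
        "INSERT INTO {} ({}) SELECT {} FROM {} "
        "ON CONFLICT ({}) DO UPDATE SET {}"
    ).format(sql.Identifier(final_table), col_idents, col_idents,
             sql.Identifier(staging_table), sql.Identifier(key_col),
             assignments)
    return copy_cmd, upsert_cmd
```

The returned objects can be passed to cursor.copy_expert() and cursor.execute() in place of plain strings.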

1 Comment

I thought this was the perfect solution... but I got the following error: ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values. :(
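That error means the incoming data contains the same key more than once, so a single INSERT ... ON CONFLICT would have to update one row twice. One way around it is to deduplicate the file before running COPY. The sketch below is an assumption-laden example: `dedupe_csv` is a hypothetical helper, it assumes the last occurrence of a key should win, that the key is a single column, and that the file fits in memory. Because rows are re-written by Python's csv module, a field that was explicitly quoted as "_|NULL|_" in the input would come out unquoted and be read as NULL by COPY.

```python
import csv
import io


def dedupe_csv(source, key_index=0):
    """Return a StringIO with one row per key, keeping the last occurrence."""
    latest = {}
    for row in csv.reader(source):
        if row:  # skip blank lines
            latest[row[key_index]] = row
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(latest.values())
    out.seek(0)  # rewind so the result can be handed straight to copy_expert
    return out
```

The returned buffer can then be passed to cursor.copy_expert(cmd, ...) instead of the original file object.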
-2

According to the PostgreSQL documentation, there is no reason you can't add data to an already existing table: https://www.postgresql.org/docs/9.6/static/sql-copy.html

COPY FROM copies data from a file to a table (appending the data to whatever is in the table already)

So I think you have another error somewhere. Could you give us more details about the message you get from PostgreSQL when you try to insert data into your table a second time?

3 Comments

If there are primary key conflicts between the source file and the target table this will cause the entire query to fail. Also, this will not update existing records where applicable, but will instead only insert.
The COPY command throws errors when there are unique key collisions in the data. My question was more about whether there's some other method to bulk load data like this that updates colliding rows when encountered rather than just attempting inserts.
In this case, with COPY, unfortunately no... the best way is to copy into a temporary table, then update the existing records of the main table, and insert the non-existing records into the main table.
