
I have the following table in Postgres:

   Column   |            Type             | Modifiers 
------------+-----------------------------+-----------
 customer   | text                        | 
 feature    | character varying(255)      | 
 values     | character varying[]         | 
 updated_ts | timestamp without time zone |

And I'm trying to write the following pandas DataFrame

  customer feature         values           updated_ts
0        A       B   [red, black]  2019-01-15 00:00:00
1        A       B  [blue, green]  2019-01-16 00:00:00

using the following code:

import psycopg2
...    
sio = BytesIO()
sio.write(df.to_csv(header=False, index=False, sep='\t', quoting=csv.QUOTE_NONE))
sio.seek(0)
with connection.cursor() as cursor: 
    cursor.copy_from(file=sio, table=table, columns=df.columns, sep='\t', null='')
    connection.commit()

But I'm getting the following error:

DataError('malformed array literal: "[\'red\', \'black\']"\nDETAIL: "[" must introduce explicitly-specified array dimensions.\nCONTEXT: COPY test_features_values, line 1, column values: "[\'red\', \'black\']"\n',)

How do I write it correctly?

I would say you are trying to load a list into a DB column; there are several reasons why that is not a good idea (look up fourth normal form). Quick fix: convert the array to a string, either by col = str(col) or [less bad] col = ','.join(col). Proper fix: revisit your data model and your DB implementation. Commented Jan 30, 2019 at 9:33

1 Answer

I think you need to convert the list to a set:

df['values'] = df['values'].apply(set)

for the insert to work. The reason is that PostgreSQL expects arrays to be written using brace ({}) notation rather than bracket ([]) notation. When you convert a list to a set, the to_csv method renders the set with braces, in exactly the form PostgreSQL expects (a pleasant surprise; other representations I've seen are much hackier to convert).
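
You can see the difference by comparing how Python stringifies a list versus a set; to_csv serializes each cell with str(), so it is the set's brace form that ends up in the COPY stream (a minimal illustration; note that a set's element order is arbitrary):

row = ['red', 'black']
print(str(row))       # ['red', 'black'] -- bracket notation, rejected by COPY
print(str(set(row)))  # e.g. {'black', 'red'} -- brace notation Postgres can parse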

The other thing I'll note is that I had to switch from BytesIO to StringIO, because df.to_csv(...) returns a str, not a bytes-like object.
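
A quick way to confirm that:

import pandas

# to_csv called without a path returns a str, not bytes, so BytesIO.write()
# rejects it under Python 3.
df = pandas.DataFrame({'a': [1]})
print(type(df.to_csv()))  # <class 'str'>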

When I made those changes, the insert was successful:

import csv
import pandas
import psycopg2
from io import StringIO 

# initialize connection
connection = psycopg2.connect('postgresql://scott:tiger@localhost:5432/mydatabase')

# create data
df = pandas.DataFrame({
    'customer': ['A', 'A'],
    'feature': ['B', 'B'],
    'values': [['red', 'black'], ['blue', 'green']],
    'updated_ts': ['2019-01-15 00:00:00', '2019-01-16 00:00:00']
})
# cast list to set
df['values'] = df['values'].apply(set)

# write data to postgres
sio = StringIO()
sio.write(df.to_csv(header=False, index=False, sep='\t', quoting=csv.QUOTE_NONE))
sio.seek(0)
with connection.cursor() as cursor: 
    cursor.copy_from(file=sio, table='test', columns=df.columns, sep='\t', null='')
    connection.commit()
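
As an optional sanity check you can read the rows back, reusing the connection from above (this assumes the same test table; since values is a reserved word in SQL, it is safest to double-quote it in the query):

# Read the inserted rows back; psycopg2 returns varchar[] columns as
# Python lists.
with connection.cursor() as cursor:
    cursor.execute('SELECT customer, feature, "values", updated_ts FROM test')
    for row in cursor.fetchall():
        print(row)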

3 Comments

But you should remember that you lose element repetitions and their order after converting a list to a set.
Anybody ever find a solution that preserves repeated elements and element order?
Haven't tested the postgres side of it, but something like df['values'].apply(lambda x: "{" + ", ".join(x) + "}") would probably do it. What you're doing is formatting the list to look like {...} instead of [...]. It's all written to string in the I/O stage anyway. Worth noting that this is a few years old now and the JSON support in Postgres has really come a long way so there might be an easier answer.
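
For reference, the order-preserving variant from that last comment would look like this (a sketch, untested against Postgres as the commenter says; elements containing commas, braces, or quotes would need extra escaping):

# Build the Postgres array literal by hand, preserving element order and
# duplicates -- unlike the set conversion in the accepted answer.
df['values'] = df['values'].apply(lambda x: "{" + ", ".join(x) + "}")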
