SQLAlchemy + Pandas: saving array of strings to Postgres saves them as array of chars

Question

I am trying to save an array of strings to Postgres, but when I check, the array of strings has been saved as an array of chars. Example, using SQLAlchemy for my database engine:

df = pd.read_csv('data.csv')
df.to_sql('tablename', dtypes={'array_col':sqlalchemy.dialects.postgresql.Array(sqlalchemy.dialects.postgresql.text)})

When I query for 'array_col', I'm expecting this:

['one','two']

What I get is this:

['','o','n','e','','t','w','o']

Can you include the format the strings are stored in within the csv?
– Ian Wilson, Commented Apr 3, 2022 at 20:34

Ian Wilson · Accepted Answer · 2022-04-03 20:41:06Z

I think you need to convert the string from the csv into an array first, using the converters argument of read_csv(), before calling to_sql.

This example assumes the array values are stored in a single csv field, separated by commas. I used hobbies as the name of my array column. Once the value was converted to a list, I did not seem to need to pass dtype to to_sql, but if you still need that then you would use dtype={'hobbies': ARRAY(TEXT)} in this example. I think ARRAY and TEXT are the correct types, not to be confused with array or text. I define my column's array type in my sqlalchemy model below.


import sys
from io import StringIO
from sqlalchemy import (
    create_engine,
    Integer,
    String,
)
from sqlalchemy.schema import (
    Column,
)
from sqlalchemy.sql import select
from sqlalchemy.orm import declarative_base
import pandas as pd
from sqlalchemy.dialects.postgresql import ARRAY, TEXT

Base = declarative_base()


username, password, db = sys.argv[1:4]


engine = create_engine(f"postgresql+psycopg2://{username}:{password}@/{db}", echo=False)


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(8), index=True)
    hobbies = Column(ARRAY(TEXT))


Base.metadata.create_all(engine)

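# The header has one fewer field than the data rows, so pandas uses the
# first field of each row (1, 2) as the index; to_sql writes it as "id".
csv_content = '''"name","hobbies"
1,"User 1","running,jumping"
2,"User 2","sitting,sleeping"
'''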

with engine.begin() as conn:
    def convert_to_array(v):
        return [s.strip() for s in v.split(',') if s.strip()]

    content = StringIO(csv_content)
    df = pd.read_csv(content, converters={'hobbies': convert_to_array})
    df.to_sql('users', schema="public", con=conn, if_exists='append', index_label='id')


with engine.begin() as conn:
    for user in conn.execute(select(User)).all():
        print(user.id, user.name, user.hobbies)
        print("|".join(user.hobbies))
        print(type(user.hobbies))

convert_to_array might need to be adjusted based on how your array is stored in your csv.
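As a rough sketch of what such an adjustment could look like, the variant below accepts three cell formats: a Python/JSON-style list literal, a simple Postgres-style {a,b} literal, and plain comma-separated values. These formats are assumptions for illustration, since the question does not say how the csv stores the array. The brace handling is deliberately naive and does not deal with quoted or escaped elements.

```python
import ast


def convert_to_array(value):
    """Turn one csv cell into a list of strings (assumed formats only)."""
    text = value.strip()
    if not text:
        return []
    # Python/JSON-style list literal, e.g. "['one', 'two']"
    if text.startswith("[") and text.endswith("]"):
        return [str(item) for item in ast.literal_eval(text)]
    # Simple Postgres-style literal without quoting or escapes, e.g. "{one,two}"
    if text.startswith("{") and text.endswith("}"):
        text = text[1:-1]
    # Otherwise treat the cell as plain comma-separated values, e.g. "one, two"
    return [part.strip() for part in text.split(",") if part.strip()]
```

It can be passed to read_csv in the same way as the original, for example converters={'hobbies': convert_to_array}.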
