3

I have a pandas DataFrame that I want to insert into the Postgres database in my Django project.

The DataFrame has 5 columns and the database table has 6. Moreover, the columns are not in the same order in the DataFrame and in the table.

So, before inserting, do I have to make sure that the column order is the same in both the DataFrame and the DB table? And how should I handle the missing column?

3
  • If you insert fewer columns than the table actually has, and your constraints are loose (the columns are nullable), the remaining columns should just get NULL inserted into them. The best method, I think, would be pandas.DataFrame.to_csv --> io.StringIO --> psycopg2 cursor.copy_from Commented Jan 6, 2020 at 13:53
  • @aws_apprentice What about the order of the columns, does it matter? Also, I did not understand the last line of your comment. Can you please elaborate? Commented Jan 6, 2020 at 13:57
  • The last line lists the tools you can use to achieve this, and yes, your data should be organized in the same column order as your insert query. Commented Jan 6, 2020 at 13:58
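The to_csv, StringIO, COPY route from the comments could look roughly like the sketch below. It uses copy_from, the psycopg2 cursor method that loads a file-like object into a table. The helper name copy_dataframe and the table name are made up for illustration, and the sketch assumes the values contain no tabs, newlines, quotes, or backslashes, since COPY's text format does not understand CSV quoting.

```python
import io

import pandas as pd


def copy_dataframe(cursor, df, table):
    """Bulk-load df into table with COPY (hypothetical helper).

    cursor is expected to be a psycopg2 cursor.
    """
    buffer = io.StringIO()
    # No header and no index: COPY expects data rows only.
    # Missing values are written as \N, the default NULL marker of copy_from.
    df.to_csv(buffer, index=False, header=False, sep="\t", na_rep="\\N")
    buffer.seek(0)
    # columns= names the target columns explicitly, so the DataFrame may have
    # fewer columns than the table and in a different order.
    cursor.copy_from(buffer, table, sep="\t", null="\\N", columns=list(df.columns))
```

Table columns that are not listed in columns= get their default value, or NULL if they have none, so they must either be nullable or have a default.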

2 Answers

9

If the DataFrame column names match the column names in the database table, you can insert df directly into the table using the DataFrame.to_sql() method, with SQLAlchemy providing the connection. Columns are matched by name, so their order in the DataFrame does not matter:

from myapp.models import Bob
from sqlalchemy import create_engine
from django.conf import settings

# Build the SQLAlchemy URL from the Django database settings.
db_connection_url = "postgresql://{}:{}@{}:{}/{}".format(
    settings.DATABASES['default']['USER'],
    settings.DATABASES['default']['PASSWORD'],
    settings.DATABASES['default']['HOST'],
    settings.DATABASES['default']['PORT'],
    settings.DATABASES['default']['NAME'],
)

engine = create_engine(db_connection_url)

# df is the existing DataFrame. Append its rows to the model's table,
# without writing the DataFrame index, in batches of 10000 rows.
df.to_sql(Bob._meta.db_table, engine, if_exists='append', index=False, chunksize=10000)

A missing column will be left NULL (or the database will apply a default value, if one is defined at the database level rather than the Django level). Alternatively, you can add the missing column to the DataFrame with the required value.
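As a small sketch of handling the missing column on the pandas side, the snippet below shows two options: filling it with an explicit value, or reindexing to the table layout so that it is created empty. The column names and the value 42 are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical column order of the database table.
table_columns = ["a", "b", "c", "d", "e"]

# DataFrame with a different column order and without column "e".
df = pd.DataFrame({"d": [4, 40], "c": [3, 30], "b": [2, 20], "a": [1, 10]})

# Option 1: add the missing column with an explicit value.
filled = df.assign(e=42)

# Option 2: reorder to the table layout; columns absent from df are
# created and filled with NaN.
aligned = df.reindex(columns=table_columns)
```

Either result can then be passed to to_sql() in place of df. With option 2 the NaN values are expected to end up as NULL in the table, so the column must be nullable.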


3

Just do an explicit insert.

If your table has columns in the order A, B, C, D, E

but your DataFrame has them in the order D, C, B, A (note: no column E),

just generate an SQL insert that names the columns explicitly (note that column E is not listed):

   insert into <TABLE> (D,C,B,A) values (row_iterator.D,row_iterator.C,...) 
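In Python, generating and running such an insert might look like the sketch below. The helper name insert_rows is made up, the cursor is assumed to be a DB-API cursor such as psycopg2's, and the table and column names are interpolated directly into the SQL, so they are assumed to come from trusted code rather than user input. The row values themselves go through %s placeholders.

```python
import pandas as pd


def insert_rows(cursor, df, table):
    """Insert every row of df into table, naming the columns explicitly.

    Hypothetical helper: table and column names must be trusted identifiers.
    """
    columns = ", ".join(df.columns)
    placeholders = ", ".join(["%s"] * len(df.columns))
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    # One plain tuple per row, in the same order as the column list above.
    rows = list(df.itertuples(index=False, name=None))
    cursor.executemany(sql, rows)
    return sql
```

Because the column list is built from the DataFrame itself, the order of the columns in the table never comes into play.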

For column E, the simplest solution is to define a default value in the table definition, e.g.:

CREATE TABLE Bob (
    A int NOT NULL,
    B int NOT NULL,
    C int NOT NULL,
    D int NOT NULL,
    E int DEFAULT 42
);

Hope that helps
