One-line summary: I would like to 1) spin up a Postgres database that runs in Docker, and 2) populate this PostgreSQL database with a Pandas data frame using SQLAlchemy from outside the container.


Docker runs fine:

CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                    NAMES
27add831cce5        postgres:10.1-alpine     "docker-entrypoint.s…"   2 weeks ago         Up 2 weeks          5432/tcp                 django-postgres_db_1

I've been able to find posts on getting a Pandas data frame into Postgres, and on using SQLAlchemy to create a table in a Dockerized Postgres. Stitching those together, I get the following, which (sort of) works:

import numpy as np
import pandas as pd

from sqlalchemy import create_engine
from sklearn.datasets import load_iris


def get_iris():

    iris = load_iris()

    return pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                        columns=iris['feature_names'] + ['target'])

df = get_iris()

print(df.head(n=5))

engine = create_engine(
    'postgresql://postgres:mysecretpassword@localhost:5432/postgres'.format(
    'django-postgres_db_1'))

df.to_sql('iris', engine)

Questions:

q.1) Is the above close to the preferred way of doing this?

q.2) Is there a way to create a db in Postgres using SQLAlchemy? E.g. so I don't have to manually add a new db or populate the default Postgres one.


Problems:

p.1) When I run the create_engine that 'works' I get the following error:

  File "/home/tmo/projects/toy-pipeline/venv/lib/python3.5/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 683, in do_executemany
    cursor.executemany(statement, parameters)
KeyError: 'sepal length (cm'

However, if I run the code again, it says that the iris table already exists. If I manually access the Postgres db and do postgres=# TABLE iris it returns nothing.

p.2) I have a database called testdb in my Postgres instance running in Docker:

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 testdb    | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
(4 rows)

but if I try to use that database in create_engine I get an error:

conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL:  database "testdb" does not exist

(notice how postgres has been replaced by testdb):

engine = create_engine(
    'postgresql://postgres:mysecretpassword@localhost:5432/testdb'.format(
    'django-postgres_db_1'))

Update:

So, I think I've figured out what the problem might be: an incorrect use of hostname and address. I should mention that I am running on an Azure instance, on Ubuntu 16.04.

Here is some useful info on the container that is running Postgres:

HOSTNAME=96402054abb3
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/postgresql/10/bin
PGDATA=/var/lib/postgresql/data
PG_MAJOR=10
PG_VERSION=10.5-1.pgdg90+1

And the container's /etc/hosts:

127.0.0.1   localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.2  96402054abb3

How do I construct my connection string properly? I've tried:

Container name as suggested here:

engine = create_engine(
    'postgresql://postgres:saibot@{}:5432/testdb'.format(
    'c101519547f8e89c3422ca9e1dc68d85ad9f24bd8e049efb37273782540646f0'))

OperationalError: (psycopg2.OperationalError) could not translate host name "96402054abb3" to address: Name or service not known

I've also tried putting in the IP, localhost, HOSTNAME, etc., with no luck.

I am using this snippet of code to test if the db connects:

from sqlalchemy import create_engine
from sqlalchemy_utils import database_exists

engine = create_engine(
    'postgresql://postgres:<password>@<host>/testdb')

database_exists(engine.url)

1 Answer

I solved this by putting the container's IP address, 172.17.0.2, into the connection string:

'postgresql://postgres:<password>@172.17.0.2/raw_data'
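
A connection string of this shape follows the pattern postgresql://user:password@host:port/database. The sketch below assembles one from its parts using only the standard library. The helper name build_pg_url and the password 'my@secret' are hypothetical and used only for illustration; the host and database name are the ones from this answer. Percent-encoding the credentials is a precaution for passwords that contain characters such as @ or /.

```python
from urllib.parse import quote_plus


def build_pg_url(user, password, host, database, port=5432):
    """Assemble a postgresql:// URL, percent-encoding the credentials."""
    return 'postgresql://{}:{}@{}:{}/{}'.format(
        quote_plus(user), quote_plus(password), host, port, database)


# 'my@secret' is a placeholder password; 172.17.0.2 is the container IP above.
url = build_pg_url('postgres', 'my@secret', '172.17.0.2', 'raw_data')
print(url)  # postgresql://postgres:my%40secret@172.17.0.2:5432/raw_data
```

The resulting string can be passed to create_engine as is.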

Which in combination with a function solved my problem:

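from sqlalchemy import create_engine
from sqlalchemy_utils import create_database, database_exists


def db_create(engine_url, dataframe):
    """
    Check if the Postgres db exists and create it if not, then
    write the data frame to a table named after the database.
    """

    engine = create_engine(engine_url)

    if not database_exists(engine.url):
        print("Database does not exist, creating...")
        create_database(engine.url)

    print("Does it exist now?", database_exists(engine.url))

    if database_exists(engine.url):
        # The last path segment of the URL is the database name;
        # it is reused as the table name.
        data_type = str(engine.url).rsplit('/', 1)[1]
        print('Populating database with', data_type)
        dataframe.to_sql(data_type, engine)

# df is the data frame built by get_iris() in the question
db_create('postgresql://postgres:<password>@172.17.0.2/raw_data', df)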

This will create a database called raw_data with a table called raw_data, and populate it with the target Pandas data frame.
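
The KeyError: 'sepal length (cm' from the question is not addressed by the connection string. A likely cause, stated here as an assumption rather than something confirmed in this thread, is that the parentheses in the iris column names clash with the %(name)s parameter style psycopg2 uses for inserts. If so, renaming the columns before calling to_sql should avoid it. A minimal sketch, with a hypothetical helper clean_column_name:

```python
import re


def clean_column_name(name):
    """Replace runs of non-alphanumeric characters with '_' and lowercase."""
    return re.sub(r'[^0-9a-zA-Z]+', '_', str(name)).strip('_').lower()


# Column names as produced by get_iris() in the question.
columns = ['sepal length (cm)', 'sepal width (cm)',
           'petal length (cm)', 'petal width (cm)', 'target']

cleaned = [clean_column_name(c) for c in columns]
print(cleaned)
# ['sepal_length_cm', 'sepal_width_cm', 'petal_length_cm', 'petal_width_cm', 'target']
```

With a data frame this would be applied as df.columns = [clean_column_name(c) for c in df.columns] before calling db_create.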

1 Comment

Did you get the "172.17.0.2" by using "docker inspect {container_id}" or some other command?
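
The thread does not say how the address was obtained. One common way to look it up is docker inspect with a format template, sketched below from Python. The helper names inspect_ip_command and container_ip are hypothetical, and the template assumes the container is attached to a single network (with several networks the addresses would be printed back to back).

```python
import subprocess

# Go template that prints the IP address of each network the container is on.
IP_TEMPLATE = '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'


def inspect_ip_command(container):
    """Build the docker inspect command for a container name or ID."""
    return ['docker', 'inspect', '--format', IP_TEMPLATE, container]


def container_ip(container):
    """Run docker inspect and return the container's IP address as a string."""
    result = subprocess.run(inspect_ip_command(container),
                            stdout=subprocess.PIPE,
                            universal_newlines=True,
                            check=True)
    return result.stdout.strip()


# Example (requires a running Docker daemon and an existing container):
# print(container_ip('django-postgres_db_1'))
```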
