3

I need to convert pandas DataFrame object to a series of SQL statements that reproduce the object.

For example, suppose I have a DataFrame object:

>>> df = pd.DataFrame({'manufacturer': ['Audi', 'Volkswagen', 'BMW'], 
                       'model': ['A3', 'Touareg', 'X5']})
>>> df
  manufacturer    model
0         Audi       A3
1   Volkswagen  Touareg
2          BMW       X5

I need to convert it to the following SQL representation (not exactly the same):

CREATE TABLE "Auto" (
"index" INTEGER,
  "manufacturer" TEXT,
  "model" TEXT
);
INSERT INTO Auto (manufacturer, model) VALUES ('Audi', 'A3'), ('Volkswagen', 'Touareg'), ('BMW', 'X5');

Luckily, pandas DataFrame object has to_sql() method which allows dumping the whole DataFrame to a database through SQLAlchemy engine. I decided to use SQLite in-memory database for this:

>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite://', echo=False)  # Turning echo to True just logs SQL statements, I'd avoid parsing this logs
>>> df.to_sql(name='Auto', con=engine)

I'm stuck at this moment. I can't dump SQLite in-memory database to SQL statements either I can't find sqlalchemy driver that would dump SQL statements into a file instead of executing them.

Is there a way to dump all queries sent to SQLAlchemy engine as SQL statements to a file?

My not elegant solution so far:

>>> from sqlalchemy import MetaData
>>> meta = MetaData()
>>> meta.reflect(bind=engine)
>>> print(pd.io.sql.get_schema(df, name='Auto') + ';')
CREATE TABLE "Auto" (
"manufacturer" TEXT,
  "model" TEXT
);
>>> print('INSERT INTO Auto ({}) VALUES\n{};'.format(', '.join([repr(c) for c in df.columns]), ',\n'.join([str(row[1:]) for row in engine.execute(meta.tables['Auto'].select())])))
INSERT INTO Auto ('manufacturer', 'model') VALUES
('Audi', 'A3'),
('Volkswagen', 'Touareg'),
('BMW', 'X5');

I would actually prefer a solution that does not require building the SQL statements manually.

1 Answer 1

3

SQLite actually allows one to dump the whole database to a series of SQL statements with dump command. This functionality is also available in python DB-API interface for SQLite: sqlite3, specifically, through connection object's iterdump() method. As far as I know, SQLAlchemy does not provide this functionality.

Thus, to dump pandas DataFrame to a series of SQL statements one needs to first dump it to in-memory SQLite database, and then dump this database using iterdump() method:

from sqlalchemy import create_engine    

engine = create_engine('sqlite://', echo=False)
df.reset_index().to_sql(name=table_name, con=engine)  # reset_index() is needed to preserve index column in dumped data

with engine.connect() as conn:
    for line in conn.connection.iterdump():
        stream.write(line)
        stream.write('\n')

engine().connect().connection allows to get raw DBAPI connection.

Sign up to request clarification or add additional context in comments.

1 Comment

There's also the API for raw connections in SQLA, i.e. Engine.raw_connection().

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.