Query an existing/downloaded SQLite database in Python in memory, without having to write it to disk first

Question

I would like to query a SQLite database, downloaded from a remote sever for example via HTTP, in Python, without having ever written it to disk.

I see you can start a blank SQLite database in memory using the magic name :memory: https://docs.python.org/3/library/sqlite3.html. And I see you can copy from a disk-backed SQLite DB to an in-memory one, say using iterdump. But... I would like to populate a :memory: database with the contents of a bytes instance that contains the database, without ever having written those bytes to disk fist.

My reason: it seems unnecessary to have it all in memory, have to write it to disk, and then read it back to memory, so it's a general "keeping the number of steps down" in my process.

It would be neat to be able to do it with mmap, but sqlite3.connect only accepts path-like objects, not file-likes. Maybe there's a way to do it at the C level? — snakecharmerb
– snakecharmerb, Commented Jun 20, 2021 at 9:11
This answer points out that sqlite has a native mmap capability. — snakecharmerb
– snakecharmerb, Commented Jun 20, 2021 at 9:32
@snakecharmerb I'm not that familiar with mmap, but doesn't it need the file to be written to disk first? — Michal Charemza
– Michal Charemza, Commented Jun 20, 2021 at 9:42
Ah yes, I thought you could pass it bytes, but apparently not. — snakecharmerb
– snakecharmerb, Commented Jun 20, 2021 at 9:48
@snakecharmerb Indeed there is a way in C: sqlite.org/c3ref/serialize.html and sqlite.org/c3ref/deserialize.html — Shawn
– Shawn, Commented Jun 20, 2021 at 11:10

Michal Charemza · Accepted Answer · 2023-12-29 16:24:50Z

0

It doesn't support queries, but you can use https://github.com/uktrade/stream-sqlite to access the contents of a downloaded/downloading SQLite file without having it written to disk.

Taking the example from its README:

from stream_sqlite import stream_sqlite
import httpx

def sqlite_bytes():
    with httpx.stream('GET', 'https://www.parlgov.org/data/parlgov-development.db') as r:
        yield from r.iter_bytes(chunk_size=65_536)

for table_name, pragma_table_info, rows in stream_sqlite(sqlite_bytes(), max_buffer_size=1_048_576):
    for row in rows:
        print(row)

(Full disclosure: I was heavily involved in the development of stream-sqlite)

edited Dec 29, 2023 at 16:24

answered Aug 7, 2021 at 16:21

Michal Charemza

27.3k15 gold badges114 silver badges191 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Michal Charemza · Accepted Answer · 2023-12-29 16:30:19Z

0

You can use sqlite_deserialize API that is exposed as part of APSW that can accept the serialized bytes of a SQLite database:

import apsw
import httpx

url = "https://data.api.trade.gov.uk/v1/datasets/uk-trade-quotas/versions/v1.0.366/data?format=sqlite"

with apsw.Connection(':memory:') as db:
    # deserialize is used to replace a connected database with an in-memory
    # database. In this case, we replace the "main" database, which is the
    # `:memory:` one above
    db.deserialize('main', httpx.get(url).read())
    cursor = db.cursor()
    cursor.execute('SELECT * FROM quotas;')
    print(cursor.fetchall())

This is a version of the answer at https://stackoverflow.com/a/77731273/1319998

answered Dec 29, 2023 at 16:30

Michal Charemza

27.3k15 gold badges114 silver badges191 bronze badges

Collectives™ on Stack Overflow

Query an existing/downloaded SQLite database in Python in memory, without having to write it to disk first

2 Answers 2

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related