1

I would like to query a SQLite database, downloaded from a remote sever for example via HTTP, in Python, without having ever written it to disk.

I see you can start a blank SQLite database in memory using the magic name :memory: https://docs.python.org/3/library/sqlite3.html. And I see you can copy from a disk-backed SQLite DB to an in-memory one, say using iterdump. But... I would like to populate a :memory: database with the contents of a bytes instance that contains the database, without ever having written those bytes to disk fist.

My reason: it seems unnecessary to have it all in memory, have to write it to disk, and then read it back to memory, so it's a general "keeping the number of steps down" in my process.

6
  • It would be neat to be able to do it with mmap, but sqlite3.connect only accepts path-like objects, not file-likes. Maybe there's a way to do it at the C level? Commented Jun 20, 2021 at 9:11
  • This answer points out that sqlite has a native mmap capability. Commented Jun 20, 2021 at 9:32
  • 1
    @snakecharmerb I'm not that familiar with mmap, but doesn't it need the file to be written to disk first? Commented Jun 20, 2021 at 9:42
  • 1
    Ah yes, I thought you could pass it bytes, but apparently not. Commented Jun 20, 2021 at 9:48
  • 1
    @snakecharmerb Indeed there is a way in C: sqlite.org/c3ref/serialize.html and sqlite.org/c3ref/deserialize.html Commented Jun 20, 2021 at 11:10

2 Answers 2

0

It doesn't support queries, but you can use https://github.com/uktrade/stream-sqlite to access the contents of a downloaded/downloading SQLite file without having it written to disk.

Taking the example from its README:

from stream_sqlite import stream_sqlite
import httpx

def sqlite_bytes():
    with httpx.stream('GET', 'https://www.parlgov.org/data/parlgov-development.db') as r:
        yield from r.iter_bytes(chunk_size=65_536)

for table_name, pragma_table_info, rows in stream_sqlite(sqlite_bytes(), max_buffer_size=1_048_576):
    for row in rows:
        print(row)

(Full disclosure: I was heavily involved in the development of stream-sqlite)

Sign up to request clarification or add additional context in comments.

Comments

0

You can use sqlite_deserialize API that is exposed as part of APSW that can accept the serialized bytes of a SQLite database:

import apsw
import httpx

url = "https://data.api.trade.gov.uk/v1/datasets/uk-trade-quotas/versions/v1.0.366/data?format=sqlite"

with apsw.Connection(':memory:') as db:
    # deserialize is used to replace a connected database with an in-memory
    # database. In this case, we replace the "main" database, which is the
    # `:memory:` one above
    db.deserialize('main', httpx.get(url).read())
    cursor = db.cursor()
    cursor.execute('SELECT * FROM quotas;')
    print(cursor.fetchall())

This is a version of the answer at https://stackoverflow.com/a/77731273/1319998

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.