
I have an SQLite table with a few hundred million rows:

sqlite> create table t1(id INTEGER PRIMARY KEY, stuff TEXT);

I need to query this table by its integer primary key hundreds of millions of times. My code:

conn = sqlite3.connect('stuff.db')
with conn:
    cur = conn.cursor()
    for id in ids:
        try:
            cur.execute("select stuff from t1 where rowid=?",[id])
            stuff_tuple = cur.fetchone()
            #do something with the fetched row
        except:
            pass #for when id is not in t1's key set

Here, ids is a list that may have tens of thousands of elements. Building t1 did not take very long (i.e. ~75K inserts per second), but querying it the way I've done it is unacceptably slow (i.e. ~1K queries in 10 seconds).

I am completely new to SQL. What am I doing wrong?

  • "I have an SQLite table with a few hundred million rows." Unless you absolutely need to stick to SQLite, you should drop it and use a real database. SQLite is not meant to handle that amount of data efficiently. Commented Oct 25, 2012 at 1:14
  • Interesting, any suggestions? I was originally just using a dict, but it turns out that I will have too much data to fit in RAM. I figured SQLite was the way to go. Commented Oct 25, 2012 at 1:16
  • I don't want to start the usual dispute, but any of MySQL, PostgreSQL, MSSQL, or Oracle should do just fine. What's important is that they allow you to fine-tune their performance characteristics and also split the load across multiple machines. Simply put, you have an enterprise-grade amount of data, so you should use an enterprise-grade database engine. If you're on Linux, I'd recommend PostgreSQL; I've used it for handling large datasets and it worked fine. There's also a good book about fine-tuning it - amazon.com/PostgreSQL-High-Performance-Gregory-Smith/dp/… (NO affiliation) Commented Oct 25, 2012 at 1:20
  • If you were using a dict, then it seems that you don't need a relational database. Perhaps a simple key-value store will do? You may want to look into Redis or CouchDB. Commented Oct 25, 2012 at 1:21
  • Redis, MongoDB, or any other NoSQL database is easy to set up and maintain. If you use a relational database, you will have to create a schema, but you could write stored procedures instead of writing queries in Python code. Commented Oct 25, 2012 at 1:24

2 Answers

Answer 1 (score 1)

Since you're retrieving values by their keys, it seems like a key/value store would be more appropriate in this case. Relational databases (SQLite included) are definitely feature-rich, but you can't beat the performance of a simple key/value store (a minimal sketch follows the list below).

There are several to choose from:

  • Redis: "advanced key-value store", very fast, optimized for in-memory operation
  • Cassandra: extremely high performance, scalable, used by multiple high-profile sites
  • MongoDB: feature-rich, tries to be "middle ground" between relational and NoSQL (and they've started offering free online classes)

And there are many, many more.
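
For a sense of what the key/value approach looks like from Python, here is a minimal sketch using the redis-py client. It assumes a Redis server on localhost and that the id/stuff pairs have already been loaded into it (e.g. with r.mset); MGET fetches a whole batch of keys in one round trip, and missing ids simply come back as None:

import redis

r = redis.Redis(host='localhost', port=6379)

# One round trip for the entire batch; ids not present return None.
values = r.mget([str(i) for i in ids])
for id_, stuff in zip(ids, values):
    if stuff is not None:
        pass  # do something with the fetched value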


Answer 2 (score -1)

You should make one SQL call instead; it should be much faster:

conn = sqlite3.connect('stuff.db')
with conn:
    cur = conn.cursor()

    # Build one IN (...) query with a '?' placeholder for every id, binding the whole list at once.
    for row in cur.execute("SELECT stuff FROM t1 WHERE rowid IN (%s)" % ','.join('?'*len(ids)), ids):
        #do something with the fetched row
        pass 
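
One caveat with a single IN (...) query: SQLite limits the number of host parameters per statement (999 by default before SQLite 3.32.0, 32766 since), and the question says ids may have tens of thousands of elements. A minimal sketch of batching the lookups, reusing cur from above; the chunk size of 900 is just a safe value under the old default limit:

CHUNK = 900  # stay under SQLite's default host-parameter limit

for start in range(0, len(ids), CHUNK):
    chunk = ids[start:start + CHUNK]
    placeholders = ','.join('?' * len(chunk))
    for row in cur.execute("SELECT stuff FROM t1 WHERE rowid IN (%s)" % placeholders, chunk):
        pass  # do something with the fetched row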

You do not need the try/except, since ids that are not in the database simply will not appear in the results. If you want to know which ids were not found, select the rowid as well and collect what comes back:

ids_res = set()
for row in cur.execute("SELECT rowid, stuff FROM t1 WHERE rowid IN (%s)" % ','.join('?'*len(ids)), ids):
    ids_res.add(row[0])  # first column is the rowid
ids_not_found = set(ids) - ids_res
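
An alternative for very large id lists (not from the original answer, but a standard SQLite pattern) is to load the ids into a temporary table and join against it, which avoids the host-parameter limit entirely. A minimal sketch, assuming the same conn/cur as above:

cur.execute("CREATE TEMP TABLE wanted(id INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO wanted(id) VALUES (?)", ((i,) for i in ids))

for row in cur.execute("SELECT t1.rowid, t1.stuff FROM t1 JOIN wanted ON t1.rowid = wanted.id"):
    pass  # do something with each found row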

