
The topic of how to store NumPy arrays in SQLite has been discussed a couple of times. I think there are mainly two solutions:

Now let us suppose the numpy array is huge and only a slice of it is needed from the database. What is the best approach to select and retrieve the sliced data?

SELECT substr(data,start,length)

This will work as long as data is a BLOB and the array was stored as raw np.int8/np.uint8 bytes (without any NumPy metadata) in the BLOB, so that one byte corresponds to one element. What about other data types such as np.uint64?

Of course, it is possible to store the NumPy dtype in the SQLite database, too. Then any sliced-data request would need to adapt the SELECT statement and the start/length information accordingly, i.e., scale them by the number of bytes of the respective data type.
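The scaling described above can be sketched as follows. This is a minimal, illustrative example (table and column names are assumptions, not anything from the question): the dtype is stored alongside the BLOB, and the byte offsets passed to substr() are computed from the dtype's item size. Note that SQLite's substr() is 1-indexed and counts bytes when applied to a BLOB.

```python
import sqlite3
import numpy as np

# Illustrative schema: one row per array, with the dtype stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrays (id INTEGER PRIMARY KEY, dtype TEXT, data BLOB)")

arr = np.arange(1_000_000, dtype=np.uint64)
conn.execute("INSERT INTO arrays (id, dtype, data) VALUES (?, ?, ?)",
             (1, str(arr.dtype), arr.tobytes()))

def fetch_slice(conn, row_id, start, length):
    """Fetch arr[start:start+length] without transferring the whole BLOB."""
    dtype = np.dtype(conn.execute(
        "SELECT dtype FROM arrays WHERE id = ?", (row_id,)).fetchone()[0])
    # substr() is 1-indexed and byte-based on BLOBs, so scale by itemsize.
    byte_start = start * dtype.itemsize + 1
    byte_len = length * dtype.itemsize
    blob = conn.execute("SELECT substr(data, ?, ?) FROM arrays WHERE id = ?",
                        (byte_start, byte_len, row_id)).fetchone()[0]
    return np.frombuffer(blob, dtype=dtype)

chunk = fetch_slice(conn, 1, 50_000, 4)  # elements 50000..50003
```

This only works cleanly for 1D contiguous data; for multi-dimensional arrays the shape and strides would have to be stored and handled as well.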

Is there a better way to do this?

  • For large arrays, it is not a good idea to embed the NumPy array in the SQLite database. This makes the DB requests far less efficient (most SQL databases are not optimized for that) and harder to migrate. I think the best option is to store a reference to an external file (possibly a unique one depending on your needs). This also improves the performance of slicing. Commented Jun 27, 2022 at 23:17
  • While this can certainly be done, it is probably a bit cumbersome, as you need the strides information as well as the datatype to extract sliced data, unless you have 1D data. Also, you can certainly peek here (even if it is C++, the idea is the same): stackoverflow.com/questions/3005231/… Commented Jun 28, 2022 at 8:14
  • Would "huge" mean that it would not fit into memory, or just some big array you do not really want to load entirely? Commented Jun 28, 2022 at 14:43
  • Each entry could be 1M to 10M data points (int8 or int16), and sometimes I'm only interested in, e.g., 50 000 data points. Of course, I could use HDF5, but these files get corrupted easily when creating them, are not as easy to use with parallel threads, and I think are somewhat more limited when it comes to multi-dimensional data and searching (since no SQL-like querying exists). Commented Jun 28, 2022 at 17:26
  • What about using parallel HDF5 then? HDF5 seems clearly better suited for the mentioned usage. With a reference to an HDF5 entry you can benefit from the SQL request and fast slicing (without making your database slower -- considering SQLite is already known to be pretty slow). Note that if the extracted points are not stored contiguously, then it should be faster to read the whole array (especially on HDD storage). Commented Jun 28, 2022 at 19:52
