
I have a use case requiring reliable, durable storage and efficient retrieval of a large number of index entries (for which I have an application-specific serialization to bytes that preserves order). Intuition suggests a log-structured merge-tree, hence I'm considering RocksDB.

I am concerned about representational efficiency on durable storage. My serialization produces relatively short representations (median length of roughly 20 bytes) and preserves logical order. I expect a significant proportion of adjacent keys to share a common prefix, with distinctions usually only in the last few bytes. For my use case, there is never meaningful information to store in the value part of the key-value pair.

Is there a way to configure RocksDB such that:

  1. The likely similarity between the prefix of the next key and that of the current key is exploited to minimise the size of the on-disk representation.
  2. No information about values is stored alongside keys [which for my use case are always the empty sequence of bytes].
  3. I can estimate the number of keys within a given range... without necessarily having to iterate over them and counting. (I'd like to be able to discriminate between when a key range spans billions of keys... relative to when it spans less than a dozen - say.)

Does RocksDB cover this sort of use-case? (If so is there a sample, or explicit documentation, to help clarify how to use it in this way?)

  • I'm just curious why not SQLite, Valkey, etc? Commented Jun 22 at 16:43
  • I rejected SQLite (and other SQL DBMSs) because SQL is a higher-level interface than is ideal for my use case. I want to handle extremely large numbers of keys, but I don't need a relational query language, tables, etc. I avoided Valkey (though I liked it) because, for my use case, the vast proportion of keys won't fit in memory. I want a low-level API that lets me interact with very many short, variable-length, binary keys... I'd prefer to avoid implementing a bespoke (LSM-based) durable store to ensure throughput and data integrity at scale. Commented Jun 22 at 17:03

2 Answers


I'm not sure if I fully understand your question, but I assume you're asking whether a large number of key-value pairs—where similar keys tend to have similar values—can be stored efficiently.

RocksDB handles this scenario quite well. Keys with similar prefixes are likely to be placed and compacted into the same SST file. Within an SST file, data blocks are compressed using algorithms like Snappy or LZ4, which are particularly effective when adjacent entries are similar. [1][2]
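Beyond block compression, RocksDB's block-based table format already applies a form of incremental encoding inside each data block: each key is stored as a (shared-prefix length, suffix) pair relative to the previous key, with a full key written every `block_restart_interval` entries (16 by default). The following is an illustrative Python sketch of that idea, not RocksDB's actual code:

```python
# Sketch of front-coding as used inside an SST data block: every
# `restart_interval` keys a full key is stored; in between, only the
# shared-prefix length and the differing suffix are kept.

def encode_block(keys, restart_interval=16):
    """Front-code a sorted list of byte-string keys."""
    entries, prev = [], b""
    for i, key in enumerate(keys):
        if i % restart_interval == 0:
            shared = 0  # restart point: store the full key
        else:
            shared = 0
            limit = min(len(prev), len(key))
            while shared < limit and prev[shared] == key[shared]:
                shared += 1
        entries.append((shared, len(key) - shared, key[shared:]))
        prev = key
    return entries

def decode_block(entries):
    """Reconstruct the original keys from the front-coded entries."""
    keys, prev = [], b""
    for shared, unshared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys

keys = [b"user:1000:a", b"user:1000:b", b"user:1001:a", b"user:1002:zz"]
enc = encode_block(keys)
assert decode_block(enc) == keys
# Suffix bytes actually stored are far fewer than the raw key bytes:
assert sum(len(s) for _, _, s in enc) < sum(len(k) for k in keys)
```

A smaller restart interval costs space but speeds up point lookups within a block, since decoding can only begin at a restart point.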

You might also want to look into Rockset's converged index, which is designed around a similar idea. [3][4]

[1] https://github.com/facebook/rocksdb/wiki/Compression

[2] https://github.com/facebook/rocksdb/wiki/Rocksdb-BlockBasedTable-Format

[3] https://dev.to/rocksetcloud/converged-indexing-the-secret-sauce-behind-rockset-s-fast-queries-3hp8

[4] https://www.youtube.com/watch?v=xGaUJTHuehQ


1 Comment

Ta. I was hoping RocksDB would give me something resembling en.wikipedia.org/wiki/Incremental_encoding . I agree that generic snappy/LZ4 compression would compress common key prefixes (and suffixes too) but failing to embrace incremental encoding, for my keys, feels like a mistake. Rockset converged indexing seems even higher-level. As I only have keys (no values) and want to exploit incremental encoding... I wonder if I'm trying to force an oval peg into a rectangular hole when trying to use RocksDB for my 'only-keys' 'expect-long-common-prefixes' use case?

`DB::GetApproximateSizes` is the nearest thing to what you need; there is no `DB::GetApproximateNum` that does exactly what you ask for.

However, RocksDB's performance and compression may be poor for your workload. You could try ToplingDB, a RocksDB fork that replaces the MemTable, SST format, and so on with more efficient implementations. In particular, its SST uses an index called NestLoudsTrie, a searchable compression algorithm that typically achieves a 5x compression ratio while supporting point lookups and scans directly on the compressed form at 1,000,000+ QPS.
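One way to turn `DB::GetApproximateSizes` into a rough key count is to scale a whole-database key estimate (for example, the `rocksdb.estimate-num-keys` property) by the range's share of the total data size. A sketch of the arithmetic in Python; all three inputs are hypothetical stand-ins for values you'd read from RocksDB:

```python
# Estimating the number of keys in a range from approximate byte sizes.
# In real code, approx_range_bytes would come from DB::GetApproximateSizes,
# and total_keys from the "rocksdb.estimate-num-keys" property.

def estimate_key_count(approx_range_bytes, total_bytes, total_keys):
    """Scale the whole-DB key count by the range's share of the data."""
    if total_bytes == 0:
        return 0
    return total_keys * approx_range_bytes // total_bytes

# e.g. a range covering ~1% of the data in a 10-billion-key store:
est = estimate_key_count(approx_range_bytes=1_000_000_000,
                         total_bytes=100_000_000_000,
                         total_keys=10_000_000_000)
assert est == 100_000_000
```

This is only an order-of-magnitude estimate, but that is enough to discriminate between a range spanning billions of keys and one spanning a dozen.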

1 Comment

DB::GetApproximateSizes(...) seems perfect for my estimation with RocksDB - Thanks. I assume you lead ToplingDB? The LOUDS Trie is interesting - but it doesn't seem a good fit for my use case... I do see its advantage - if it permits the entire index to be kept in RAM... but I envision having vast amounts of key data relative to available RAM. I suspect I need something bespoke. Open source demonstrates lots of interesting NOSQL techniques - but libraries seem to deliver features misaligned with my requirements... perhaps because modern LSM implementations share LevelDB heritage?
