I have a use case requiring reliable, durable storage and efficient retrieval of a large number of index entries (for which I have an application-specific serialization to bytes that preserves order). Intuition suggests a log-structured merge-tree - hence I am considering RocksDB.
I am concerned about representational efficiency on durable storage. The serialization I have produces relatively short representations (median length around 20 bytes) and preserves logical order. I expect typical adjacent keys to share a prefix covering a significant proportion of their bytes, with differences usually confined to the last few bytes. For my use case, there is never meaningful information to store in the value part of the key-value pair.
Is there a way to configure RocksDB such that:
- The likely similarity between the prefix of one key and the prefix of the next is exploited to minimise the size of the on-disk representation.
- No per-value information is stored alongside keys [the values, for my use case, are always the empty sequence of bytes].
- I can estimate the number of keys within a given range without necessarily having to iterate over and count them. (I'd like to be able to distinguish a key range spanning billions of keys from one spanning fewer than a dozen, say.) I have sketched below what I am currently imagining for each of these points.
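For the first two bullets, the kind of configuration I have been imagining is sketched below (C++; the path, the key literal and the comments reflect my assumptions rather than anything I have confirmed):

```cpp
#include <cassert>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/slice.h"
#include "rocksdb/table.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // My (unverified) understanding: the block-based table format already
  // delta-encodes keys within a data block, storing only the bytes that
  // differ from the previous key between "restart points", so a larger
  // block_restart_interval should mean fewer full keys on disk at the
  // cost of more work per in-block seek.
  rocksdb::BlockBasedTableOptions table_options;
  table_options.block_restart_interval = 16;  // the default; raise for denser keys?
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));

  rocksdb::DB* db = nullptr;
  rocksdb::Status status =
      rocksdb::DB::Open(options, "/tmp/index-db" /* placeholder path */, &db);
  assert(status.ok());

  // Each entry is a key with an empty value; I am hoping an empty
  // rocksdb::Slice adds (next to) nothing per entry beyond the key itself.
  std::string key = "application-specific order-preserving bytes";  // placeholder
  status = db->Put(rocksdb::WriteOptions(), key, rocksdb::Slice());
  assert(status.ok());

  delete db;
  return 0;
}
```

If my reading is right, the restart-point delta encoding of the block-based table is what would give me the prefix sharing, so perhaps nothing beyond the defaults is needed for the first bullet - but I would appreciate confirmation.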
Does RocksDB cover this sort of use case? (If so, is there a sample, or explicit documentation, to help clarify how to use it in this way?)
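For the third bullet, the closest I have found so far is GetApproximateSizes (which reports bytes rather than a key count) together with GetApproximateMemTableStats. The sketch below is my own workaround - dividing the byte estimate by a guessed average entry size - and is only intended to separate "billions" from "a dozen":

```cpp
#include <cstdint>

#include "rocksdb/db.h"
#include "rocksdb/slice.h"

// Order-of-magnitude estimate of the number of keys in [start, end):
// divide the approximate stored size of the range by an assumed average
// bytes-per-entry. The divisor is my own guess, not a RocksDB figure.
uint64_t EstimateKeysInRange(rocksdb::DB* db,
                             const rocksdb::Slice& start,
                             const rocksdb::Slice& end) {
  constexpr uint64_t kAssumedBytesPerEntry = 24;  // placeholder guess

  rocksdb::Range range(start, end);
  uint64_t sst_bytes = 0;
  db->GetApproximateSizes(&range, 1, &sst_bytes);

  // Entries still in the memtables are not covered by the SST estimate;
  // RocksDB exposes an approximate per-range memtable entry count separately.
  uint64_t memtable_count = 0;
  uint64_t memtable_bytes = 0;
  db->GetApproximateMemTableStats(range, &memtable_count, &memtable_bytes);

  return sst_bytes / kAssumedBytesPerEntry + memtable_count;
}
```

I am also aware of the DB-wide "rocksdb.estimate-num-keys" property (via GetIntProperty), but that does not appear to be restrictable to a key range. Is there a more direct, supported way to get a per-range key-count estimate?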