B-tree implementation for variable size keys

Question

I'm looking to implement a B-tree (in Java) for a "one use" index where a few million keys are inserted, and queries are then made a handful of times for each key. The keys are ASCII strings of at most 40 bytes, and the associated data always takes up 6 bytes. The B-tree structure has been chosen because my memory budget does not allow me to keep the entire temporary index in memory.

My issue is about the practical details in choosing a branching factor and storing nodes on disk. It seems to me that there are two approaches:

One node always fits within one block. This is achieved by choosing a branching factor k such that, even for the worst-case key length, the storage required for keys, data and control structures is <= the system block size. k is likely to be low, and nodes will in most cases have a lot of empty room.
One node can be stored on multiple blocks. Branching factor is chosen independent of key size. Loading a single node may require that multiple blocks are loaded.
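
To put rough numbers on the first approach, here is a small sketch. The 4096-byte block, the 16-byte node header, the one-byte key length prefix and the 8-byte child pointer are all assumptions chosen for illustration (as is the class name FanoutEstimate); only the 40-byte key limit and the 6-byte data size come from the use case.

```java
class FanoutEstimate {
    // All sizes except MAX_KEY and VALUE_SIZE are illustrative assumptions.
    static final int BLOCK_SIZE = 4096;   // assumed disk block size
    static final int HEADER_SIZE = 16;    // assumed per-node header (hypothetical)
    static final int MAX_KEY = 40;        // from the question
    static final int VALUE_SIZE = 6;      // from the question
    static final int LENGTH_PREFIX = 1;   // one byte is enough for lengths <= 40
    static final int CHILD_POINTER = 8;   // assumed width of a block number

    /** Entries that fit in one block when every key has the given length. */
    static int entriesPerBlock(int keyLength) {
        int perEntry = LENGTH_PREFIX + keyLength + VALUE_SIZE + CHILD_POINTER;
        // A node with n entries has n + 1 child pointers, so reserve one extra.
        return (BLOCK_SIZE - HEADER_SIZE - CHILD_POINTER) / perEntry;
    }

    public static void main(String[] args) {
        System.out.println("worst case (40-byte keys): " + entriesPerBlock(MAX_KEY)); // 74
        System.out.println("assumed 12-byte keys:      " + entriesPerBlock(12));      // 150
    }
}
```

Under these assumptions a fixed k of 74 would leave a node full of 12-byte keys roughly half empty, which is the wasted room described above.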

The questions are then:

Is the second approach what is usually used for variable-length keys? or is there some completely different approach I have missed?
Given my use case, would you recommend a different overall solution?

I should mention in closing that I'm aware of the jdbm3 project and am considering using it. I will attempt to implement my own in any case, both as a learning exercise and to see if case-specific optimization can yield better performance.

Edit: Reading about SB-Trees at the moment:

A.H. · Accepted Answer · 2012-02-17 08:49:07Z

2

I'm missing option C here:

At least two tuples always fit into one block; the block size is chosen accordingly. Blocks are filled up with as many key/value pairs as possible, which means the branching factor is variable. If the block size is much greater than the average size of a (key, value) tuple, the wasted space will be very low. Since the optimal I/O size for disks is usually 4k or greater and you have a maximum tuple size of 46 bytes, this is automatically true in your case.
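
A minimal sketch of this packing, not a full B-tree node. It assumes a 4096-byte block, a one-byte length prefix per key and a two-byte entry count at the start of the block. The layout and the class name PackedPage are made up for illustration, and entries are assumed to arrive already sorted (as in a bulk load), so they are simply appended.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Append-only page: packs [keyLen:1][key][value:6] entries until the block is full. */
class PackedPage {
    static final int BLOCK_SIZE = 4096;  // assumed I/O size
    static final int VALUE_SIZE = 6;     // from the question
    static final int HEADER_SIZE = 2;    // entry count stored as a short (hypothetical layout)

    private final ByteBuffer buf = ByteBuffer.allocate(BLOCK_SIZE);
    private int count = 0;

    PackedPage() {
        buf.position(HEADER_SIZE);       // leave room for the entry count
    }

    /** Returns false when the entry does not fit, i.e. the caller has to split. */
    boolean tryAdd(String key, byte[] value) {
        byte[] k = key.getBytes(StandardCharsets.US_ASCII);
        if (k.length > 40 || value.length != VALUE_SIZE) {
            throw new IllegalArgumentException("key > 40 bytes or value != 6 bytes");
        }
        if (buf.remaining() < 1 + k.length + VALUE_SIZE) {
            return false;
        }
        buf.put((byte) k.length).put(k).put(value);
        buf.putShort(0, (short) ++count); // absolute write, position is unchanged
        return true;
    }

    int count() {
        return count;
    }
}
```

With this layout a page holds 87 entries when every key is 40 bytes long and 372 entries when every key is 4 bytes long, so the branching factor follows the data instead of the worst case.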

And for all options you have some variants: B* or B+ Trees (see Wikipedia).

1 Comment

Jules Over a year ago

This is interesting, but the definition of a B-tree requires me to choose some constant q and then all nodes must contain between q and 2q entries. What if q entries don't fit into a block in some section of the tree?

Jan Kotek · Accepted Answer · 2012-02-18 01:05:14Z

1

The JDBM BTree is already self-balancing. It also has defragmentation, which is very fast and solves all the problems described above.

"One node can be stored on multiple blocks. Branching factor is chosen independent of key size. Loading a single node may require that multiple blocks are loaded."

Not necessarily. JDBM3 uses mapped memory, so it never reads a full block from disk into memory. It creates 'a view' on top of the block and reads only the partial data actually needed. So instead of reading a full 4KB block, it may read just 2x128 bytes. This depends on the underlying OS block size.

"Is the second approach what is usually used for variable-length keys? or is there some completely different approach I have missed?"

I think you missed the point that increasing the size on disk decreases performance, as more data has to be read. A single tree can also use both approaches (the first for newly inserted nodes, the second after defragmentation).

Anyway, a flat file with a mapped memory buffer is probably best for your problem, since you have a fixed record size and just a few million records.
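
A rough sketch of what such a flat file could look like, not taken from JDBM. It assumes every key is zero-padded to 40 bytes so that each record is exactly 46 bytes, and that the file was written in sorted key order beforehand (for example by an external sort). The class name FlatIndex is hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Read-only lookup over a file of sorted, fixed-width 46-byte records. */
class FlatIndex {
    static final int KEY_SIZE = 40;    // keys zero-padded to the maximum length (assumption)
    static final int VALUE_SIZE = 6;
    static final int RECORD_SIZE = KEY_SIZE + VALUE_SIZE;

    private final MappedByteBuffer map;
    private final int records;

    FlatIndex(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            records = (int) (ch.size() / RECORD_SIZE);
        }
    }

    /** Binary search; returns the 6-byte value, or null if the key is absent. */
    byte[] get(String key) {
        byte[] probe = new byte[KEY_SIZE];
        byte[] k = key.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(k, 0, probe, 0, k.length); // throws if the key exceeds 40 bytes
        int lo = 0, hi = records - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compareAt(mid * RECORD_SIZE, probe);
            if (cmp == 0) {
                byte[] value = new byte[VALUE_SIZE];
                for (int i = 0; i < VALUE_SIZE; i++) {
                    value[i] = map.get(mid * RECORD_SIZE + KEY_SIZE + i);
                }
                return value;
            }
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return null;
    }

    /** Unsigned bytewise comparison of the stored key at offset against the probe. */
    private int compareAt(int offset, byte[] probe) {
        for (int i = 0; i < KEY_SIZE; i++) {
            int d = (map.get(offset + i) & 0xff) - (probe[i] & 0xff);
            if (d != 0) return d;
        }
        return 0;
    }
}
```

The mapped region lives outside the Java heap and is paged in by the operating system on demand. Whether that counts against your memory budget depends on how the budget is measured.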

Also have a look at leveldb. It has a new Java port which almost beats JDBM:

https://github.com/dain/leveldb

http://code.google.com/p/leveldb/

1 Comment

Jules Over a year ago

"JDBM3 uses mapped memory, so it never reads full block from disk to memory. It creates 'a view' on top of block and only read partial data as actually needed. So instead of reading full 4KB block, it may read just 2x128 bytes. This depends on underlying OS block size." -- this is just plain not true; memory mapped files use the hardware's paging capability, and therefore have to be implemented a full page size at a time. On x86, that means all memory mapped file operations happen in (at a minimum) 4kB blocks in all implementations.

A.H. · Accepted Answer · 2012-02-15 22:11:30Z

0

You could avoid this hassle if you use some embedded database. Those have solved these problems and some more for you already.

You also write: "a few million keys" ... "[max] 40 byte ascii strings" and "6 bytes [associated data]". This does not add up. One gig of RAM would allow you more than "a few million" entries.
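
As a back-of-the-envelope check, counting only the raw payload of a worst-case entry (40-byte key plus 6 bytes of data) and taking "one gig" as 2^30 bytes. The class name CapacityEstimate is just for illustration.

```java
class CapacityEstimate {
    static final long GIB = 1L << 30;       // "one gig" taken as 2^30 bytes (assumption)
    static final int ENTRY_BYTES = 40 + 6;  // worst-case key plus data, from the question

    /** Raw payload entries that fit in the given number of bytes, ignoring all overhead. */
    static long entriesIn(long bytes) {
        return bytes / ENTRY_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(entriesIn(GIB)); // prints 23342213
    }
}
```

That is about 23 million worst-case entries. The figure ignores JVM object headers and any index structure, so a real in-memory map would hold fewer.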

1 Comment

Thingfish Over a year ago

The memory budget for this task is single-digit MB. I'm not against using an existing db, but any solution has to run in-process, in-thread, fit the memory budget and have a quick start-up time to be usable for small inputs as well. JDBM3 might fit the bill, but I'm still interested in learning how it should be done :)
