
I'm exploring various alternatives for selecting a Java data structure for a write-intensive application. I know that no single data structure can provide a universal solution for a write-intensive application, but I'm surprised by the lack of discussion out there on the topic.

There are many people talking about read-intensive, rarely-written workloads or concurrent read-only applications, but I cannot find any conversation about the data structures used for a write-intensive application.

Based on the following requirements:

  1. key/value pairs - Map
  2. Unsorted - for the sake of simplicity
  3. 1000+ writes per minute / negligible reads
  4. All data are stored in memory

I am thinking of the following approaches:

  1. A plain ConcurrentHashMap. Although, based on this note from the official Oracle docs,

[...] even though all operations are thread-safe, retrieval operations do not entail locking

it seems better suited for read-intensive applications.

  2. A combination of a BlockingQueue and a set of ConcurrentHashMaps. The queue is drained in batches, and the updates are then allocated to the appropriate underlying maps. With this approach, though, I would need an additional map to identify which map holds each key, acting like an orchestrator (see the sketch after this list).
  3. Use a HashMap and synchronize at the API level, meaning that every write-related method is synchronized:

// all writers funnel through a single monitor, so writes are fully serialized
synchronized void aWriteMethod(Integer aKey, String aValue) {
   thisWriteIntensiveMap.put(aKey, aValue);
}
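
For what it's worth, here is a minimal sketch of the queue-and-drain idea from option 2, assuming a single LinkedBlockingQueue feeding a single ConcurrentHashMap. The class and method names are made up for illustration; a real version would add the sharding and orchestration described above:

import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class BatchingMapWriter {
    private final BlockingQueue<Entry<Integer, String>> queue = new LinkedBlockingQueue<>();
    private final Map<Integer, String> map = new ConcurrentHashMap<>();

    // writers only pay the cost of an enqueue
    void write(Integer key, String value) {
        queue.offer(new SimpleEntry<>(key, value));
    }

    // run periodically by a single background thread: drain the queue
    // and apply the updates to the backing map in one batch
    void drainToMap() {
        List<Entry<Integer, String>> batch = new ArrayList<>();
        queue.drainTo(batch);
        for (Entry<Integer, String> e : batch) {
            map.put(e.getKey(), e.getValue());
        }
    }
}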

It'd be great if this question did not just receive criticism of the aforementioned options but also suggestions for new and better solutions.

PS: Apart from data integrity, ordering of operations, and throttling issues, what else needs to be taken into account when choosing the "best" approach for a write-intensive application?

I know that this might look a bit open-ended, but it'd be interesting to hear how people think about this problem.

  • What is the use case? Why a map? If you have a write-intensive application and no need for deduplication, you can use a linked list without locking the whole structure. Commented May 23, 2021 at 15:55
  • @BurakSerdar With a list, on every update/upsert/delete I would need to scan the entire structure to find the entry I want to touch. This is why a map seems like the more fitting choice. Commented May 23, 2021 at 15:58
  • Another point is that you need to understand e.g. the speed and algorithm of an insert operation, because then you know what kind of synchronization is going to be used: does only one value get locked, does a subset of values get locked, or does the entire collection get locked? If you think of how a tree is rebalanced, I'm sure that has a huge synchronization overhead. In the end it will differ from application to application, so load tests/benchmarks are a good way to find the best alternative. Commented May 23, 2021 at 16:11
  • You might prefer to use NonBlockingHashMap for a lock-free map, which offers higher write throughput at the cost of not supporting atomic compute methods. Otherwise, if you do not need to read your writes immediately, an intermediate queue (via a ring buffer) allows you to cheaply record the update, drain it on a single thread, and penalize readers by making them catch up. This record/replay approach is used by the Caffeine cache, as every LRU read is an internal write. Commented May 23, 2021 at 17:31
  • @Niko Yes, though ArrayBlockingQueue uses a lock, whereas JCTools offers bounded lock-free queues. Either way there is the complexity of sharding and draining the queues. Your write rate of 16+/s is lowish, as ConcurrentHashMap can do 50M writes/s on a Zipfian distribution. You should benchmark via JMH, as you'll probably be fine with the stock hash map. Commented May 23, 2021 at 21:25

1 Answer


Even if you went for the worst possible type of map, e.g. a synchronized HashMap, I don't think you would notice any performance impact under your load. A few thousand writes per minute is nothing.

I would set up a JMH benchmark and try out various map implementations. The obvious candidate is ConcurrentHashMap, because it is designed to deal with concurrent access.
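
As a rough starting point, a write benchmark could look something like the following sketch, assuming the jmh-core and jmh-generator-annprocess dependencies are on the classpath (class and benchmark names are illustrative):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

@State(Scope.Benchmark)
public class MapWriteBenchmark {

    private final Map<Integer, String> map = new ConcurrentHashMap<>();

    // measure raw put throughput with several concurrent writer threads;
    // returning the result keeps JMH from dead-code-eliminating the call
    @Benchmark
    @Threads(4)
    public String put() {
        int key = ThreadLocalRandom.current().nextInt(100_000);
        return map.put(key, "value");
    }
}

Swap in other implementations (e.g. Collections.synchronizedMap over a HashMap) to compare them under the same workload.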

So I think this is a good example of premature optimization.
