
I'm exploring various alternatives for selecting a Java data structure for a write-intensive application. I know that no single data structure can provide a universal solution for a write-intensive application, but I'm surprised by the lack of discussion out there on the topic.

There are many people talking about read-intensive, rarely-written workloads or concurrent read-only applications, but I cannot find any conversation about the data structures used for a write-intensive application.

Based on the following requirements:

  1. key/value pairs - Map
  2. Unsorted - for the sake of simplicity
  3. 1000+ writes per minute / negligible reads
  4. All data are stored in memory

I am thinking of the following approaches:

  1. A plain ConcurrentHashMap. Although, based on this note from the official Oracle docs,

[...] even though all operations are thread-safe, retrieval operations do not entail locking

it seems better suited for read-intensive applications.

  2. A combination of a BlockingQueue and a set of ConcurrentHashMaps. The queue is drained in batches, and the updates are then allocated to the appropriate underlying maps. With this approach, though, I would need an additional map to identify which map holds each key, acting like an orchestrator (see the sketch after this list).
  3. Use a HashMap and synchronize at the API level, meaning that every write-related method is synchronized:

// all writers funnel through a single monitor, so writes are fully serialized
synchronized void aWriteMethod(Integer aKey, String aValue) {
   thisWriteIntensiveMap.put(aKey, aValue);
}
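
For what it's worth, here is a minimal sketch of the queue-and-drain idea from option 2, assuming a single LinkedBlockingQueue feeding a single ConcurrentHashMap. The class and method names are made up for illustration; a real version would add the sharding and orchestration described above:

import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class BatchingMapWriter {
    private final BlockingQueue<Entry<Integer, String>> queue = new LinkedBlockingQueue<>();
    private final Map<Integer, String> map = new ConcurrentHashMap<>();

    // writers only pay the cost of an enqueue
    void write(Integer key, String value) {
        queue.offer(new SimpleEntry<>(key, value));
    }

    // run periodically by a single background thread: drain the queue
    // and apply the updates to the backing map in one batch
    void drainToMap() {
        List<Entry<Integer, String>> batch = new ArrayList<>();
        queue.drainTo(batch);
        for (Entry<Integer, String> e : batch) {
            map.put(e.getKey(), e.getValue());
        }
    }
}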

It'd be great if this question did not just receive criticism of the aforementioned options but also suggestions for new and better solutions.

PS: Apart from data integrity, ordering of operations, and throttling issues, what else needs to be taken into account when choosing the "best" approach for a write-intensive application?

I know that this might look a bit open-ended, but it'd be interesting to hear how people think about this problem.

  • What is the use case? Why a map? If you have a write-intensive application and no need for deduplication, you can use a linked list without locking the whole structure. Commented May 23, 2021 at 15:55
  • @BurakSerdar With a list, on every update/upsert/delete I would need to scan the entire structure to find the entry I want to touch. This is why a map seems like the more fitting choice. Commented May 23, 2021 at 15:58
  • Another point is that you need to understand e.g. the speed and algorithm of an insert operation, because then you know what kind of synchronization is going to be used: does only one value get locked, does a subset of values get locked, or does the entire collection get locked? If you think of how a tree is rebalanced, I'm sure that has a huge synchronization overhead. In the end it will differ from application to application, so load tests/benchmarks are a good way to find the best alternative. Commented May 23, 2021 at 16:11
  • You might prefer to use NonBlockingHashMap for a lock-free map, which offers higher write throughput at the cost of not supporting atomic compute methods. Otherwise, if you do not need to read your writes immediately, an intermediate queue (via a ring buffer) allows you to cheaply record the update, drain it on a single thread, and penalize readers by making them catch up. This record/replay approach is used by the Caffeine cache, as every LRU read is an internal write. Commented May 23, 2021 at 17:31
  • @Niko Yes, though ArrayBlockingQueue uses a lock, whereas JCTools offers bounded lock-free queues. Either way there is the complexity of sharding and draining the queues. Your write rate of 16+/s is lowish, as ConcurrentHashMap can do 50M writes/s on a Zipfian distribution. You should benchmark via JMH, as you'll probably be fine with the stock hash map. Commented May 23, 2021 at 21:25

1 Answer


Even if you went for the worst possible type of map, e.g. a synchronized HashMap, I don't think you would notice any performance impact under your load. A few thousand writes per minute is nothing.

I would set up a JMH benchmark and try out various map implementations. The obvious candidate is ConcurrentHashMap, because it is designed to deal with concurrent access.
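
As a rough starting point, a write benchmark could look something like the following sketch, assuming the jmh-core and jmh-generator-annprocess dependencies are on the classpath (class and benchmark names are illustrative):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

@State(Scope.Benchmark)
public class MapWriteBenchmark {

    private final Map<Integer, String> map = new ConcurrentHashMap<>();

    // measure raw put throughput with several concurrent writer threads;
    // returning the result keeps JMH from dead-code-eliminating the call
    @Benchmark
    @Threads(4)
    public String put() {
        int key = ThreadLocalRandom.current().nextInt(100_000);
        return map.put(key, "value");
    }
}

Swap in other implementations (e.g. Collections.synchronizedMap over a HashMap) to compare them under the same workload.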

So I think this is a good example of premature optimization.
