The producer-consumer pattern doesn't scale well. The more producers or consumers you have, the worse the performance gets, because the common queue becomes a bottleneck for the whole system: every producer and every consumer contends on that one queue.
A better approach is to have no common queue: give each consumer its own queue. When a request comes in, it goes to a load balancer, which places the request in the consumer queue that is currently smallest. The balancer becomes the bottleneck, but it doesn't do much work - it just picks the right queue for the incoming request - so it should be darn fast.
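Here is a minimal sketch of that design in Java (the names `LoadBalancer`, `dispatch`, and `queueFor` are my own, not from any library):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: one queue per consumer; the balancer dispatches each
// incoming request to whichever queue is currently smallest.
public class LoadBalancer<T> {
    private final List<Queue<T>> queues = new ArrayList<>();

    public LoadBalancer(int consumers) {
        for (int i = 0; i < consumers; i++) {
            queues.add(new ConcurrentLinkedQueue<>());
        }
    }

    // Place the request on the consumer queue with the fewest items.
    // Note: size() on ConcurrentLinkedQueue is O(n); in practice you
    // would keep a counter per queue, but this keeps the sketch short.
    public void dispatch(T request) {
        Queue<T> smallest = queues.get(0);
        for (Queue<T> q : queues) {
            if (q.size() < smallest.size()) smallest = q;
        }
        smallest.add(request);
    }

    // Each consumer thread polls only its own queue.
    public Queue<T> queueFor(int consumer) {
        return queues.get(consumer);
    }

    public static void main(String[] args) {
        LoadBalancer<String> lb = new LoadBalancer<>(3);
        for (int i = 0; i < 9; i++) lb.dispatch("req-" + i);
        for (int i = 0; i < 3; i++) {
            System.out.println("consumer " + i + ": " + lb.queueFor(i).size() + " queued");
        }
    }
}
```

Consumers only ever touch their own queue, so the only shared point left is the balancer's dispatch loop.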
Here is an edit to answer your questions:
Problem (in more depth): the more cores you have, the slower it gets. Why? Shared memory - every core contends on the same queue, so the cache lines holding the queue's state keep bouncing between cores.
@Peyman use ConcurrentLinkedQueue (a non-blocking, lock-free queue where enqueues and dequeues can proceed concurrently). Try it in your initial design too and benchmark both designs. I expect your revised design to perform better, because each queue has only one enqueuer (the balancer) and one dequeuer (its consumer) at the same time, as opposed to one enqueuer and n dequeuers on the shared queue in your initial design (but this is just my speculation).
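To be concrete about what "non-blocking" means here, a tiny sketch of `ConcurrentLinkedQueue` behavior (request names are illustrative):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Both offer() and poll() are non-blocking: poll() returns null on an
// empty queue instead of waiting, so a consumer loop must handle the
// empty case itself (spin, back off, or park).
public class ClqDemo {
    public static void main(String[] args) {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        queue.offer("req-1");              // enqueue, never blocks
        queue.offer("req-2");
        System.out.println(queue.poll());  // req-1 (FIFO order)
        System.out.println(queue.poll());  // req-2
        System.out.println(queue.poll());  // null - queue is empty
    }
}
```

This is the trade-off versus a `BlockingQueue`: you lose the built-in "wait until a task arrives" behavior, but you also lose the lock contention that comes with it.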
A great paper on scalable producer-consumer designs using balancers
Read this page (or look only at the section "migrate from the common worker queue approach to the queue-per-thread approach")
Here's a list from http://www.javaperformancetuning.com/news/newtips128.shtml. I think the last 3 points are more applicable to you:
- Most server applications use a common worker queue and thread pool; a shared worker queue holds short tasks that arrive from remote sources; a pool of threads retrieves tasks from the queue and processes the tasks; threads are blocked on the queue if there is no task to process.
- A feeder queue shared amongst threads is an access bottleneck (from contention) when the number of tasks is high and the task time is very short. The bottleneck gets worse the more cores that are used.
- Solutions available for overcoming contention in accessing a shared queue include: Using lock-free data structures; Using concurrent data structures with multiple locks; Maintaining multiple queues to isolate the contention.
- A queue-per-thread approach eliminates queue access contention, but is not optimal when a queue is emptied while there is unprocessed queued data in other queues. To improve this, idle threads should have the ability to steal work from other queues. To keep contention to a minimum, the 'steal' should be done from the tail of the other queue (where normal dequeuing from the thread's own queue is done from the head of the queue).
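The last point can be sketched with a deque per worker (the `WorkQueue` name is illustrative): the owner dequeues from the head, while an idle thief steals from the tail, so the two ends rarely contend for the same element:

```java
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;

// Sketch of queue-per-thread with work stealing. New work goes on the
// tail; the owning thread dequeues from the head (FIFO for its own
// work); an idle thread steals from the tail of another worker's deque.
public class WorkQueue<T> {
    private final Deque<T> tasks = new ConcurrentLinkedDeque<>();

    public void submit(T task) { tasks.addLast(task); }   // balancer enqueues at the tail

    public T takeOwn()         { return tasks.pollFirst(); } // owner dequeues from the head

    public T steal()           { return tasks.pollLast(); }  // thief steals from the tail

    public static void main(String[] args) {
        WorkQueue<String> q = new WorkQueue<>();
        q.submit("t1");
        q.submit("t2");
        q.submit("t3");
        System.out.println("owner took:  " + q.takeOwn()); // t1, from the head
        System.out.println("thief stole: " + q.steal());   // t3, from the tail
        System.out.println("owner took:  " + q.takeOwn()); // t2
    }
}
```

This is the same idea the JDK's `ForkJoinPool` is built around; the sketch above just shows the two-ended access pattern in isolation.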