
I'm trying to use a forEach lambda on a parallel stream over an ArrayList to improve the performance of an existing application.

So far the forEach iteration without a parallel stream writes the expected amount of data into the database.

But when I switch to a parallelStream, it always writes fewer rows into the database: roughly 7,000 out of 10,000 expected, although the exact result varies.

Any idea what I am missing here? Is it a data race, or do I have to work with locks and synchronized blocks?

The code basically does something like this:

// Create Persons from an arraylist of data

arrayList.parallelStream()
    .filter(d -> d.personShouldBeCreated())
    .forEach(d -> {
        // Create a Person
        // Fill its properties
        // Update the object, which writes it into the DB
    });

Things I tried so far

Collecting the result into a new list with...

collect(Collectors.toList())

...and then iterating over the new list and executing the logic described in the first code snippet. The size of the new collected list matches the expected result, but in the end there is still less data created in the database.
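The symptoms described above are consistent with unsynchronized writes from a forEach body on a parallel stream. A minimal sketch of the safe pattern (the names and the filter predicate below are stand-ins for the original code, not the actual application): let collect() do the accumulation on the parallel stream, then perform the non-thread-safe DB writes sequentially over the result.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelCollectDemo {
    // Thread-safe accumulation: collect() merges per-thread partial results,
    // so no shared mutable state is touched from multiple worker threads.
    static List<Integer> collectSafely(int n) {
        return IntStream.range(0, n)
                .boxed()
                .parallel()
                .filter(d -> d % 2 == 0)       // stand-in for personShouldBeCreated()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The equivalent with forEach(sharedArrayList::add) is broken:
        // ArrayList is not thread-safe, so concurrent adds can be silently
        // lost (or throw ArrayIndexOutOfBoundsException during a resize).
        List<Integer> created = collectSafely(10_000);
        System.out.println("collected " + created.size() + " rows");
    }
}
```

Note that even with a correctly collected list, the DB writes themselves must still happen on a single thread (or through a thread-safe API) to avoid losing rows.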

Update/Solution:

Based on the answer I accepted (and the hints in the comments) concerning the non-thread-safe parts of that code, I implemented it as follows, which finally gives me the expected amount of data. Performance has also improved: it now takes only a third of the time of the previous implementation.

StringBuffer sb = new StringBuffer();
arrayList.parallelStream()
    .filter(d -> d.toBeCreated())
    .forEach(d ->
        sb.append(
            // Build an application-specific XML for inserting or importing data
        )
    );

The application-specific part is an XML-based data import API, but I think this could also be done with plain SQL JDBC inserts.
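The StringBuffer version works because StringBuffer synchronizes every append, but that makes all worker threads contend on a single lock, and the fragments arrive in nondeterministic order. A hedged alternative sketch (the Row type and the XML shape below are made up for illustration): build each fragment independently in parallel and let Collectors.joining do the thread-safe, encounter-order-preserving concatenation.

```java
import java.util.List;
import java.util.stream.Collectors;

public class XmlComposer {
    // Hypothetical record standing in for the application's data objects
    record Row(int id, boolean toBeCreated) {}

    // The per-row work (map) runs in parallel; Collectors.joining performs
    // the concatenation safely and preserves the list's encounter order.
    static String composeXml(List<Row> rows) {
        return rows.parallelStream()
                .filter(Row::toBeCreated)
                .map(r -> "<person id=\"" + r.id() + "\"/>")
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String xml = composeXml(List.of(
                new Row(1, true), new Row(2, false), new Row(3, true)));
        System.out.println(xml);
    }
}
```

Unlike forEach with a shared buffer, this keeps the output deterministic, which matters if the import file's row order is significant.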

  • Please provide the personShouldBeCreated() implementation. Commented Jan 29, 2019 at 10:32
  • @Naya that is irrelevant here... Commented Jan 29, 2019 at 10:33
  • @Eugene I don't think so; it might contain some thread-unsafe code. Commented Jan 29, 2019 at 10:34
  • Instead of doing what you are doing, look at a bulk insert instead: collect all those elements into an ArrayList, for example, and insert all the entries in one go. Commented Jan 29, 2019 at 10:35
  • You're using a non-thread-safe data structure, or your database interface objects aren't thread-safe. Can you add the relevant code (what you're doing in the forEach lambda body)? Commented Jan 29, 2019 at 10:35

1 Answer


Most likely the code within your lambda is not thread-safe, because it uses shared non-concurrent data structures or manipulates state that would require locking.

I suspect a batch/bulk insert is going to be faster than a parallel version, which would probably end up spawning short-lived connections competing with each other for locks on the tables you are inserting into.

You could perhaps still gain something by composing the bulk-insert file contents in parallel, though that depends on how a bulk insert can be realized through your database API. Does it need to be dumped into a text file first? In that case your parallel stream could compose the different lines of that text in parallel and finally join them into the text file to load into the DB. If, instead of a text file, the API lets you pass a collection/list of statement objects in memory, your parallel stream could create those objects in parallel and collect them into the final collection/list to be bulk-inserted into your DB.
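The "compose in parallel, insert in one go" idea can be sketched as follows (the person table, its columns, and the quoting are illustrative assumptions, not the asker's schema): each VALUES tuple is built on a worker thread, and joining produces a single multi-row INSERT statement.

```java
import java.util.List;
import java.util.stream.Collectors;

public class BulkInsertComposer {
    // Hypothetical person data; the table and column names are made up.
    record Person(int id, String name) {}

    // Compose the VALUES tuples in parallel, then join them into one
    // statement. In real code, prefer PreparedStatement#addBatch with bound
    // parameters over string concatenation (quoting, types, SQL injection).
    static String bulkInsertSql(List<Person> people) {
        return people.parallelStream()
                .map(p -> "(" + p.id() + ", '" + p.name().replace("'", "''") + "')")
                .collect(Collectors.joining(",\n  ",
                        "INSERT INTO person (id, name) VALUES\n  ", ";"));
    }

    public static void main(String[] args) {
        System.out.println(bulkInsertSql(
                List.of(new Person(1, "Ada"), new Person(2, "Bob"))));
    }
}
```

The expensive per-row work parallelizes cleanly here because each tuple is independent; only the final join touches shared state, and the collector handles that safely.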
