I'm trying to use a lambda forEach over a parallel stream of an ArrayList to improve the performance of an existing application.
So far, the forEach iteration without a parallel stream writes the expected amount of data into the database.
But when I switch to a parallelStream, it always writes fewer rows into the database: out of 10,000 expected, roughly 7,000 arrive, though the exact number varies.
Any idea what I am missing here? Is this a data race, or do I have to work with locks and synchronized blocks?
The code does something like this basically:
// Create Persons from an ArrayList of data
arrayList.parallelStream()
    .filter(d -> d.personShouldBeCreated())
    .forEach(d -> {
        // Create a Person
        // Fill its properties
        // Update the object, which writes it into the DB
    });
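The missing rows are the classic symptom of an unsynchronized side effect inside forEach: several worker threads hit the shared, non-thread-safe write path at once and some updates are lost. A minimal sketch below illustrates the safe variant, with a ConcurrentLinkedQueue standing in for the database write (the queue and the counts are illustrative, not part of the original code); a thread-safe sink receives every element even under parallelism:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

public class ParallelSideEffects {
    // Stand-in for the database write: every element is appended to a
    // thread-safe queue, so no writes are lost under parallelism.
    static int safeCount(int n) {
        Queue<Integer> sink = new ConcurrentLinkedQueue<>();
        IntStream.range(0, n).parallel().forEach(sink::add);
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println(safeCount(10_000)); // always prints 10000
    }
}
```

If the sink were a plain ArrayList instead, the same run would intermittently lose elements or throw, which matches the varying row counts described above.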
Things I tried so far
Collecting the result into a new List with...
collect(Collectors.toList())
...and then iterating over the new list and executing the logic described in the first code snippet. The size of the new, collected list matches the expected result, but in the end there is still less data in the database.
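That observation fits: collect(Collectors.toList()) itself is safe in parallel, because each worker thread fills its own intermediate container and the framework merges them, so the collected list is complete and the loss must happen in the later, side-effecting write step. A minimal sketch, with an even-number filter standing in for personShouldBeCreated() (the filter and counts are illustrative assumptions):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CollectThenWrite {
    // collect() is parallel-safe: each worker fills its own list and the
    // framework merges them, so no elements are dropped here.
    static List<Integer> selected(int n) {
        return IntStream.range(0, n).boxed()
                .parallel()
                .filter(i -> i % 2 == 0) // stands in for personShouldBeCreated()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Iterate this list sequentially and do the database writes here.
        System.out.println(selected(10_000).size()); // prints 5000
    }
}
```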
Update/Solution:
Based on the answer I accepted (and the hints in the comments) about the non-thread-safe parts of that code, I implemented it as follows, which finally gives me the expected amount of data. Performance has improved as well: it now takes only a third of the time of the previous implementation.
StringBuffer sb = new StringBuffer();
arrayList
    .parallelStream()
    .filter(d -> d.toBeCreated())
    .forEach(d ->
        sb.append(
            // Build an application-specific XML for inserting or importing data
        )
    );
The application-specific part is an XML-based data-import API, but I think this could also be done with plain JDBC SQL inserts.
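One caveat on the fix above: StringBuffer is synchronized, so no appends are lost, but the order in which parallel threads append is nondeterministic. A map/collect sketch with Collectors.joining (the XML fragment below is a hypothetical placeholder, not the real import format) avoids the shared mutable object entirely and also preserves encounter order:

```java
import java.util.List;
import java.util.stream.Collectors;

public class XmlJoin {
    // Collectors.joining gives each worker its own builder and merges them
    // in encounter order, unlike appends to one shared StringBuffer.
    static String buildXml(List<String> names) {
        return names.parallelStream()
                .map(n -> "<person>" + n + "</person>") // hypothetical XML fragment
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(buildXml(List.of("Ada", "Bob")));
        // prints <person>Ada</person><person>Bob</person>
    }
}
```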