
I have two lists, each containing a large number (N) of objects:

List<Foo> objectsFromDB = {{MailId=100, Status=""}, {MailId=200, Status=""}, {MailId=300, Status=""} ... {MailId=N, Status=N}}

List<Foo> feedBackStatusFromCsvFiles = {{MailId=100, Status="OPENED"}, {MailId=200, Status="CLICKED"}, {MailId=300, Status="HARDBOUNCED"} ... {MailId=N, Status=N}}

A little insight: objectsFromDB retrieves rows from my database via a Hibernate method.

feedBackStatusFromCsvFiles comes from a CSV parser method that unmarshals the rows into Java objects.

My entity class Foo has all the setters and getters. I know the basic idea is to use a nested for-each like this:

    for (Foo fooDB : objectsFromDB) {
        for (Foo fooStatus : feedBackStatusFromCsvFiles) {
            if (fooDB.getMailId().equals(fooStatus.getMailId())) {
                fooDB.setStatus(fooStatus.getStatus());
            }
        }
    }

As far as my modest junior-developer knowledge goes, I suspect doing it like this is very bad practice. Should I implement a Comparator and use it when iterating over my lists of objects? Should I also check for null cases?

Thanks to all of you for your answers!

5 Comments

  • There probably is a better way of doing it, but with your current implementation you should at least put a break; after the setStatus, because once you've found a match there's no point checking the rest of the objects in the list. Commented Mar 24, 2017 at 15:23
  • 2
    If you use nested for loops as you do, then the body of the inner loop will be executed O(n^2) times. This is probably ok if the number of elements is sure to be small, but if it may grow to hundreds or thousands of elements then it could be too costly. Commented Mar 24, 2017 at 15:23
  • Are you sure both elements will be the same count? When you use N, it makes me think you will have the same number. Otherwise I would expect N and M or something like that. Commented Mar 24, 2017 at 15:25
  • 2
    If the two lists were each sorted by mailId, then you could iterate through them cooperatively, at O(n) cost. If they are not already sorted that way, however, then sorting them first probably costs O(n log n). That will be worthwhile, performance-wise, for lists larger than some minimum size, but not if your lists are reliably small. Commented Mar 24, 2017 at 15:26
  • Thanks to all of you for your answers. My lists will indeed have a large number of elements (tens of thousands). I meant N as the number of possible elements in each list. @JohnBollinger the lists are not sorted at all, neither by mailId nor by any other field. Should I sort them as you suggest? And the other important part of my question is how I should iterate: using a Comparator or a for-each, according to you guys? Commented Mar 24, 2017 at 15:42
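The break; fix suggested in the first comment can be sketched like this. This is only a minimal illustration: Foo here is a hypothetical stand-in with just the fields used in the question (the real entity isn't shown), and it assumes each mailId appears at most once in the CSV list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BreakDemo {

    // Minimal stand-in for the real entity (assumption: only these fields matter here).
    static class Foo {
        private final Long mailId;
        private String status;
        Foo(Long mailId, String status) { this.mailId = mailId; this.status = status; }
        Long getMailId() { return mailId; }
        String getStatus() { return status; }
        void setStatus(String status) { this.status = status; }
    }

    // Same nested loop as in the question, plus the suggested break once a match is found.
    static void applyStatuses(List<Foo> objectsFromDB, List<Foo> feedBackStatusFromCsvFiles) {
        for (Foo fooDB : objectsFromDB) {
            for (Foo fooStatus : feedBackStatusFromCsvFiles) {
                if (fooDB.getMailId().equals(fooStatus.getMailId())) {
                    fooDB.setStatus(fooStatus.getStatus());
                    break; // stop scanning the CSV list for this DB row
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Foo> db = new ArrayList<>(Arrays.asList(new Foo(100L, ""), new Foo(200L, "")));
        List<Foo> csv = Arrays.asList(new Foo(100L, "OPENED"), new Foo(200L, "CLICKED"));
        applyStatuses(db, csv);
        System.out.println(db.get(0).getStatus() + " " + db.get(1).getStatus()); // OPENED CLICKED
    }
}
```

The break only halves the expected work; the worst case is still O(n^2), which is why the answers below move to a Map-based lookup.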

4 Answers


Assuming Java 8, and considering that feedBackStatusFromCsvFiles may contain more than one element with the same ID:

  1. Transform objectsFromDB into a Map keyed by ID, with a list of elements per key.
  2. Iterate over the CSV list and use the Map to find all matching messages.

The code would be:

    final Map<String, List<Foo>> listMap = objectsFromDB.stream()
            .collect(Collectors.groupingBy(item -> item.getMailId()));

    for (final Foo feedBackStatus : feedBackStatusFromCsvFiles) {
        listMap.getOrDefault(feedBackStatus.getMailId(), Collections.emptyList())
               .forEach(item -> item.setStatus(feedBackStatus.getStatus()));
    }

1 Comment

Maybe change listMap.get(feedBackStatus.getMailId()) to listMap.getOrDefault(feedBackStatus.getMailId(), Collections.emptyList()) to do nothing if the ID is absent, like with the original code.

Use maps from collections to avoid the nested loops.

    List<Foo> aList = new ArrayList<>();
    List<Foo> bList = new ArrayList<>();
    for(int i = 0;i<5;i++){
        Foo foo = new Foo();
        foo.setId((long) i);
        foo.setValue("FooA"+String.valueOf(i));
        aList.add(foo);
        foo = new Foo();
        foo.setId((long) i);
        foo.setValue("FooB"+String.valueOf(i));
        bList.add(foo);
    }

    final Map<Long,Foo> bMap = bList.stream().collect(Collectors.toMap(Foo::getId, Function.identity()));

    aList.stream().forEach(it->{
        Foo bFoo = bMap.get(it.getId());
        if( bFoo != null){
            it.setValue(bFoo.getValue());
        }
    });

The only other solution would be to have the DTO layer return a map of MailId -> Foo, as you could then stream the CSV list and simply look up the matching DB Foo object. Otherwise, the expense of sorting or iterating over both lists is not worth the performance trade-offs. That holds true until it definitively causes a memory constraint on the platform; until then, let the garbage collector do its job, and do yours as simply as possible.
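That map-returning DTO idea might look like the sketch below. Everything here is an assumption for illustration: loadDbRowsAsMap stands in for a DAO/Hibernate method that would return the rows already keyed by mailId, Foo is a minimal stand-in for the entity, and mailIds are assumed unique in the DB.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DtoMapSketch {

    // Minimal stand-in for the real entity (assumption).
    static class Foo {
        private final Long mailId;
        private String status;
        Foo(Long mailId, String status) { this.mailId = mailId; this.status = status; }
        Long getMailId() { return mailId; }
        String getStatus() { return status; }
        void setStatus(String status) { this.status = status; }
    }

    // Hypothetical stand-in for a DAO method: in real code the DTO layer would
    // return the rows already keyed by mailId. Assumes unique mailIds in the DB.
    static Map<Long, Foo> loadDbRowsAsMap(List<Foo> dbRows) {
        return dbRows.stream().collect(Collectors.toMap(Foo::getMailId, Function.identity()));
    }

    // Stream the CSV list and look each entry up in the DB map: O(n + m) overall.
    static void merge(Map<Long, Foo> dbByMailId, List<Foo> csvRows) {
        csvRows.forEach(csv -> {
            Foo dbFoo = dbByMailId.get(csv.getMailId());
            if (dbFoo != null) {
                dbFoo.setStatus(csv.getStatus());
            }
        });
    }

    public static void main(String[] args) {
        List<Foo> db = Arrays.asList(new Foo(100L, ""), new Foo(200L, ""));
        Map<Long, Foo> dbByMailId = loadDbRowsAsMap(db);
        merge(dbByMailId, Arrays.asList(new Foo(200L, "CLICKED"), new Foo(300L, "HARDBOUNCED")));
        System.out.println(db.get(1).getStatus()); // CLICKED
    }
}
```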

3 Comments

Instead of iterating over the keys and perform a lookup for every key, like aMap.keySet().stream().forEach(it->{ Foo aFoo = aMap.get(it); …, you can simply iterate over the key/value pairs, like aMap.forEach((it,aFoo) -> {…, but since this is a linear iteration which doesn’t benefit from the map at all, you actually don’t need the aMap, as you can do aList.forEach(aFoo -> { Long it = aFoo.getId(); … in the first place.
You are certainly correct, but I broke it out so it was a bit more obvious what was going on since the OP wasn't using any streams. This is also why I iterated in a for loop instead of using streams to generate.
Won't let me edit the comment, but I'm going to update and remove the extra map as the bList already shows that functionality anyway, and this would be a huge additional memory footprint for the large dataset. Thanks!

Given that your lists may contain tens of thousands of elements, you should be concerned that your simple nested-loop approach will be too slow. It will certainly perform a lot more comparisons than it needs to.

If memory is comparatively abundant, then the fastest suitable approach would probably be to form a Map from mailId to (list of) corresponding Foo from one of your lists, somewhat as @MichaelH suggested, and to use that to match mailIds. If mailId values are not certain to be unique in one or both lists, however, then you'll need something a bit different than Michael's specific approach. Even if mailIds are sure to be unique within both lists, it will be a bit more efficient to form only one map.

For the most general case, you might do something like this:

// The initial capacity is set (more than) large enough to avoid any rehashing
Map<Long, List<Foo>> dbMap = new HashMap<>(3 * objectsFromDB.size() / 2);

// Populate the map
// This could be done more efficiently if the objects were ordered by mailId,
// which perhaps the DB could be enlisted to ensure.
for (Foo foo : objectsFromDB) {
    Long mailId = foo.getMailId();
    List<Foo> foos = dbMap.get(mailId);

    if (foos == null) {
        foos = new ArrayList<>();
        dbMap.put(mailId, foos);
    }
    foos.add(foo);
}

// Use the map
for (Foo fooStatus: feedBackStatusFromCsvFiles) {
    List<Foo> dbFoos = dbMap.get(fooStatus.getMailId());

    if (dbFoos != null) {
        String status = fooStatus.getStatus();

        // Iterate over only the Foos that we already know have matching Ids
        for (Foo fooDB : dbFoos) {
            fooDB.setStatus(status);
        }
    }
}

On the other hand, if you are space-constrained, so that creating the map is not viable, yet it is acceptable to reorder your two lists, then you should still get a performance improvement by sorting both lists first. Presumably you would use Collections.sort() with an appropriate Comparator for this purpose. Then you would obtain an Iterator over each list, and use them to iterate cooperatively over the two lists. I present no code, but it would be reminiscent of the merge step of a merge sort (but the two lists are not actually merged; you only copy status information from one to the other). But this makes sense only if the mailIds from feedBackStatusFromCsvFiles are all distinct, for otherwise the expected result of the whole task is not well determined.
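The cooperative iteration described above might be sketched as follows. This is only a sketch under the stated assumptions: both lists are already sorted by mailId, the CSV mailIds are distinct, and Foo is a hypothetical minimal stand-in for the entity.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SortedMergeSketch {

    // Minimal stand-in for the real entity (assumption).
    static class Foo {
        private final Long mailId;
        private String status;
        Foo(Long mailId, String status) { this.mailId = mailId; this.status = status; }
        Long getMailId() { return mailId; }
        String getStatus() { return status; }
        void setStatus(String status) { this.status = status; }
    }

    // Walk both mailId-sorted lists once, like the merge step of a merge sort: O(n + m).
    static void mergeStatuses(List<Foo> sortedDb, List<Foo> sortedCsv) {
        Iterator<Foo> dbIt = sortedDb.iterator();
        Iterator<Foo> csvIt = sortedCsv.iterator();
        if (!dbIt.hasNext() || !csvIt.hasNext()) {
            return;
        }
        Foo db = dbIt.next();
        Foo csv = csvIt.next();
        while (true) {
            int cmp = db.getMailId().compareTo(csv.getMailId());
            if (cmp == 0) {
                db.setStatus(csv.getStatus());
                if (!dbIt.hasNext()) return;
                db = dbIt.next(); // keep csv: consecutive DB rows may share the same id
            } else if (cmp < 0) {
                if (!dbIt.hasNext()) return;
                db = dbIt.next();
            } else {
                if (!csvIt.hasNext()) return;
                csv = csvIt.next();
            }
        }
    }

    public static void main(String[] args) {
        List<Foo> db = Arrays.asList(new Foo(100L, ""), new Foo(200L, ""), new Foo(300L, ""));
        List<Foo> csv = Arrays.asList(new Foo(100L, "OPENED"), new Foo(300L, "HARDBOUNCED"));
        mergeStatuses(db, csv);
        System.out.println(db.get(0).getStatus() + "/" + db.get(1).getStatus()
                + "/" + db.get(2).getStatus()); // OPENED//HARDBOUNCED
    }
}
```

Each pass through the loop advances exactly one iterator, so the whole merge touches each element once; the up-front cost is the two O(n log n) sorts, as noted above.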

7 Comments

My example has them just to generate a list, but there is no requirement the sizes be the same in my example. With his logic, the driving factor was matching the records from the DB to the CSV. You should be yielding major efficiency concerns like 'memory space' to the runtime to optimize in Java 8, as it will be faster than even your 'better' optimizations can ever become.
@Holger, that's a typo: I meant 3 * objectFromDb.size() / 2 (now fixed). That gives the initial capacity (number of hash buckets) in the map. This figure is slightly larger than is needed to avoid any rehashing of the map. See the docs of HashMap for more details.
@MichaelH, don't be absurd. There are space vs. speed tradeoffs between various alternatives here, and space is not an infinite resource. Some environments are indeed space-constrained, even on computers with large physical and virtual memory, which anyway are by no means the only machines of interest these days. As a programmer, you may have the luxury of access to all the space you could possibly need, but you do not have the luxury of assuming that you have so much space without actually considering the question. Java cannot "optimize" this.
You are without a doubt correct in theory. However you're making that argument while not leveraging the runtime optimized stream processing within your example. With the added constraint of tens of thousands of records, neither of our approaches are optimal. However while our strategy is similar, yours is using sub-optimal iteration strategy for only a perceived gain of memory space through a single variable instead of 2. All while you instantiate a list for every ID in the map. I'll update my post with the additional information provided by the OP.
I see, 3 * size / 2 makes indeed sense with the default load factor of .75f

Your problem is merging each Foo's last status into the database objects, so you can do it in two steps, which makes it clearer and more readable:

  1. Filter the Foos that need to be merged.
  2. Merge those Foos with their last status.

    // Because only the last status matters, you needn't use groupingBy to build a complex Map.
    Map<String, String> lastStatus = feedBackStatusFromCsvFiles.stream()
            .collect(toMap(Foo::getMailId, Foo::getStatus,
                           (previous, current) -> current));
    // Find the Foos in the database that need to be merged
    Predicate<Foo> fooThatNeedMerge = it -> lastStatus.containsKey(it.getMailId());
    // Merge each Foo's last status from the CSV
    Consumer<Foo> mergingFoo = it -> it.setStatus(lastStatus.get(it.getMailId()));

    objectsFromDB.stream().filter(fooThatNeedMerge).forEach(mergingFoo);

