8

I have two large (1000+ object) ArrayLists that I need to compare and manipulate. I essentially need to take a value from ArrayList A, look for a matching object in ArrayList B, then manipulate the object from B. I need to do this in all objects for A. I need to do this frequently in the application. Order is not known and sizes will differ.

(pseudocode)
ArrayList<myObject> A
ArrayList<myObject> B

I could loop through every single item in B looking for the one that matches the entity from A, for each entity in A. That just seems so inefficient.

(pseudocode)
for (each object in A){loop through all of B and find it}

Would it be worth converting B to a HashMap (using the specific value I am comparing as the key, and the object as the value), then searching B that way, then converting that temporary HashMap back to an ArrayList when I'm done processing?

(pseudocode)
convert B to HashMap<myObject.myValue,myObject> C
for (each object in A){look up the value in C}
convert C back to an ArrayList

Is this a good idea? Or is this premature/unnecessary optimization? Thank you.

(Background: Data comes to me from the service as an ArrayList - and the frontend needs an ArrayList for the view layer. I'm trying to make this middle tier processing more efficient - but the entry and exit objects must be ArrayList (or some other list) )

7
  • Why would you bother converting it back to an ArrayList instead of holding on to B? And why would you bother rebuilding C every time? Commented May 28, 2019 at 23:22
  • One option is to keep B as a LinkedHashMap, instead of an ArrayList. This will allow you to preserve the order (which is apparently important to you, otherwise you would just have a HashMap to begin with) and still have amortized O(1) lookup. Commented May 28, 2019 at 23:24
  • @LouisWasserman I'm in a bit of a fixed position in the application I work on. Data comes to me from the service as an ArrayList - and the frontend needs an ArrayList for the view layer. I'm trying to make this middle tier processing more efficient - but the entry and exit objects must be ArrayList (or some other list). Commented May 28, 2019 at 23:26
  • 1
    The answer to "is this premature/unnecessary optimization" entirely depends on how performance-sensitive this area of the code is. You really need to profile it to find out. A nested loop like (pseudocode) foreach a in A: foreach b in B will do about 500,000 comparisons if len(A)==len(B)==1000. If you need to follow the pointers in each case and everything is uncached, this will take 50-100 ms. If you have better cache behavior, or don't need to follow the pointers, it will take less. How important is tens of milliseconds wherever this is called? Only you can answer that. Commented May 28, 2019 at 23:27
  • 1
    To clarify my previous comment: the pathological worst case cache behavior seems pretty unlikely, so I think this would take single digit milliseconds. But for performance analysis in this level of detail you really can’t know without profiling. Commented May 29, 2019 at 0:04

1 Answer 1

10

Yes, for large numbers, a HashMap is beneficial.

Your initial algorithm will take a long time, looping through both lists in nested for loops. This is an O(n2) algorithm. Even assuming 1000 items each in A and B, and assuming a cost of 1 for comparing two individual items, one from A and one from B, you're looking at 500k comparisons (avoiding comparing each item twice). Doing this frequently will cause slow performance.

Assuming you have a good hash code algorithm for your objects, looking up a value from a HashMap is O(1) access. You'll still spend O(n) time constructing it, but that's nothing compared to O(n2) if you have a large number of items.

Construct your HashMap "C" once using data from "B" and use it many times, as long as B's information hasn't changed. If you "need to do this frequently", then the performance will be even better because the HashMap is already built -- just reuse it.

If you need to maintain the order, store the B list index as the value in the hash map.

You don't need to be "converting that temporary hashmap back to an arraylist", because creating the HashMap "C" doesn't destroy the original list "B". One thing to be careful of is if the objects in the list B change, forcing updates to the HashMap to keep consistent. Another thing to watch is your memory usage for very large lists -- can you keep the objects, the list, and the hashmap in memory?

Your pseudocode:

for each index in B:
    get object b
    put in hash map C values (b, index)

for each a in A:
    if found in hash map C: do something with found object

For smaller numbers, the O(n2) performance times will be small enough that building the HashMap won't be worth it. That is a decision you'll need to make -- you would need to decide when the lists are large enough that building a HashMap and using it is worth the HashMap construction cost.

Sign up to request clarification or add additional context in comments.

3 Comments

My understanding of the question was that A and B are new every time. I.e. it works something like: (1) get a new A and B from some service, (2) run this O(n^2) algo on them once, (3) you are done. So the bit about how you can reuse B doesn't make sense. However the comment that precomputing the hash map from B is O(n) instead of O(n^2) still holds, even if you can't reuse it.
I like this answer. I do need to convert the updated map back to ArrayList, because I need to add values if I don't find a match during the lookup process. Although - I guess I could directly add the values to the arraylist when I find it doesnt exist in the map instead. So no conversion back needed there.
@BrennanVincent Yes, it was unclear what the circumstances of A and B were while I was writing my answer; they were updated to make it clear that they were used once.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.