Data Structures & Algorithms Optimal Solution Explanation

Question

im currently doing a ds&a udemy course as i am prepping for the heavy recruiting this upcoming fall. i stumbled upon a problem that prompted along the lines of:

"Given to list arrays, figure out what integer is missing from the second list array that was present in the first list array "

There were two solutions given in the course one which was considered a brute force solution and the other one the more optimal.

Here are the solutions:

def finderBasic(list1,list2):

    list1.sort()
    list2.sort()
    
    for i in range(len(list1)):
        if list1[i] != list2[i]:
            return list1[i]

def finderOptimal(list1,list2):

    d = collections.defaultdict(int)

    for num in list2:
        d[num] = 1
    
    for num in list1:
        if d[num] == 0:
            return num
        else:
            d[num] -= 1

The course explains that the finderOptimal is a more optimal way of solving the problem as it solves it in O(n) or linearly. Can someone please further explain to me why that is. I just felt like the finderBasic was much more simpler and only went through one loop. Any help would be much appreciated thank you!

Both implementations are incorrect unless you restrict the inputs. The second fails on [1, 1, 2], [1, 1]. The first fails on [1], [0, 1]. The second is particularly bad because it adds code that looks like it's intended to deal with duplicates. — Paul Hankin
– Paul Hankin, Commented Sep 6, 2020 at 6:56
set(list1) - set(list2) gives all integers that are in the first list that don't appear in the second list. — Paul Hankin
– Paul Hankin, Commented Sep 6, 2020 at 6:58
Actually, the first solution fails on list1=[1, 2, 3]; list2=[1, 2] with an index exception. — Paul Hankin
– Paul Hankin, Commented Sep 6, 2020 at 6:59

Ecto · Accepted Answer · 2020-09-06 06:45:10Z

1

You would be correct, if it was only about going through loop, the first solution would be better. -- as you said, going through one for loop (whole) takes O(n) time, and it doesn't matter if you go through it once, twice or c-times (as long as c is small enough).

However the heavy operation here is sorting, as it takes cca n*log(n) time, which is larger than O(n). That means, even if you run through the for loop twice in the 2nd solution, it will be still much better than sorting once.

Please note, that accessing dictionary key takes approximately O(1) time, so the time is still O(n) time with the loop.

Refer to: https://wiki.python.org/moin/TimeComplexity

The basic solution may be better for a reader, as it's very simple and straight forward, however it's more complex.

answered Sep 6, 2020 at 6:45

Ecto

1,1471 gold badge9 silver badges16 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Tanveer Badar · Accepted Answer · 2020-09-06 06:44:45Z

0

Disclaimer: I am not familiar with python.

There are two loops you are not accounting for in the first example. Each of those sort() calls would have at least two nested loops to implement the sorting. On top of that, usually the best performance you can get in the general case is O(n log(n)) when doing sorting.

The second case avoids all sorting and simply uses a "playcard" to mark what is present. Additionally, it uses dictionary which is a hash table. I am sure you have already learned that hash tables offer constant time - O(1) - operations.

Simpler does not always mean most efficient. Conversely, efficient is often hard to comprehend.

answered Sep 6, 2020 at 6:44

Tanveer Badar

5,5622 gold badges29 silver badges34 bronze badges

Collectives™ on Stack Overflow

Data Structures & Algorithms Optimal Solution Explanation

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related