0

im currently doing a ds&a udemy course as i am prepping for the heavy recruiting this upcoming fall. i stumbled upon a problem that prompted along the lines of:

"Given to list arrays, figure out what integer is missing from the second list array that was present in the first list array "

There were two solutions given in the course one which was considered a brute force solution and the other one the more optimal.

Here are the solutions:

def finderBasic(list1,list2):

    list1.sort()
    list2.sort()
    
    for i in range(len(list1)):
        if list1[i] != list2[i]:
            return list1[i]

def finderOptimal(list1,list2):

    d = collections.defaultdict(int)

    for num in list2:
        d[num] = 1
    
    for num in list1:
        if d[num] == 0:
            return num
        else:
            d[num] -= 1

The course explains that the finderOptimal is a more optimal way of solving the problem as it solves it in O(n) or linearly. Can someone please further explain to me why that is. I just felt like the finderBasic was much more simpler and only went through one loop. Any help would be much appreciated thank you!

3
  • Both implementations are incorrect unless you restrict the inputs. The second fails on [1, 1, 2], [1, 1]. The first fails on [1], [0, 1]. The second is particularly bad because it adds code that looks like it's intended to deal with duplicates. Commented Sep 6, 2020 at 6:56
  • set(list1) - set(list2) gives all integers that are in the first list that don't appear in the second list. Commented Sep 6, 2020 at 6:58
  • Actually, the first solution fails on list1=[1, 2, 3]; list2=[1, 2] with an index exception. Commented Sep 6, 2020 at 6:59

2 Answers 2

1

You would be correct, if it was only about going through loop, the first solution would be better. -- as you said, going through one for loop (whole) takes O(n) time, and it doesn't matter if you go through it once, twice or c-times (as long as c is small enough).

However the heavy operation here is sorting, as it takes cca n*log(n) time, which is larger than O(n). That means, even if you run through the for loop twice in the 2nd solution, it will be still much better than sorting once.

Please note, that accessing dictionary key takes approximately O(1) time, so the time is still O(n) time with the loop.

Refer to: https://wiki.python.org/moin/TimeComplexity

The basic solution may be better for a reader, as it's very simple and straight forward, however it's more complex.

Sign up to request clarification or add additional context in comments.

Comments

0

Disclaimer: I am not familiar with python.

There are two loops you are not accounting for in the first example. Each of those sort() calls would have at least two nested loops to implement the sorting. On top of that, usually the best performance you can get in the general case is O(n log(n)) when doing sorting.

The second case avoids all sorting and simply uses a "playcard" to mark what is present. Additionally, it uses dictionary which is a hash table. I am sure you have already learned that hash tables offer constant time - O(1) - operations.

Simpler does not always mean most efficient. Conversely, efficient is often hard to comprehend.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.