0

I want to ignore the duplicates from an array that has multiple array in lowest running cost. For example;

A = [['1','2'],['3','4'],['5','6'],['1','2'],['3','4'],['7','8']]

the expected output should be like as

Output = [['1','2'],['3','4'],['5','6'],['7','8']]

Is it possible to compare arrays inside an array. I am doing in this way;

 A = [['1','2'],['3','4'],['5','6'],['1','2'],['3','4'],['7','8']]
        output = set()
        for x in A:
            output.add(x)
        print (output)

But it prompts;

TypeError: unhashable type: 'list'

3
  • 2
    You can't put a list in a set because a list is not hashable. And even if that worked, you'll lose the ordering of the items. Commented Feb 23, 2017 at 17:23
  • @MosesKoledoye Order does not matter but the cost matters. Commented Feb 23, 2017 at 17:25
  • 3
    Also, those are lists, not arrays Commented Feb 23, 2017 at 17:29

5 Answers 5

2

How about something simple like:

B = list(map(list, set(map(tuple, A))))

Here's my "bakeoff" -- please let me know if I've misrepresented your solution:

import timeit
from random import choice

DIGITS = list("123456789")

# one million elements in list
A = [[choice(DIGITS), choice(DIGITS)] for _ in range(1000000)]

def elena(A):  # MrName's solution is identical
    B = []

    for i in A:
        if i not in B:
            B.append(i)
    return B

def cdlane(A):

    return list(map(list, set(map(tuple, A))))

def VikashSingh(A):
    uniques = set()
    B = []

    for x in A:
        val = '-'.join([str(key) for key in x])
        if val not in uniques:
            B.append(x)
            uniques.add(val)
    return B

def AbhilekhSingh(A):
    def unique_elements(l):
        last = object()
        for item in l:
            if item == last:
                continue
            yield item
            last = item

    return list(unique_elements(sorted(A)))

# sanity check to make sure everyone one agrees on the answer
B = sorted(elena(A))
assert(B == sorted(cdlane(A)))
assert(B == sorted(VikashSingh(A)))
assert(B == sorted(AbhilekhSingh(A)))

print("elena:", format(timeit.timeit('B = elena(A)', number=10, globals=globals()), ".3"))

print("cdlane:", format(timeit.timeit('B = cdlane(A)', number=10, globals=globals()), ".3"))

print("VikashSingh:", format(timeit.timeit('B = VikashSingh(A)', number=10, globals=globals()), ".3"))

print("AbhilekhSingh:", format(timeit.timeit('B = AbhilekhSingh(A)', number=10, globals=globals()), ".3"))

RESULTS

elena: 17.5
cdlane: 2.04
VikashSingh: 10.0
AbhilekhSingh: 8.83
Sign up to request clarification or add additional context in comments.

14 Comments

Kindly mention the running cost as well.
This will change the order of the list elements, though, which might or might not be a problem.
What is the running cost?
@sphericalcowboy, that was already addressed by the OP in the comments to his post.
I don't think log(n) is possible. Because if log(n) was possible we could reverse engineer it and use it for sorting of list. which we know can not be log(n).
|
1

Here's a simple solution:

In [27]: A = [['1','2'],['3','4'],['5','6'],['1','2'],['3','4'], ['7','8']]

In [28]: new_list = []

In [29]: for i in A:
    ...:     if i not in new_list:
    ...:         new_list.append(i)
    ...:         

In [30]: new_list
Out[30]: [['1', '2'], ['3', '4'], ['5', '6'], ['7', '8']]

2 Comments

Kindly mention the running cost as well. I think it has o(n). is it?
if i not in new_list: won't this be a costly step and make the process n^2?
1

You can sort the list and compare every element with it's previous one.

List length: n
Element length: m 
Complexity: Sorting(n * log(n) * m) + Comparison(n * m) = Total(n * log(n) * m)

Try this:

def unique_elements(l):
    last = object()
    for item in l:
        if item == last:
            continue
        yield item
        last = item

def remove_duplicates(l):
    return list(unique_elements(sorted(l)))

3 Comments

Kindly mention the running cost as well. I think it has o(n). is it?
Added the running cost and let me know of you have any questions regarding it.
your solution on 1Million list is giving CPU times: user 1.21 s, sys: 17.9 ms, total: 1.23 s
1

Another potentially simple solution, but not sure how the "cost" would compare to other presented solutions:

A = [['1','2'],['3','4'],['5','6'],['1','2'],['3','4'],['7','8']]

res = []
for entry in A:
    if not entry in res:
        res.append(entry)

1 Comment

For the input data provided, this solution is quite adequate, order-preserving, and as straightforward as it gets. For a list of 10000 lists, 1000 numbers each, it's going to be expensive.
1
List length: n
Element length: m 
Complexity: 
    Iterate on n
    Format key by iterating on m
    Check key exists is set `uniques` in O(1)
    Total running time is is O(n * m)

one simple way to do this is:

uniques = set()
output = []
for x in A:
    val = '-'.join([str(key) for key in x])
    if val not in uniques:
        output.append(x)
        uniques.add(val)
print (output)

output:

[['1', '2'], ['3', '4'], ['5', '6'], ['7', '8']]

2 Comments

Kindly mention the running cost as well. I think it has o(n). is it?
O(n) running time.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.