Delete the duplicate arrays inside an array in python

Question

I want to ignore the duplicates from an array that has multiple array in lowest running cost. For example;

A = [['1','2'],['3','4'],['5','6'],['1','2'],['3','4'],['7','8']]

the expected output should be like as

Output = [['1','2'],['3','4'],['5','6'],['7','8']]

Is it possible to compare arrays inside an array. I am doing in this way;

 A = [['1','2'],['3','4'],['5','6'],['1','2'],['3','4'],['7','8']]
        output = set()
        for x in A:
            output.add(x)
        print (output)

But it prompts;

TypeError: unhashable type: 'list'

You can't put a list in a set because a list is not hashable. And even if that worked, you'll lose the ordering of the items. — Moses Koledoye
– Moses Koledoye, Commented Feb 23, 2017 at 17:23

cdlane · Accepted Answer · 2017-02-23 18:39:19Z

2

How about something simple like:

B = list(map(list, set(map(tuple, A))))

Here's my "bakeoff" -- please let me know if I've misrepresented your solution:

import timeit
from random import choice

DIGITS = list("123456789")

# one million elements in list
A = [[choice(DIGITS), choice(DIGITS)] for _ in range(1000000)]

def elena(A):  # MrName's solution is identical
    B = []

    for i in A:
        if i not in B:
            B.append(i)
    return B

def cdlane(A):

    return list(map(list, set(map(tuple, A))))

def VikashSingh(A):
    uniques = set()
    B = []

    for x in A:
        val = '-'.join([str(key) for key in x])
        if val not in uniques:
            B.append(x)
            uniques.add(val)
    return B

def AbhilekhSingh(A):
    def unique_elements(l):
        last = object()
        for item in l:
            if item == last:
                continue
            yield item
            last = item

    return list(unique_elements(sorted(A)))

# sanity check to make sure everyone one agrees on the answer
B = sorted(elena(A))
assert(B == sorted(cdlane(A)))
assert(B == sorted(VikashSingh(A)))
assert(B == sorted(AbhilekhSingh(A)))

print("elena:", format(timeit.timeit('B = elena(A)', number=10, globals=globals()), ".3"))

print("cdlane:", format(timeit.timeit('B = cdlane(A)', number=10, globals=globals()), ".3"))

print("VikashSingh:", format(timeit.timeit('B = VikashSingh(A)', number=10, globals=globals()), ".3"))

print("AbhilekhSingh:", format(timeit.timeit('B = AbhilekhSingh(A)', number=10, globals=globals()), ".3"))

RESULTS

elena: 17.5
cdlane: 2.04
VikashSingh: 10.0
AbhilekhSingh: 8.83

edited Feb 23, 2017 at 18:39

answered Feb 23, 2017 at 17:26

cdlane

42.2k5 gold badges37 silver badges87 bronze badges

Sign up to request clarification or add additional context in comments.

14 Comments

Awais Over a year ago

Kindly mention the running cost as well.

Spherical Cowboy Over a year ago

This will change the order of the list elements, though, which might or might not be a problem.

Awais Over a year ago

What is the running cost?

cdlane Over a year ago

@sphericalcowboy, that was already addressed by the OP in the comments to his post.

Vikash Singh Over a year ago

I don't think log(n) is possible. Because if log(n) was possible we could reverse engineer it and use it for sorting of list. which we know can not be log(n).

|

elena · Accepted Answer · 2017-02-23 17:27:10Z

1

Here's a simple solution:

In [27]: A = [['1','2'],['3','4'],['5','6'],['1','2'],['3','4'], ['7','8']]

In [28]: new_list = []

In [29]: for i in A:
    ...:     if i not in new_list:
    ...:         new_list.append(i)
    ...:         

In [30]: new_list
Out[30]: [['1', '2'], ['3', '4'], ['5', '6'], ['7', '8']]

answered Feb 23, 2017 at 17:27

elena

4,2586 gold badges33 silver badges49 bronze badges

2 Comments

Awais Over a year ago

Kindly mention the running cost as well. I think it has o(n). is it?

Vikash Singh Over a year ago

if i not in new_list: won't this be a costly step and make the process n^2?

Abhilekh Singh · Accepted Answer · 2017-02-23 17:30:27Z

1

You can sort the list and compare every element with it's previous one.

List length: n
Element length: m 
Complexity: Sorting(n * log(n) * m) + Comparison(n * m) = Total(n * log(n) * m)

Try this:

def unique_elements(l):
    last = object()
    for item in l:
        if item == last:
            continue
        yield item
        last = item

def remove_duplicates(l):
    return list(unique_elements(sorted(l)))

edited Feb 23, 2017 at 17:30

answered Feb 23, 2017 at 17:27

Abhilekh Singh

2,9832 gold badges20 silver badges25 bronze badges

3 Comments

Awais Over a year ago

Kindly mention the running cost as well. I think it has o(n). is it?

Abhilekh Singh Over a year ago

Added the running cost and let me know of you have any questions regarding it.

Vikash Singh Over a year ago

your solution on 1Million list is giving CPU times: user 1.21 s, sys: 17.9 ms, total: 1.23 s

MrName · Accepted Answer · 2017-02-23 17:38:57Z

1

Another potentially simple solution, but not sure how the "cost" would compare to other presented solutions:

A = [['1','2'],['3','4'],['5','6'],['1','2'],['3','4'],['7','8']]

res = []
for entry in A:
    if not entry in res:
        res.append(entry)

answered Feb 23, 2017 at 17:38

MrName

2,53918 silver badges32 bronze badges

1 Comment

9000 Over a year ago

For the input data provided, this solution is quite adequate, order-preserving, and as straightforward as it gets. For a list of 10000 lists, 1000 numbers each, it's going to be expensive.

Vikash Singh · Accepted Answer · 2017-02-28 10:43:25Z

1

List length: n
Element length: m 
Complexity: 
    Iterate on n
    Format key by iterating on m
    Check key exists is set `uniques` in O(1)
    Total running time is is O(n * m)

one simple way to do this is:

uniques = set()
output = []
for x in A:
    val = '-'.join([str(key) for key in x])
    if val not in uniques:
        output.append(x)
        uniques.add(val)
print (output)

output:

[['1', '2'], ['3', '4'], ['5', '6'], ['7', '8']]

edited Feb 28, 2017 at 10:43

answered Feb 23, 2017 at 17:25

Vikash Singh

14.1k9 gold badges44 silver badges73 bronze badges

2 Comments

Awais Over a year ago

Kindly mention the running cost as well. I think it has o(n). is it?

Vikash Singh Over a year ago

O(n) running time.

Collectives™ on Stack Overflow

Delete the duplicate arrays inside an array in python

5 Answers 5

14 Comments

2 Comments

3 Comments

1 Comment

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

14 Comments

2 Comments

3 Comments

1 Comment

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related