11

I have a 2-d array

 xx=[['a',1],['b',2],['c',3]]

Now I'm trying to remove duplicate entries from it. For a simple 1-D array, code like

xx=list(set(xx))

would work. But calling set on a list of lists raises an error:

temp = set(xx)
TypeError: unhashable type: 'list'

One workaround would be to serialize the elements of xx, call list(set()) on the new array, and then deserialize all the elements back again.
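A minimal sketch of that serialization workaround, assuming every element can be represented as JSON (the use of the json module and the name unique_rows are illustrative choices, not something the approach requires):

```python
import json

xx = [["a", 1], ["b", 2], ["c", 3], ["c", 3]]

# Serialize each inner list to a string; strings are hashable, lists are not.
serialized = {json.dumps(row) for row in xx}

# Deserialize back to lists. Sorting is only there to make the order repeatable,
# because a set does not remember the original order.
unique_rows = sorted(json.loads(s) for s in serialized)
print(unique_rows)  # [['a', 1], ['b', 2], ['c', 3]]
```

This works, but it pays for a round trip through strings on every element.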

Is there any solution in Python?

4 Answers

30

Convert the elements to tuples and then use set.

>>> xx=[['a',1],['b',2],['c',3],['c',3]]
>>> set(tuple(element) for element in xx)
set([('a', 1), ('b', 2), ('c', 3)])
>>> 

Tuples, unlike lists, can be hashed, so they can be members of a set. Once you are done, convert the elements back to lists. Putting everything together:

>>> [list(t) for t in set(tuple(element) for element in xx)]
[['a', 1], ['b', 2], ['c', 3]]
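
A set does not keep the original order of the rows. If order matters, one possible variant (a sketch, assuming Python 3.7 or later, where dict preserves insertion order) is to use dict.fromkeys instead of set. The name unique_in_order is only illustrative:

```python
xx = [['a', 1], ['b', 2], ['c', 3], ['c', 3], ['a', 1]]

# dict keys are unique and keep insertion order (guaranteed from Python 3.7),
# so the first occurrence of each row decides its position in the result.
unique_in_order = [list(t) for t in dict.fromkeys(map(tuple, xx))]
print(unique_in_order)  # [['a', 1], ['b', 2], ['c', 3]]
```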

11 Comments

Somehow the code failed to remove the duplicate entries. Is set() not able to detect duplicate tuples?
@Neo: This gets interesting. Can you post some sample values?
Maybe a string and a number with the same value, or nearly equal floating-point numbers?
Breaking your code into a "for" loop did the trick: for i in range(len(celeInfo)): celeInfo[i] = tuple(celeInfo[i]), followed by celeInfo = list(set(celeInfo)). Pardon me, I'm new to scripting. Is something missing from your code?
By the way, how does one format code in comments? My previous comment looks ugly :P
3

One year after the excellent answer of Manoj Govindan, I'm adding my piece of advice:

Floating-point numbers are just a pain if you want to compare things...

For instance,

>>> 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1 == 0.1*10
False

That's because most decimal fractions, such as 0.1, have no exact binary representation (computers store floats in binary/base 2, not decimal/base 10), so each operation can introduce a tiny rounding error.

So be really careful when comparing floats!
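
As a rough illustration of how this affects deduplication, the sketch below compares floats with math.isclose and removes duplicates using rounded values as keys. The helper rounded_key and the choice of 9 decimal places are assumptions made for this example; rounding can still split two nearly equal values that fall on either side of a rounding boundary.

```python
import math

total = sum([0.1] * 10)
print(total == 1.0)              # False
print(math.isclose(total, 1.0))  # True

rows = [["a", 0.1 + 0.2], ["a", 0.3], ["b", 1.0]]

def rounded_key(row, ndigits=9):
    """Hypothetical helper: build a hashable key with floats rounded."""
    return tuple(round(v, ndigits) if isinstance(v, float) else v for v in row)

# Keep the first row seen for each rounded key.
seen = {}
for row in rows:
    seen.setdefault(rounded_key(row), row)
unique_rows = list(seen.values())
print(len(unique_rows))  # 2
```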


2

This is my solution. I've intentionally left a[i][0] written out so you can change which element is used as the key to suit your needs.

ab= [['2.71.122.116', 'test_sys_-fw.test_sys_.us'],
     ['10.10.100.26', 'test_sys_5k1'],
     [None, 'Azure'],
     [None, 'test-server'],
     ['2.71.122.119', 'asa-5506-fw'],
     ['33.151.18.23', 'netscaler1'],
     ['33.151.18.23', 'netscaler2'],
     ['33.151.18.23', 'Palo Alto'],
     ['33.151.18.23', 'Arbor CP'],
     ['44.221.2.100', 'fw-la5515'],
     ['44.221.2.101', 'fw-la2-5515'],
     ['44.221.2.99', 'NexusLA2'],
     ['44.221.2.103', 'ASALA5510'],
     ['2.71.122.120', 'asa-5506-fw2'],
     ['2.71.122.106', '2928_SW2']]

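def deduplicate_by_ip(a):
    """
    Drops records whose IP address (first element) is None, then
    removes duplicates by IP address, keeping the first occurrence.
    :param a: list of [ip, name] records
    :return: new list with at most one record per IP address
    """

    source_ips = []  # IP addresses already seen
    new_list = []
    for i in range(len(a)):
        # Compare with "is not None" rather than "!= None"
        if a[i][0] is not None:
            if a[i][0] not in source_ips:
                source_ips.append(a[i][0])
                new_list.append(a[i])
    return new_list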

# Avoid naming the result "list", which would shadow the built-in type.
deduplicated = deduplicate_by_ip(ab)
print("Total items in original list :", len(ab))
print("Total items after deduplication :", len(deduplicated))
print("The list", deduplicated)
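
To change which member is used as the key without editing the loop, one option is to pass a key function. This is only a sketch: dedupe_by_key, its key parameter, and the sample records are hypothetical names and data made up for illustration. It also uses a set for the seen keys, which assumes the keys are hashable.

```python
def dedupe_by_key(rows, key=lambda row: row[0]):
    """Keep the first row for each key; skip rows whose key is None."""
    seen = set()
    result = []
    for row in rows:
        k = key(row)
        if k is None or k in seen:
            continue
        seen.add(k)
        result.append(row)
    return result

records = [['10.0.0.1', 'host-a'], [None, 'host-b'],
           ['10.0.0.1', 'host-c'], ['10.0.0.2', 'host-d']]

by_ip = dedupe_by_key(records)                            # key on the first element
by_name = dedupe_by_key(records, key=lambda row: row[1])  # key on the second element
print(by_ip)    # [['10.0.0.1', 'host-a'], ['10.0.0.2', 'host-d']]
print(len(by_name))  # 4
```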


1

If the order doesn't matter, I believe the most concise way (for code-golf etc.) is to use the built-in map, list, and tuple with extended iterable unpacking and Python 3.5's additional unpacking generalizations:

x = [["a", 1], ["b", 2], ["c", 3], ["c", 3], ["a", 1]]

*y,=map(list,{*map(tuple,x)})

print(y)

Output (the order may vary, since sets are unordered):

[['a', 1], ['b', 2], ['c', 3]]

