Compare 1 column of 2D array and remove duplicates Python

Question

Say I have a 2D array like:

array = [['abc',2,3],
        ['abc',2,3],
        ['bb',5,5],
        ['bb',4,6],
        ['sa',3,5],
        ['tt',2,1]]

I want to remove every row whose first-column value appears more than once,
i.e. compare the first element of each row (array[i][0]) and return only:

removeDups = [['sa',3,5],
        ['tt',2,1]]

I think it should be something like this (store the first column in a tmp variable, compare tmp with the remaining rows, then set array to whatever compare returns):

for x in range(len(array)):
    tmpCol = array[x][0] 
    del array[x] 
    removed = compare(array, tmpCol) 
    array = copy.deepcopy(removed) 

print repr(len(removed))  #testing

where compare is (compare the first column of each remaining row with tmp; if it matches, remove the row, otherwise return the original array):

def compare(valid, tmpCol):
    for x in range(len(valid)):
        if valid[x][0] != tmpCol:
            del valid[x]
            return valid
        else:
            return valid

I keep getting an 'index out of range' error. I've tried other ways of doing this, but I would really appreciate some help!

The 'index out of range' error occurs because the for loop is set up using the initial length of the array, but the del statement shortens the array as you go, so eventually you reach indices that no longer exist. You could use a while loop instead, but even then this code doesn't quite do what you want.
– Ben Schmidt, Commented Jan 22, 2017 at 13:57
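
To illustrate the comment above, here is a minimal sketch of the failure and one way around it. The sample `rows` list and the variable names are made up for the illustration; the point is only that `range(len(...))` is evaluated once, while `del` keeps shrinking the list.

```python
rows = [['abc', 2, 3], ['abc', 2, 3], ['bb', 5, 5]]

# range(len(broken)) is computed once from the original length (3),
# but each del shortens the list, so a later index no longer exists.
broken = [row[:] for row in rows]
try:
    for i in range(len(broken)):
        del broken[i]
    raised = False
except IndexError:
    raised = True

# A safer pattern: leave the original list alone and build a new one.
kept = [row for row in rows if row[0] != 'abc']
```

Building a new list avoids the index bookkeeping entirely, which is also what the answers below do.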

Ben Schmidt · Accepted Answer · 2017-01-22 14:09:28Z

1

Similar to the other answers, but using a plain dictionary instead of importing Counter:

counts = {}

for elem in array:
    # add 1 to counts for this string, creating new element at this key
    # with initial value of 0 if needed
    counts[elem[0]] = counts.get(elem[0], 0) + 1

new_array = []
for elem in array:
    # check that there's only 1 instance of this element.
    if counts[elem[0]] == 1:
        new_array.append(elem)
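
If this is needed in more than one place, the two loops above could be wrapped in a small helper. This is only a sketch: the function name `rows_with_unique_key` and the `col` parameter are hypothetical, not part of the original answer.

```python
def rows_with_unique_key(rows, col=0):
    """Return the rows whose value in column `col` occurs exactly once."""
    counts = {}
    for row in rows:
        # First pass: count how often each key value appears.
        counts[row[col]] = counts.get(row[col], 0) + 1
    # Second pass: keep only rows whose key was seen once, in original order.
    return [row for row in rows if counts[row[col]] == 1]


array = [['abc', 2, 3],
         ['abc', 2, 3],
         ['bb', 5, 5],
         ['bb', 4, 6],
         ['sa', 3, 5],
         ['tt', 2, 1]]

result = rows_with_unique_key(array)
print(result)  # [['sa', 3, 5], ['tt', 2, 1]]
```

The input list is not modified, so there is no need for `del` or `copy.deepcopy`.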


akuiper · Accepted Answer · 2017-01-22 14:01:31Z

1

One option is to build a counter for the first column of your array beforehand and then filter the list based on the count, i.e. keep a row only if its first element appears exactly once:

from collections import Counter

count = Counter(a[0] for a in array)
[a for a in array if count[a[0]] == 1]
# [['sa', 3, 5], ['tt', 2, 1]]


MMF · Accepted Answer · 2017-01-22 14:02:31Z

0

You can use a dictionary to count the occurrences of each key. You can also use Counter from the collections module, which does exactly this.

Do as follows:

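from collections import Counter

# count each first-column value once, up front, instead of inside the loop
counts = Counter(k for k, _, _ in array)

removed = []
for k, val1, val2 in array:
    # keep the row only if its key appears exactly once
    if counts[k] == 1:
        removed.append([k, val1, val2])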
