0

Say I have a 2D array like:

array = [['abc',2,3],
        ['abc',2,3],
        ['bb',5,5],
        ['bb',4,6],
        ['sa',3,5],
        ['tt',2,1]]

I want to remove every row whose first-column value appears more than once,
i.e. compare array[i][0] across all rows and return only:

removeDups = [['sa',3,5],
        ['tt',2,1]]

I think it should be something like this (store the first column in a temporary variable, compare it with the remaining rows, and set array to whatever compare returns):

for x in range(len(array)):
    tmpCol = array[x][0] 
    del array[x] 
    removed = compare(array, tmpCol) 
    array = copy.deepcopy(removed) 

print repr(len(removed))  #testing 

where compare is (compare the first column of each remaining row with tmp; if it matches, remove the row, otherwise return the original array):

def compare(valid, tmpCol):
    for x in range(len(valid)):
        if valid[x][0] != tmpCol:
            del valid[x]
            return valid
        else:
            return valid

I keep getting an 'index out of range' error. I've tried other ways of doing this, but I would really appreciate some help!

1
  • The 'index out of range error' is because you set up the for loop based on the initial length of the array, but you shorten it using the del statement. So, eventually you reach indices that are no longer there. You can use a while loop instead, but even then this code doesn't quite do what you want. Commented Jan 22, 2017 at 13:57
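
A minimal sketch of the point made in this comment, not the asker's exact code: the first function reproduces the error by deleting from a list while looping over a range computed from its original length, and the second shows a while loop that re-checks the length on every pass. The names rows, delete_while_indexing and drop_rows_with_key are hypothetical and were chosen only for illustration.

```python
rows = [['abc', 2, 3], ['abc', 2, 3], ['bb', 5, 5],
        ['bb', 4, 6], ['sa', 3, 5], ['tt', 2, 1]]


def delete_while_indexing(items):
    """Reproduce the failure: range() is fixed up front, but the list shrinks."""
    items = list(items)  # work on a copy so the caller's list is untouched
    try:
        for i in range(len(items)):  # indices 0..5 are decided here, once
            del items[i]             # each delete makes the list shorter
    except IndexError as exc:
        return "IndexError: " + str(exc)
    return "no error"


def drop_rows_with_key(items, key):
    """Remove rows whose first column equals key, using a while loop."""
    items = list(items)
    i = 0
    while i < len(items):  # the length is re-read on every iteration
        if items[i][0] == key:
            del items[i]   # do not advance: the next row moved into slot i
        else:
            i += 1
    return items


print(delete_while_indexing(rows))
print(drop_rows_with_key(rows, 'abc'))
```

The while loop only makes deletion safe. As the comment notes, it does not by itself give the desired result, because deciding which keys are duplicated still requires looking at the whole list first.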

3 Answers

1

Similar to the other answers, but using a plain dictionary instead of importing Counter:

counts = {}

for elem in array:
    # add 1 to counts for this string, creating new element at this key
    # with initial value of 0 if needed
    counts[elem[0]] = counts.get(elem[0], 0) + 1

new_array = []
for elem in array:
    # check that there's only 1 instance of this element.
    if counts[elem[0]] == 1:
        new_array.append(elem)

1

One option is to build a counter for the first column of your array beforehand and then filter the list on the count, i.e. keep a row only if its first element appears exactly once:

from collections import Counter

count = Counter(a[0] for a in array)
[a for a in array if count[a[0]] == 1]
# [['sa', 3, 5], ['tt', 2, 1]]


0

You can use a dictionary to count the occurrences of each key. You can also use Counter from the collections module, which does exactly this.

For example:

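from collections import Counter

# Count each first-column value once, up front, rather than on every iteration
counts = Counter(k for k, _, _ in array)

removed = []
for k, val1, val2 in array:
    # keep the row only if its key occurs exactly once
    if counts[k] == 1:
        removed.append([k, val1, val2])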
