
Forgive me if this question seems trivial, but I have a sorted array of strings that I'm looping over with a for loop. There are some repeated items in the array that I want to remove. I'm fairly new to Python, so I don't know if there's a library that lets me remove repeated items from an array. Here's what I'm doing to remove them:

for i in teams:
        if teams[i+1] is teams[i]:
                teams.remove(teams[i])

That if-statement would have worked just fine in C++, C#, and Java, but for some reason it raises the error "cannot concatenate 'str' and 'int' objects".

  • After you fix the index errors, be aware that mutating an object while you're iterating over it is quite unlikely to do what you hope. Commented Nov 3, 2013 at 2:32
  • If teams is a list of strings, I don't think is will necessarily do what you want either. Use == to test strings for equality; is tests identity. Commented Nov 3, 2013 at 2:33

4 Answers


i is the item from teams. It's not an index. (Hint: when debugging this kind of problem, stick a print(i) inside the loop to make sure it's what you think it is.)

Now even after taking that into consideration, and rewriting your code to use a real index via either enumerate() or range(), you may still have some trouble, because you're removing items from the list while you're iterating over it. This will cause you to skip over some of them: the for loop uses an index internally and adds 1 to it each time through the loop. Deleting the current item moves the next item into its place, then the index is incremented, so the loop moves on to the item after that and the one that slid into place is never looked at.
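To see the skipping happen, here is a small sketch with made-up data. It records every item the loop actually visits while removing matching items as it goes:

```python
teams = ["a", "a", "a", "b"]
visited = []

for team in teams:
    visited.append(team)      # record what the loop actually sees
    if team == "a":
        teams.remove(team)    # shifts every later item one slot to the left

print(visited)  # ['a', 'a']  ('b' was never visited)
print(teams)    # ['a', 'b']  (one duplicate 'a' survived)
```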

The most straightforward solution to that problem is to create a new list that contains only the elements you want to keep:

newteams = []
for team in teams:
    if not (newteams and newteams[-1] == team):
        newteams.append(team)

Basically, this will add a new item to newteams only if 1) newteams is empty or 2) the last item of newteams doesn't match the current team. Result: runs of duplicates of any length are reduced to a single item. If this needs to modify the list teams in place, then use a slice assignment afterward:

teams[:] = newteams
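Packaged as a function (the name dedupe_sorted is just for illustration), the approach is easy to check. The last lines show two consequences mentioned above: the slice assignment keeps the same list object, and duplicates that are not adjacent survive, which is why the input should be sorted.

```python
def dedupe_sorted(items):
    """Collapse each run of equal adjacent items to a single item."""
    result = []
    for item in items:
        # Append when result is empty or the last kept item differs
        if not (result and result[-1] == item):
            result.append(item)
    return result

teams = ['bar', 'bar', 'big', 'foo', 'foo', 'foo', 'small']
alias = teams                      # a second reference to the same list
teams[:] = dedupe_sorted(teams)    # replace the contents in place
print(teams)                       # ['bar', 'big', 'foo', 'small']
print(alias is teams)              # True: still the same list object

print(dedupe_sorted(['a', 'b', 'a']))  # ['a', 'b', 'a']: non-adjacent repeats are kept
```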

Another approach is to use a set to keep track of items we've already seen. (We use a set because it's fast to check whether something is in it.) Then we can simply omit the items we've already seen anywhere in the list; with the previous approach, the list would need to be sorted for that to happen.

seen = set()
newteams = []
for team in teams:
    if team not in seen:
        seen.add(team)          # remember it so later repeats are skipped
        newteams.append(team)
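A sketch of the seen-set approach as a function (dedupe_keep_order is an illustrative name), run on an unsorted list so you can see that the first occurrence of each item is kept in its original position:

```python
def dedupe_keep_order(items):
    """Return items without repeats, keeping each item's first occurrence."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:   # set membership is a fast hash lookup
            seen.add(item)
            result.append(item)
    return result

teams = ['big', 'small', 'big', 'foo', 'bar', 'bar', 'foo']
print(dedupe_keep_order(teams))  # ['big', 'small', 'foo', 'bar']
```

On Python 3.7 and later, where dicts are guaranteed to preserve insertion order, list(dict.fromkeys(teams)) gives the same result.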

With a little abuse of Python, one can condense this to the following (though you probably shouldn't, especially as a newcomer to the language):

seen = set()
teams[:] = (seen.add(team) or team for team in teams if team not in seen)
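Why the one-liner works: set.add() always returns None, so seen.add(team) or team evaluates to team, with the side effect of recording it. The sketch below uses a list comprehension instead of the generator, which makes it explicit that the whole right-hand side is built before teams is overwritten:

```python
seen = set()
print(seen.add("bar"))   # None: set.add() returns None
print(None or "bar")     # bar: `or` falls through to its right operand

teams = ['big', 'small', 'big', 'foo', 'bar', 'bar', 'foo']
seen = set()
# The membership test runs before seen.add(), so only first occurrences pass
teams[:] = [seen.add(team) or team for team in teams if team not in seen]
print(teams)             # ['big', 'small', 'foo', 'bar']
```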

Of course, if you don't care about the order (or are willing to sort the list afterward), @RMcG's solution of converting to a set and back is even simpler.


If you just want to remove duplicate strings in a list you can use a set. Convert the list to a set, convert it back to a list and then sort:

teams = ['big','small','big','foo','bar','bar','foo']
teams = sorted(list(set(teams)))

In [12]: teams
Out[12]: ['bar', 'big', 'foo', 'small']

A set doesn't allow duplicates, so it removes them for you. Also, you are now sorting after the duplicates have been removed instead of before, so there are fewer items to sort, which should be more efficient.
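A short sketch of what each step does with the example list. A set has no meaningful order, which is why the result is sorted, and sorted() accepts any iterable and already returns a new list:

```python
teams = ['big', 'small', 'big', 'foo', 'bar', 'bar', 'foo']

unique = set(teams)        # duplicates are dropped; iteration order is arbitrary
print(len(unique))         # 4

result = sorted(unique)    # sorted() takes any iterable and returns a new list
print(result)              # ['bar', 'big', 'foo', 'small']
print(type(result))        # <class 'list'>
```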

2 Comments

You could also write sorted(set(teams)).
@DSM Thank you, that is so much better =D

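itertools.groupby is a handy solution for this. It groups consecutive equal items, and since your list is already sorted, every run of duplicates becomes a single group whose key you keep: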

from itertools import groupby
newteams = [k for k,g in groupby(teams)]
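Printing the groups shows where the keys come from. Like the loop that compares against newteams[-1], groupby() only merges adjacent duplicates, so the input needs to be sorted first if repeats can be scattered:

```python
from itertools import groupby

teams = ['bar', 'bar', 'big', 'foo', 'foo', 'small']

# Each key is one value; each group iterates over its consecutive run
for key, group in groupby(teams):
    print(key, list(group))
# bar ['bar', 'bar']
# big ['big']
# foo ['foo', 'foo']
# small ['small']

newteams = [k for k, g in groupby(teams)]
print(newteams)  # ['bar', 'big', 'foo', 'small']

# Unsorted input: the two 'a' items are not adjacent, so both keys appear
print([k for k, g in groupby(['a', 'b', 'a'])])  # ['a', 'b', 'a']
```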

1 Comment

Nice, wouldn't have considered groupby.

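This is closer to what you might have intended. The loop runs over the indices from the end, so it never reads past the last item, and deleting an item doesn't shift the ones still to be checked.

# Walk the indices from the end so each deletion only shifts items already checked
for i in range(len(teams) - 1, 0, -1):
    if teams[i] == teams[i - 1]:
        del teams[i]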

You shouldn't use teams[i+1] is teams[i], because is tests whether the two names refer to the same object, not whether the strings are equal. Compare equality with ==. Also, use del teams[i] instead of teams.remove(teams[i]): you already know the index, whereas remove() searches the list again for the first item equal to that value, which is not necessarily the one at position i.
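Two small checks on these points, with made-up data. The identity result is described only as "usually" False because CPython may reuse string objects in some cases; equality with == is what is guaranteed:

```python
a = "team"
b = "".join(["te", "am"])   # builds an equal string at runtime
print(a == b)               # True: same characters
print(a is b)               # usually False in CPython: two distinct objects

by_value = ['x', 'y', 'x']
by_value.remove(by_value[2])  # removes the FIRST 'x' (index 0), not index 2
print(by_value)               # ['y', 'x']

by_index = ['x', 'y', 'x']
del by_index[2]               # removes exactly the item at index 2
print(by_index)               # ['x', 'y']
```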

Iterating with for i in items gives you the elements of items, not their indices.

>>> teams = ['team1', 'team2', 'team3']
>>> for team in teams:
...     print(team)
...
team1
team2
team3

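Whereas iterating over a range of indices gives you positions, which you then use to look up the elements:

>>> teams = ['team1', 'team2', 'team3']
>>> for i in range(len(teams)):
...     print(teams[i])
...
team1
team2
team3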

You could also use the built-in enumerate function, which gives you the index and the element together:

teams = ['team1', 'team2', 'team3']
for index, team in enumerate(teams):
    print(index, "-->", team)

Output:

0 --> team1
1 --> team2
2 --> team3
