1

I want to remove string elements from my list that duplicate part of another element. For example, given the list

ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']

I want output as

ls_out = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004']

That is, '02/27' should be dropped because it already appears in '02/27/1960'.

(Note that I'm not sure whether this question is a duplicate.)

2
  • Do you only need this to work with dates in that format, or arbitrary strings? Commented Jun 23, 2016 at 1:57
  • Hello @Max Feng, right now I only need this to work for the given format. Thanks! Commented Jun 23, 2016 at 1:59

3 Answers

3

This can also be solved with a for loop and the built-in any function:

>>> ls
['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']
>>>
>>> ls_out = []
>>> 
>>> for x in ls:
...     if not any([x in item for item in ls_out]):
...         ls_out.append(x)
...
>>> ls_out
['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004']

Or, equivalently, with all, starting again from an empty ls_out:

>>> ls_out = []
>>> for x in ls:
...     if all([x not in item for item in ls_out]):
...         ls_out.append(x)
...
>>> ls_out
['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004']
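
As noted in the comments, a generator expression lets any stop at the first match without building a temporary list. The loop above also relies on the full dates appearing in ls before their shortened forms. Here is a minimal sketch that does not depend on order, assuming it is acceptable to compare every element against every other one (the function name remove_contained is only an illustrative choice):

```python
def remove_contained(strings):
    """Keep only strings that are not a proper substring of another element."""
    result = []
    for s in strings:
        # Generator expression: any() stops at the first string that contains s.
        contained = any(s != other and s in other for other in strings)
        # Also skip exact repeats that were already kept.
        if not contained and s not in result:
            result.append(s)
    return result


# Same data as the question, but with a short form placed first on purpose.
ls = ['02/27', '02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '07/21', '08/13']
ls_out = remove_contained(ls)
print(ls_out)
```

This does quadratic work in the length of the list, which should be fine for short lists like the one in the question.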

4 Comments

Thanks so much @Iron Fist, this works really well for me!
@titipat No problem ;)
Generator comprehension rather than list comprehension with the any approach seems like it would be the most efficient. Also, to me it seems like checking x not in ls_out is overkill. Better to just use: if not any(x in item for item in ls_out):
@juanpa.arrivillaga...correct...it seems redundant as the second condition is a factor for both ...Good observation .
1

I'm not sure if this is the most efficient way to do this, but it would definitely work:

ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']

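# Work from a real copy; "ls2 = ls" would only create a second name for the same list.
ls2 = list(ls)

for item in ls2:
  for dup_item in ls2:
    if item == dup_item:
      continue
    # dup_item is a shorter prefix of item, so remove it from the result list.
    if item.startswith(dup_item) and dup_item in ls:
      ls.remove(dup_item)

print(ls)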

Basically, it creates two identical lists and loops through both. If the two items are equal, it skips them. Otherwise, it checks whether one item starts with the other and, if so, removes the shorter one from the list.
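
Removing items from a list while looping over it is easy to get wrong, so the same prefix check can also build a new list instead. This is a minimal sketch, assuming only prefixes (not arbitrary substrings) should count as duplicates; the names has_longer_match and ls_out are made up for this example:

```python
ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']


def has_longer_match(item, items):
    """Return True if some other, different element starts with item."""
    return any(other != item and other.startswith(item) for other in items)


# Keep only the items that are not a prefix of another element.
ls_out = [item for item in ls if not has_longer_match(item, ls)]
print(ls_out)
```

Because ls itself is never modified, the result does not depend on the order of the elements.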

1
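# Month/day prefixes that have already been seen.
cache = set()

def fun(s):
    # Build the 'MM/DD' key from the first two fields.
    ss = s.split('/')
    key = ss[0] + '/' + ss[1]
    if key in cache:
        return None  # falsy, so filter drops this string
    else:
        cache.add(key)
        return s

ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']

# list() is needed on Python 3, where filter returns an iterator.
new_ls = list(filter(fun, ls))
print(new_ls)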
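
The same idea can be written without a module-level cache, which also makes it safe to call more than once. This is a rough sketch, assuming every string starts with a month and a day separated by '/' (the name unique_by_month_day is hypothetical):

```python
def unique_by_month_day(dates):
    """Keep the first string seen for each month/day prefix."""
    seen = set()  # local to each call, so repeated calls start fresh
    result = []
    for s in dates:
        # Assumes at least two '/'-separated fields; raises ValueError otherwise.
        month, day = s.split('/')[:2]
        key = (month, day)
        if key not in seen:
            seen.add(key)
            result.append(s)
    return result


ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']
new_ls = unique_by_month_day(ls)
print(new_ls)
```

Like the filter version, this keeps whichever string comes first for a given month/day, so it returns the full dates only when they precede the short forms in the input.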

4 Comments

It makes more sense to use a set as a cache rather than a dictionary with a useless mapping.
First, thanks @atline! @juanpa.arrivillaga I see. So, I have to change cache to empty set and check if key is in cache, I guess.
How about the version above? A set does seem better than a hash.
Yeah, I did use solution above. However, thanks so much @atline :)
