I'm trying to delete all duplicates, including the original, from a nested list based on a specific column.

Example

list = [['abc',3232,'demo text'],['def',9834,'another text'],['abc',988,'another another text'],['poi',1234,'text']]

The key column is the first one (abc, def, abc, poi). I want to remove every item whose key value appears more than once, including the first occurrence.

So the new list should contain:

newlist = [['def',9834,'another text'],['poi',1234,'text']]

I found many similar topics, but none for nested lists. Any help, please?

  • What have you tried so far? Commented Jun 15, 2018 at 8:28
  • Side point: never name a variable after a built-in; use L or list_ instead of list. Commented Jun 15, 2018 at 9:28

4 Answers

You can construct a list of keys

keys = [x[0] for x in list]

and select only those records for which the key occurs exactly once

newlist = [x for x in list if keys.count(x[0]) == 1]
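A minimal runnable sketch of the two steps above, using the question's data. The name rows is an assumption, chosen only to avoid shadowing the built-in list; the leading zero in 0988 is dropped because Python 3 rejects it as a syntax error.

```python
# Question's data; `rows` is an illustrative name, not from the original post.
rows = [['abc', 3232, 'demo text'],
        ['def', 9834, 'another text'],
        ['abc', 988, 'another another text'],
        ['poi', 1234, 'text']]

# Step 1: collect the key (first column) of every record.
keys = [row[0] for row in rows]

# Step 2: keep only records whose key occurs exactly once.
# keys.count() rescans the whole list each time, so this is O(n^2).
newlist = [row for row in rows if keys.count(row[0]) == 1]

print(newlist)
```

For short lists the quadratic cost is unlikely to matter.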

4 Comments

You have O(n^2) complexity here by calling list.count n times. You could use collections.Counter to make this O(n). Or store your counts separately.
Well, OP didn't say anything regarding the list size, so I assumed it is not large enough for the difference between O(n) and O(n^2) to matter. Using Counter is definitely a more efficient approach, but I intended to give a quick-and-dirty solution that works well in most cases.
My comment isn't a complaint, it's just a note which may interest readers.
No offense taken;) Just explained my intent.

Use collections.Counter:

from collections import Counter

lst = [['abc',3232,'demo text'],['def',9834,'another text'],['abc',988,'another another text'],['poi',1234,'text']]

d = Counter(x[0] for x in lst)  # Counter is already a dict subclass
print([x for x in lst if d[x[0]] == 1])

# [['def', 9834, 'another text'], 
#  ['poi', 1234, 'text']]

Also note that you shouldn't name your list list, as that shadows the built-in list type.
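If the key is not always the first column, the same Counter idea can be wrapped in a small function. This is only a sketch: drop_repeated and its key_index parameter are hypothetical names introduced here, and it assumes the key values are hashable.

```python
from collections import Counter

def drop_repeated(rows, key_index=0):
    """Return the rows whose value at key_index appears exactly once.

    Hypothetical helper for illustration; key values must be hashable.
    """
    # One pass to count every key, one pass to filter: O(n) overall.
    counts = Counter(row[key_index] for row in rows)
    return [row for row in rows if counts[row[key_index]] == 1]

lst = [['abc', 3232, 'demo text'],
       ['def', 9834, 'another text'],
       ['abc', 988, 'another another text'],
       ['poi', 1234, 'text']]

result = drop_repeated(lst)
print(result)
```

The original order of the surviving rows is preserved because the filter iterates over rows, not over the counts.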

2 Comments

Good solution, this has O(n) complexity. But I don't think if x[0] in d.keys() is necessary?
@jpp oops! That isn't necessary. Thanks a lot.

Using a list comprehension.

Demo:

l = [['abc',3232,'demo text'],['def',9834,'another text'],['abc', 988,'another another text'],['poi',1234,'text']]
checkVal = [i[0] for i in l]
print([i for i in l if checkVal.count(i[0]) == 1])

Output:

[['def', 9834, 'another text'], ['poi', 1234, 'text']]

1 Comment

You have O(n^2) complexity here by calling list.count n times. You could use collections.Counter to make this O(n). Or store your counts separately.

Using collections.defaultdict for an O(n) solution:

L = [['abc',3232,'demo text'],
     ['def',9834,'another text'],
     ['abc',988,'another another text'],
     ['poi',1234,'text']]

from collections import defaultdict

d = defaultdict(list)

for key, num, txt in L:
    d[key].append([num, txt])

res = [[k, *v[0]] for k, v in d.items() if len(v) == 1]

print(res)

[['def', 9834, 'another text'],
 ['poi', 1234, 'text']]
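The unpacking in the loop above (key, num, txt) assumes every row has exactly three fields. A sketch of a variant that groups whole rows instead, so the row length does not matter; the names groups and same_key are illustrative only.

```python
from collections import defaultdict

L = [['abc', 3232, 'demo text'],
     ['def', 9834, 'another text'],
     ['abc', 988, 'another another text'],
     ['poi', 1234, 'text']]

# Map each key (first column) to the list of complete rows that share it.
groups = defaultdict(list)
for row in L:
    groups[row[0]].append(row)

# Keep the single row of every key that was seen exactly once.
# Dicts preserve insertion order (Python 3.7+), so the output follows
# the order in which the keys first appeared in L.
res = [same_key[0] for same_key in groups.values() if len(same_key) == 1]

print(res)
```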

1 Comment

By the way, this solution is also good. I usually prefer Counter over the defaultdict approach. +1.
