remove duplicates from nested dictionaries in list

Question

A quick and very basic newbie question.

If I have a list of dictionaries looking like this:

L = []
L.append({"value1": value1, "value2": value2, "value3": value3, "value4": value4})

Let's say there are multiple entries where value3 and value4 are identical to those in other dictionaries in the list. How can I quickly and easily find and remove those duplicate dictionaries?

Preserving order is of no importance.

Thanks.

EDIT:

If there are five inputs, like this:

L = [{"value1": fssd, "value2": dsfds, "value3": abcd, "value4": gk},
    {"value1": asdasd, "value2": asdas, "value3": dafdd, "value4": sdfsdf},
    {"value1": sdfsf, "value2": sdfsdf, "value3": abcd, "value4": gk},
    {"value1": asddas, "value2": asdsa, "value3": abcd, "value4": gk},
    {"value1": asdasd, "value2": dskksks, "value3": ldlsld, "value4": sdlsld}]

The output should look like this:

L = [{"value1": fssd, "value2": dsfds, "value3": abcd, "value4": gk},
    {"value1": asdasd, "value2": asdas, "value3": dafdd, "value4": sdfsdf},
    {"value1": asdasd, "value2": dskksks, "value3": ldlsld, "value4": sdlsld}]

To clarify, do you want to remove key/value pairs if there is a matching key/value pair in another dictionary, or if just the key (not necessarily the value) exists in another dictionary?
– Kenan Banks, Commented Aug 14, 2009 at 20:01
Is it just key3 and key4 that can't be identical? What happens if the value for one key matches the value for another key in another dict? Also, by the way, name your lists something other than list, or you'll overwrite the actual list in the built-in namespace, and you won't be able to call the list() function later on. lst or list_ are fairly common alternatives.
– Nikhil, Commented Aug 14, 2009 at 20:05
I'm simply using a dictionary inside a list because it's easier and more understandable than using a list inside a list; that way you can call l["value1"]. But that's another story.
– Jonas, Commented Aug 14, 2009 at 20:12
Now you have a list of lists, each with one dictionary. Are you sure you want that extra set of [ ] around each dictionary?
– Ned Deily, Commented Aug 14, 2009 at 20:19

Alex Martelli · Accepted Answer · answered 2009-08-14 22:15 (edited by Thomas, 2021-12-21 19:30:01Z)

7

In Python 2.6 or 3.*:

import itertools
import operator
import pprint

L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
    {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
    {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]

getvals = operator.itemgetter('value3', 'value4')

L.sort(key=getvals)

result = []
for k, g in itertools.groupby(L, getvals):
    result.append(next(g))

L[:] = result
pprint.pprint(L)

Almost the same in Python 2.5, except you have to use g.next() instead of next(g) in the append.

edited Dec 21, 2021 at 19:30

Thomas


answered Aug 14, 2009 at 22:15

Alex Martelli


ars · Answer · 2009-08-14 21:03:26Z

7

Here's one way:

keyfunc = lambda d: (d['value3'], d['value4'])

from itertools import groupby
giter = groupby(sorted(L, key=keyfunc), keyfunc)

L2 = [g[1].next() for g in giter]
print L2

answered Aug 14, 2009 at 21:03

ars


3 Comments

hughdbrown Over a year ago

It looks like yours is correct and an hour earlier than Alex's.

ars Over a year ago

I guess it's easy to get missed once a question gets more than 5 or 6 answers. Probably helps to be in the first or last couple, I suspect. No biggie, but thanks for noting that. :)

lukik Over a year ago

Running this in Python 3.3, I get the error AttributeError: 'itertools._grouper' object has no attribute 'next'. Any clue?
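
For context on the error above: Python 3 iterators no longer have a .next() method, and the built-in next() function is used instead (print is also a function there). A minimal Python 3 sketch of the same sort-then-groupby idea, assuming the sample data from the question with the values quoted as strings, might look like this:

```python
from itertools import groupby
from operator import itemgetter

# Sample data adapted from the question (values quoted as strings: an assumption).
L = [
    {"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
    {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
    {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"},
]

# Returns the tuple (d["value3"], d["value4"]) for a dictionary d.
keyfunc = itemgetter("value3", "value4")

# groupby only merges adjacent items, so sort by the same key first.
# next(group) takes the first dictionary from each run of equal keys.
L2 = [next(group) for _, group in groupby(sorted(L, key=keyfunc), key=keyfunc)]
print(L2)
```

Since sorted() is stable, the dictionary kept for each (value3, value4) pair is the one that appeared first in the original list.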

ACoolie · Answer · 2009-08-14 20:38:07Z

2

You can use a temporary list to store the items of each dict seen so far. The previous version of this code was buggy because it removed items inside the for loop.

(v,r) = ([],[])
for i in l:
    if ('value4', i['value4']) not in v and ('value3', i['value3']) not in v:
        r.append(i)
    v.extend(i.items())
l = r

Your test:

l = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
    {"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
    {"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
    {"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
    {"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

outputs

{'value4': 'gk', 'value3': 'abcd', 'value2': 'dsfds', 'value1': 'fssd'}
{'value4': 'sdfsdf', 'value3': 'dafdd', 'value2': 'asdas', 'value1': 'asdasd'}
{'value4': 'sdlsld', 'value3': 'ldlsld', 'value2': 'dskksks', 'value1': 'asdasd'}

edited Aug 14, 2009 at 20:38

answered Aug 14, 2009 at 20:06

ACoolie


1 Comment

Jonas Over a year ago

Your output is not correct. Look at my example. Thanks anyhow for the attempt.

wallacer · Answer · 2009-08-14 21:11:09Z

1

for dic in list: 
  for anotherdic in list:
    if dic != anotherdic:
      if dic["value3"] == anotherdic["value3"] or dic["value4"] == anotherdic["value4"]:
        list.remove(anotherdic)

Tested with

list = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
{"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
{"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
{"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
{"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

worked fine for me :)
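
One caveat with loops like the one above: removing items from a list while iterating over that same list can cause elements to be skipped, so which duplicate survives may be surprising. A hedged alternative is to build a new list and remember which (value3, value4) pairs have already been seen. The sketch below assumes that a duplicate means both values match (as the question describes, whereas the loop above uses "or") and that the values are hashable. The helper name dedupe_by_keys and the variable names are made up for illustration.

```python
def dedupe_by_keys(dicts, keys=("value3", "value4")):
    """Return a new list keeping the first dict for each combination of `keys`."""
    seen = set()      # key tuples encountered so far
    result = []
    for d in dicts:
        marker = tuple(d[k] for k in keys)
        if marker not in seen:
            seen.add(marker)
            result.append(d)
    return result


# Sample data adapted from the question.
records = [
    {"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
    {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
    {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"},
]

unique = dedupe_by_keys(records)
for d in unique:
    print(d)
```

This leaves the input list untouched and happens to preserve the original order, which the question does not require but which costs nothing here.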

edited Aug 14, 2009 at 21:11

answered Aug 14, 2009 at 19:55

wallacer


Ned Deily · Answer · 2009-08-14 22:20:31Z

1

That's a list of one dictionary, but assuming there are more dictionaries in the list l:

l = [ldict for ldict in l if ldict.get("value3") != value3 or ldict.get("value4") != value4]

But is that what you really want to do? Perhaps you need to refine your description.

BTW, don't use list as a name since it is the name of a Python built-in.

EDIT: Assuming you started with a list of dictionaries, rather than a list of lists of 1 dictionary each, that should work with your example. It wouldn't work if either of the values were None, so something like this would be better:

l = [ldict for ldict in l if not ( ("value3" in ldict and ldict["value3"] == value3) and ("value4" in ldict and ldict["value4"] == value4) )]

But it still seems like an unusual data structure.

EDIT: no need to use explicit gets.

Also, there are always tradeoffs in solutions. Without more info and without actually measuring, it's hard to know which performance tradeoffs are most important for the problem. But, as the Zen sez: "Simple is better than complex".

edited Aug 14, 2009 at 22:20

answered Aug 14, 2009 at 20:05

Ned Deily


1 Comment

Jonas Over a year ago

Hello Ned, thanks for your input. I have added an example of an INPUT and an OUTPUT of the same list, and I have also renamed the list in that specific example. Thanks.

Anon · Answer · 2009-08-14 22:39:42Z

0

If I understand correctly, you want to discard matches that come later in the original list but do not care about the order of the resulting list, so:

(Tested with 2.5.2)

tempDict = {}
for d in L[::-1]:
    tempDict[(d["value3"],d["value4"])] = d
L[:] = tempDict.itervalues()
tempDict = None
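
In Python 3, dict.itervalues() no longer exists; values() plays that role. A rough Python 3 equivalent of the snippet above, assuming the question's sample data with the values quoted as strings (the name unique is chosen here for illustration), could be:

```python
# Sample data adapted from the question (values quoted as strings: an assumption).
L = [
    {"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
    {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
    {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"},
]

unique = {}
# Walk the list backwards so that earlier dictionaries overwrite later ones
# that share the same (value3, value4) pair.
for d in reversed(L):
    unique[(d["value3"], d["value4"])] = d

# Replace the list contents in place with the surviving dictionaries.
L[:] = unique.values()
print(L)
```

Because the list is walked in reverse, the earliest dictionary for each (value3, value4) pair is the one that ends up stored. The resulting list is not in the original order, which the question says does not matter.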

edited Aug 14, 2009 at 22:39

answered Aug 14, 2009 at 21:40

Anon


4 Comments

hughdbrown Over a year ago

Did you try running your code? It doesn't do what the OP asked for. A couple of questions: (1) why iterate through the list in reverse order? (2) why use (d["value3"],d["value4"]) as the key in your temporary dictionary? (3) why assign the current dictionary in the list during iteration as the value to your temporary dictionary?

Anon Over a year ago

Hrm - it does what my interpretation was (which I was not sure about), and also matches his output - though not the order of it, but he said preserving that was of no importance. My interpretation: when more than one dictionary has the same (value3, value4) pair, keep only the first such dictionary from the original list. And the resulting list of dicts does not have to be in the same order. So... (1) so the first instance in the original list will "win" and be retained, (2) because I thought that's what had to be unique, and (3) because the dictionaries are the values I pull back out for the new list.

Anon Over a year ago

(In my test output, the dict items print in reverse order, and the list of dicts has them in a different order, but since he said "Preserving order is of no importance," that seemed within the parameters.)

Anon Over a year ago

Looking back over things, I stand by my interpretation. Order seems to be the only point of contention. Note that, if the OP's original data had, say, the instances of "abcd" replaced by "xkcd", the sort in Alex's answer (which rocks, as always) would also result in a different order. The question's random looking (and not even quoted) data gave no indication that its order was anything other than happenstance - again, particularly combined with "Preserving order is of no importance."
