4

Quick and very basic newbie question.

Say I have a list of dictionaries that looks like this:

L = []
L.append({"value1": value1, "value2": value2, "value3": value3, "value4": value4})

Let's say there are multiple entries where value3 and value4 are identical to those in other dictionaries in the list. How can I quickly and easily find and remove those duplicate dictionaries?

Preserving order is of no importance.

Thanks.

EDIT:

If there are five inputs, like this:

L = [{"value1": fssd, "value2": dsfds, "value3": abcd, "value4": gk},
    {"value1": asdasd, "value2": asdas, "value3": dafdd, "value4": sdfsdf},
    {"value1": sdfsf, "value2": sdfsdf, "value3": abcd, "value4": gk},
    {"value1": asddas, "value2": asdsa, "value3": abcd, "value4": gk},
    {"value1": asdasd, "value2": dskksks, "value3": ldlsld, "value4": sdlsld}]

The output should look like this:

L = [{"value1": fssd, "value2": dsfds, "value3": abcd, "value4": gk},
    {"value1": asdasd, "value2": asdas, "value3": dafdd, "value4": sdfsdf},
    {"value1": asdasd, "value2": dskksks, "value3": ldlsld, "value4": sdlsld}]
5
  • To clarify, do you want to remove key/value pairs if there is a matching key/value pair in another dictionary, or if just the key (not necessarily the value) exists in another dictionary? Commented Aug 14, 2009 at 20:01
  • Is it just key3 and key4 that can't be identical? What happens if the value for one key matches the value for another key in another dict? Also, by the way, name your lists something other than list, or you'll overwrite the actual list in the built-in namespace, and you can't call the list() function later on. lst or list_ are fairly common alternatives. Commented Aug 14, 2009 at 20:05
  • Yes, just key3 and key4, the rest can be duplicates. Commented Aug 14, 2009 at 20:11
  • I'm simply using a dictionary inside a list because it's easier and more understandable than using a list inside a list; that way you can call l["value1"], but that's another story. Commented Aug 14, 2009 at 20:12
  • Now you have a list of lists, each with one dictionary. Are you sure you want that extra set of [ ] around each dictionary? Commented Aug 14, 2009 at 20:19

6 Answers 6

7

In Python 2.6 or 3.*:

import itertools
import operator
import pprint

L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
    {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
    {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]

getvals = operator.itemgetter('value3', 'value4')

L.sort(key=getvals)

result = []
for k, g in itertools.groupby(L, getvals):
    result.append(next(g))

L[:] = result
pprint.pprint(L)

Almost the same in Python 2.5, except you have to use g.next() instead of next(g) in the append.
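
For comparison, here is a minimal sketch of a variant that skips the sort and remembers the (value3, value4) pairs it has already seen. It is not part of the answer above. The function name unique_by_pair is made up for illustration, and the sketch assumes the value3 and value4 values are hashable (strings, numbers, tuples).

```python
def unique_by_pair(dicts):
    """Return the first dict seen for each (value3, value4) pair."""
    seen = set()      # pairs encountered so far
    result = []
    for d in dicts:
        pair = (d["value3"], d["value4"])
        if pair not in seen:
            seen.add(pair)
            result.append(d)
    return result


L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
     {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
     {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
     {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
     {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]

L[:] = unique_by_pair(L)  # replace the contents of L in place
```

Unlike the sort-based version, this one happens to keep the surviving dictionaries in their original order, which the question does not require.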


7

Here's one way:

keyfunc = lambda d: (d['value3'], d['value4'])

from itertools import groupby
giter = groupby(sorted(L, key=keyfunc), keyfunc)

L2 = [g[1].next() for g in giter]
print L2

3 Comments

It looks like yours is correct and an hour earlier than Alex's.
I guess it's easy to get missed once a question gets more than 5 or 6 answers. Probably helps to be in the first or last couple, I suspect. No biggie, but thanks for noting that. :)
Running this in Python 3.3, I get the error AttributeError: 'itertools._grouper' object has no attribute 'next'. Any clue?
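
Regarding the AttributeError in the last comment: in Python 3, iterators no longer have a .next() method, and print is a function. A rough Python 3 sketch of the same groupby idea, using the built-in next(), might look like the following. The helper name dedupe_by_keys and its keys parameter are made up for illustration.

```python
from itertools import groupby


def dedupe_by_keys(dicts, keys=("value3", "value4")):
    """Keep one dict per distinct combination of the given keys."""
    def keyfunc(d):
        return tuple(d[k] for k in keys)

    # groupby only merges adjacent items, so sort by the same key first.
    grouped = groupby(sorted(dicts, key=keyfunc), key=keyfunc)
    # next(group) is the Python 3 spelling of group.next().
    return [next(group) for _, group in grouped]


L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
     {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
     {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
     {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
     {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]

L2 = dedupe_by_keys(L)
print(L2)
```

Because sorted() is stable, the dictionary kept for each pair is the one that came first in the original list.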
2

You can use a temporary list to collect the items of the dictionaries seen so far. The previous version of this code was buggy because it removed items inside the for loop.

(v,r) = ([],[])
for i in l:
    if ('value4', i['value4']) not in v and ('value3', i['value3']) not in v:
        r.append(i)
    v.extend(i.items())
l = r

Your test:

l = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
    {"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
    {"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
    {"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
    {"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

outputs

{'value4': 'gk', 'value3': 'abcd', 'value2': 'dsfds', 'value1': 'fssd'}
{'value4': 'sdfsdf', 'value3': 'dafdd', 'value2': 'asdas', 'value1': 'asdasd'}
{'value4': 'sdlsld', 'value3': 'ldlsld', 'value2': 'dskksks', 'value1': 'asdasd'}

1 Comment

Your output is not correct. Look at my example. Thanks anyhow for the attempt.
1
for dic in list: 
  for anotherdic in list:
    if dic != anotherdic:
      if dic["value3"] == anotherdic["value3"] or dic["value4"] == anotherdic["value4"]:
        list.remove(anotherdic)

Tested with

list = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
{"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
{"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
{"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
{"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

worked fine for me :)
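
One caveat: removing elements from a list while iterating over it can cause the loop to skip entries, and the name list shadows the built-in. Below is a minimal sketch that builds a new list instead. It assumes, per the asker's clarification, that a duplicate means both value3 and value4 match (the loop above uses or). The names drop_duplicates, records, and cleaned are made up for illustration.

```python
def drop_duplicates(dicts):
    """Return a new list, skipping dicts whose value3 AND value4
    both match a dict that has already been kept."""
    kept = []
    for candidate in dicts:
        is_duplicate = any(
            candidate["value3"] == k["value3"]
            and candidate["value4"] == k["value4"]
            for k in kept
        )
        if not is_duplicate:
            kept.append(candidate)
    return kept


records = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
           {"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
           {"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
           {"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
           {"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

cleaned = drop_duplicates(records)  # records itself is left untouched
```

This is still quadratic in the length of the list, but it only needs equality comparisons, so it also works when the values are not hashable.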


1

That's a list of one dictionary, but assuming there are more dictionaries in the list l:

l = [ldict for ldict in l if ldict.get("value3") != value3 or ldict.get("value4") != value4]

But is that what you really want to do? Perhaps you need to refine your description.

BTW, don't use list as a name since it is the name of a Python built-in.

EDIT: Assuming you started with a list of dictionaries, rather than a list of lists of one dictionary each, that should work with your example. It wouldn't work if either of the values were None, so something like this is better:

l = [ldict for ldict in l if not ( ("value3" in ldict and ldict["value3"] == value3) and ("value4" in ldict and ldict["value4"] == value4) )]

But it still seems like an unusual data structure.

EDIT: no need to use explicit gets.

Also, there are always tradeoffs in solutions. Without more info and without actually measuring, it's hard to know which performance tradeoffs are most important for the problem. But, as the Zen sez: "Simple is better than complex".

1 Comment

Hello Ned, thanks for your input. I have added an example of an INPUT and an OUTPUT of the same list, and I have also renamed the list in that specific example. Thanks.
0

If I understand correctly, you want to discard matches that come later in the original list but do not care about the order of the resulting list, so:

(Tested with 2.5.2)

tempDict = {}
for d in L[::-1]:
    tempDict[(d["value3"],d["value4"])] = d
L[:] = tempDict.itervalues()
tempDict = None
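
The snippet above is Python 2 (itervalues() does not exist in Python 3). A rough Python 3 sketch of the same idea could look like this. The function name dedupe_first_wins is made up for illustration, and the sketch assumes the value3 and value4 values are hashable.

```python
def dedupe_first_wins(dicts):
    """Keep one dict per (value3, value4) pair; the earliest one wins."""
    by_pair = {}
    # Walk the list backwards so that earlier entries overwrite later ones.
    for d in reversed(dicts):
        by_pair[(d["value3"], d["value4"])] = d
    return list(by_pair.values())


L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
     {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
     {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
     {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
     {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]

L[:] = dedupe_first_wins(L)  # replace the contents of L in place
```

As with the original, the resulting list is not in the original order.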

4 Comments

Did you try running your code? It doesn't do what the OP asked for. A couple of questions: (1) why iterate through the list in reverse order? (2) why use (d["value3"],d["value4"]) as the key in your temporary dictionary? (3) why assign the current dictionary in the list during iteration as the value to your temporary dictionary?
Hrm - it does what my interpretation was (which I was not sure about), and also matches his output - though not the order of it, but he said preserving that was of no importance. My interpretation: when more than one dictionary has the same (value3, value4) pair, keep only the first such dictionary from the original list. And the resulting list of dicts does not have to be in the same order. So... (1) so the first instance in the original list will "win" and be retained, (2) because I thought that's what had to be unique, and (3) because the dictionaries are the values I pull back out for the new list.
(In my test output, the dict items print in reverse order, and the list of dicts has them in a different order, but since he said "Preserving order is of no importance," that seemed within the parameters.)
Looking back over things, I stand by my interpretation. Order seems to be the only point of contention. Note that, if the OP's original data had, say, the instances of "abcd" replaced by "xkcd", the sort in Alex's answer (which rocks, as always) would also result in a different order. The question's random looking (and not even quoted) data gave no indication that its order was anything other than happenstance - again, particularly combined with "Preserving order is of no importance."
