
Original post: Remove duplicates from json data

This is only my second post. I didn't have enough reputation to ask my question as a comment on the original post, so here I am.

Andy Hayden makes a great point: "Also, those aren't really duplicates..."

My question is about exactly that situation: how can you remove duplicates from a JSON file by matching against more than one key?

Here is the original example (it was pointed out that it is not valid JSON):

{
  {obj_id: 123,
    location: {
      x: 123,
      y: 323,
  },
  {obj_id: 13,
    location: {
      x: 23,
      y: 333,
  },
 {obj_id: 123,
    location: {
      x: 122,
      y: 133,
  },
}

My case is very similar to this example, except that in my case all three entries would be kept, because each obj_id has unique x and y values. However, if x and y were the same, then one of the entries would be removed from the JSON file.
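A minimal sketch of the behaviour described above, assuming the example is rewritten as a valid JSON array (quoted keys, closed `location` objects). Each record is reduced to a tuple of `obj_id`, `x` and `y`, and only the first record with each tuple is kept. The fourth record is hypothetical, an exact copy of the first, added only so that something actually gets removed.

```python
import json

# Assumption: the example rewritten as a valid JSON array of objects.
# The fourth record is a hypothetical exact duplicate of the first.
raw = """
[
  {"obj_id": 123, "location": {"x": 123, "y": 323}},
  {"obj_id": 13,  "location": {"x": 23,  "y": 333}},
  {"obj_id": 123, "location": {"x": 122, "y": 133}},
  {"obj_id": 123, "location": {"x": 123, "y": 323}}
]
"""

records = json.loads(raw)

seen = set()
unique = []
for rec in records:
    # Composite key: a record is a duplicate only if all three values match
    key = (rec["obj_id"], rec["location"]["x"], rec["location"]["y"])
    if key not in seen:
        seen.add(key)
        unique.append(rec)

print([r["obj_id"] for r in unique])  # [123, 13, 123]
```

The two records with `obj_id` 123 both survive because their coordinates differ; only the repeated (123, 123, 323) record is dropped.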

All the examples I have found only remove entries based on a single matching key.

I don't know if it matters, but the keys I need to match against are "Company Name", "First Name", and "Last Name". The file is a 100k+ line JSON of companies and contacts, and sometimes the same person is a contact of multiple companies, which is why I need to match against multiple keys.
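One common way to match on several keys is to build a tuple of the relevant field values for each record and remember the tuples already seen in a set. The sketch below assumes each contact is a flat JSON object with "Company Name", "First Name" and "Last Name" fields and that their values are strings or other hashable types; `dedupe` and the sample data are hypothetical, not taken from the original file.

```python
import json

def dedupe(records, keys):
    """Return records with duplicates removed. Two records count as
    duplicates only if they agree on every field named in `keys`.
    The first occurrence is kept and the original order is preserved."""
    seen = set()
    unique = []
    for rec in records:
        # .get() so a record missing a field still produces a key (None)
        key = tuple(rec.get(k) for k in keys)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Hypothetical sample data: flat contact records
contacts = [
    {"Company Name": "Acme", "First Name": "Jane", "Last Name": "Doe"},
    {"Company Name": "Globex", "First Name": "Jane", "Last Name": "Doe"},
    {"Company Name": "Acme", "First Name": "Jane", "Last Name": "Doe"},
]

result = dedupe(contacts, ["Company Name", "First Name", "Last Name"])
print(json.dumps(result, indent=2))
```

Jane Doe is kept once for Acme and once for Globex; only the second Acme entry is dropped. For a real file, the list would come from `json.load` on an open file object and could be written back with `json.dump`. Because each lookup is a set membership test, the pass stays linear in the number of records, which matters at 100k+ lines. If a field holds a list or nested object, it would first need converting to something hashable (for example a tuple) before going into the key.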

Thanks.

  • All the keys in a dictionary must be unique, so there's no way for json.load() or json.loads() to return one that has values with duplicate keys. It's one of the differences between Python dictionaries and JSON objects. Would getting a list of the objects be useful? That might be possible. Commented Mar 22, 2018 at 19:06
  • @martineau That is true, but JSON allows arrays (lists in Python), which can contain duplicates. Consider: json.loads("[1,2,3,1,1]") Commented Mar 22, 2018 at 19:10
  • @Joe: I know that, which is why my comment asked whether getting the objects in a list (aka a JSON array) would be acceptable to them. Commented Mar 22, 2018 at 19:14
  • @martineau Sorry, but I am unsure what you mean... the OP gives an example of the input data as a list of objects. Commented Mar 22, 2018 at 19:16
  • @Joe: It was a dictionary of dictionaries when I posted my comment(s), but @MushroomMauLa has since changed it (which I don't think is what the OP intended, so I'm going to roll back his/her changes). Commented Mar 22, 2018 at 20:16
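To illustrate the first comment: Python's `json` module does not reject an object with repeated names. Per its documentation, it ignores all but the last name-value pair for a given name. If you need to see every pair, the `object_pairs_hook` parameter is called with the list of (key, value) pairs before a dict is built. A JSON array, by contrast, keeps repeated values as-is.

```python
import json

text = '{"obj_id": 123, "obj_id": 13}'

# Default behaviour: the later value overwrites the earlier one
decoded = json.loads(text)
print(decoded)  # {'obj_id': 13}

# object_pairs_hook receives the list of (key, value) pairs,
# so the duplicate name is still visible here
pairs = json.loads(text, object_pairs_hook=lambda p: p)
print(pairs)  # [('obj_id', 123), ('obj_id', 13)]

# An array keeps repeated values
print(json.loads("[1,2,3,1,1]"))  # [1, 2, 3, 1, 1]
```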

1 Answer


I hope this does what you are looking for (it only checks whether the First and Last Name differ).

raw_data = [
        {
            "Company":123,
            "Person":{
                "First Name":123,
                "Last Name":323
            }
        },
        {
            "Company":13,
            "Person":{
                "First Name":123,
                "Last Name":323
            }
        },
        {
            "Company":123,
            "Person":{
                "First Name":122,
                "Last Name":133
            }
        }
    ]

unique = []
for company in raw_data:
    # Keep this record only if no already-kept record has an equal "Person" dict
    if all(unique_comp["Person"] != company["Person"] for unique_comp in unique):
        unique.append(company)

print(unique)

#>>> [{'Company': 123, 'Person': {'First Name': 123, 'Last Name': 323}}, {'Company': 123, 'Person': {'First Name': 122, 'Last Name': 133}}]
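The loop above compares each record against every record already kept, so the work grows quadratically and may be slow on a 100k+ line file. A possible alternative (a swapped-in technique, not part of the answer as posted) remembers the names already seen in a set. It assumes every record has a "Person" with exactly the "First Name" and "Last Name" fields, so comparing those two values is equivalent to comparing the whole "Person" dict, and that the values are hashable.

```python
# Same data as above
raw_data = [
    {"Company": 123, "Person": {"First Name": 123, "Last Name": 323}},
    {"Company": 13, "Person": {"First Name": 123, "Last Name": 323}},
    {"Company": 123, "Person": {"First Name": 122, "Last Name": 133}},
]

seen = set()
unique = []
for company in raw_data:
    person = company["Person"]
    key = (person["First Name"], person["Last Name"])
    # To treat the same person at different companies as distinct,
    # add company["Company"] to this tuple.
    if key not in seen:
        seen.add(key)
        unique.append(company)

print(unique)
```

This prints the same two records as the original loop, but each membership check is a constant-time set lookup rather than a scan of `unique`.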