
I know there are a lot of questions about duplicates, but I can't find a solution that suits my case.

I have a JSON structure like this:

    {
        "test": [
            {
                "name2": [
                    "Tik",
                    "eev",
                    "asdv",
                    "asdfa",
                    "sadf",
                    "Nick"
                ]
            },
            {
                "name2": [
                    "Tik",
                    "eev",
                    "123",
                    "r45",
                    "676",
                    "121"
                ]
            }
        ]
    }

I want to keep the first occurrence of each value and remove all later duplicates.

Expected Result

    {
        "test": [
            {
                "name2": [
                    "Tik",
                    "eev",
                    "asdv",
                    "asdfa",
                    "sadf",
                    "Nick"
                ]
            },
            {
                "name2": [
                    "123",
                    "r45",
                    "676",
                    "121"
                ]
            }
        ]
    }

I tried using a tmp list to check for duplicates, but it didn't seem to work. I also can't find a way to write the result back out as JSON.

    import json

    with open('myjson') as access_json:
        read_data = json.load(access_json)

    tmp = []
    tmp2 = []

    def get_synonyms():
        ingredients_access = read_data['test']
        for x in ingredients_access:
            for j in x['name2']:
                tmp.append(j)
                if j in tmp:
                    tmp2.append(j)

    get_synonyms()
    print(len(tmp))
    print(len(tmp2))

3 Answers


You can use recursion:

    def filter_d(d):
        seen = set()
        def inner(_d):
            if isinstance(_d, dict):
                return {a: inner(b) if isinstance(b, (dict, list)) else b
                        for a, b in _d.items()}
            _r = []
            for i in _d:
                if isinstance(i, (dict, list)):
                    _r.append(inner(i))
                elif i not in seen:
                    _r.append(i)
                    seen.add(i)
            return _r
        return inner(d)

    import json

    # `data` is the structure loaded from your file, i.e. data = json.load(access_json)
    print(json.dumps(filter_d(data), indent=4))

Output:

    {
        "test": [
            {
                "name2": [
                    "Tik",
                    "eev",
                    "asdv",
                    "asdfa",
                    "sadf",
                    "Nick"
                ]
            },
            {
                "name2": [
                    "123",
                    "r45",
                    "676",
                    "121"
                ]
            }
        ]
    }
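The question also asks how to turn the result back into a JSON file. `json.dump` does that. Below is a minimal sketch that dedups with a set (the same idea as `filter_d`, flattened for brevity, since here the duplicates only occur inside the `name2` lists) and writes the result out; the file name `myjson_filtered.json` is just an example:

```python
import json

# Example data matching the question's structure.
data = {"test": [{"name2": ["Tik", "eev", "asdv"]},
                 {"name2": ["Tik", "eev", "123"]}]}

# Keep only the first occurrence of each value across all lists.
# set.add returns None, so `not seen.add(v)` is always True and
# only runs when `v not in seen` has already passed.
seen = set()
for item in data["test"]:
    item["name2"] = [v for v in item["name2"]
                     if v not in seen and not seen.add(v)]

# Write the filtered structure back to disk as JSON.
with open("myjson_filtered.json", "w") as f:
    json.dump(data, f, indent=4)
```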

Your membership test can never fail: you append `j` to `tmp` *before* checking `j in tmp`, so the check is always true and every value ends up in `tmp2`, not just the duplicates.
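The effect is easy to see in isolation; this is a minimal sketch of that check order on a small sample list:

```python
tmp = []
tmp2 = []
for j in ["Tik", "eev", "Tik"]:
    tmp.append(j)   # j is added to tmp first...
    if j in tmp:    # ...so this membership test is always True
        tmp2.append(j)

# tmp2 ends up identical to tmp instead of holding
# only the repeated "Tik".
print(tmp2)
```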

I changed the function a little to work for your specific example:

    def get_synonyms():
        test_list = []
        ingredients_access = read_data['test']
        used_values = []
        for x in ingredients_access:
            inner_tmp = []
            for j in x['name2']:
                if j not in used_values:
                    inner_tmp.append(j)
                    used_values.append(j)
            test_list.append({'name2': inner_tmp})
        return {'test': test_list}

    result = get_synonyms()
    print(result)

Output:

    {'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']},
              {'name2': ['123', 'r45', '676', '121']}]}



Here's a little hackish answer:

    d = {'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']},
                  {'name2': ['Tik', 'eev', '123', 'r45', '676', '121']}]}
    s = set()
    for l in d['test']:
        l['name2'] = [(v, s.add(v))[0] for v in l['name2'] if v not in s]

Output:

    {'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']},
              {'name2': ['123', 'r45', '676', '121']}]}

This uses a set `s` to track the values already seen. The tuple trick `(v, s.add(v))[0]` adds each new value to the set as a side effect while evaluating to the value itself, so the whole dedup fits in a list comprehension.
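For readability, the same logic can be written as an explicit loop (a sketch using the same variable names as above):

```python
d = {'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']},
              {'name2': ['Tik', 'eev', '123', 'r45', '676', '121']}]}

s = set()
for l in d['test']:
    kept = []
    for v in l['name2']:
        if v not in s:    # first occurrence anywhere in d wins
            s.add(v)
            kept.append(v)
    l['name2'] = kept

print(d)
```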
