
I know there are a lot of questions about duplicates, but I can't find a solution that suits my case.

I have a JSON structure like this:

    {
        "test": [
            {
                "name2": [
                    "Tik",
                    "eev",
                    "asdv",
                    "asdfa",
                    "sadf",
                    "Nick"
                ]
            },
            {
                "name2": [
                    "Tik",
                    "eev",
                    "123",
                    "r45",
                    "676",
                    "121"
                ]
            }
        ]
    }

I want to keep the first occurrence of each value and remove all later duplicates.

Expected Result

    {
        "test": [
            {
                "name2": [
                    "Tik",
                    "eev",
                    "asdv",
                    "asdfa",
                    "sadf",
                    "Nick"
                ]
            },
            {
                "name2": [
                    "123",
                    "r45",
                    "676",
                    "121"
                ]
            }
        ]
    }

I tried using a tmp list to check for duplicates, but it didn't seem to work. I also can't find a way to write the result back out as JSON.

    import json

    with open('myjson') as access_json:
        read_data = json.load(access_json)

    tmp = []
    tmp2 = []

    def get_synonyms():
        ingredients_access = read_data['test']
        for x in ingredients_access:
            for j in x['name2']:
                tmp.append(j)
                if j in tmp:
                    tmp2.append(j)

    get_synonyms()
    print(len(tmp))
    print(len(tmp2))

3 Answers


You can use recursion:

    def filter_d(d):
        seen = set()
        def inner(_d):
            if isinstance(_d, dict):
                return {a: inner(b) if isinstance(b, (dict, list)) else b
                        for a, b in _d.items()}
            _r = []
            for i in _d:
                if isinstance(i, (dict, list)):
                    _r.append(inner(i))
                elif i not in seen:
                    _r.append(i)
                    seen.add(i)
            return _r
        return inner(d)

    import json

    # `data` is the structure loaded from your file, i.e. data = json.load(access_json)
    print(json.dumps(filter_d(data), indent=4))

Output:

    {
        "test": [
            {
                "name2": [
                    "Tik",
                    "eev",
                    "asdv",
                    "asdfa",
                    "sadf",
                    "Nick"
                ]
            },
            {
                "name2": [
                    "123",
                    "r45",
                    "676",
                    "121"
                ]
            }
        ]
    }
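The question also asks how to turn the result back into a JSON file. `json.dump` does that. Below is a minimal sketch that dedups with a set (the same idea as `filter_d`, flattened for brevity, since here the duplicates only occur inside the `name2` lists) and writes the result out; the file name `myjson_filtered.json` is just an example:

```python
import json

# Example data matching the question's structure.
data = {"test": [{"name2": ["Tik", "eev", "asdv"]},
                 {"name2": ["Tik", "eev", "123"]}]}

# Keep only the first occurrence of each value across all lists.
# set.add returns None, so `not seen.add(v)` is always True and
# only runs when `v not in seen` has already passed.
seen = set()
for item in data["test"]:
    item["name2"] = [v for v in item["name2"]
                     if v not in seen and not seen.add(v)]

# Write the filtered structure back to disk as JSON.
with open("myjson_filtered.json", "w") as f:
    json.dump(data, f, indent=4)
```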

Your membership test can never fail: you append `j` to `tmp` *before* checking `j in tmp`, so the check is always true and every value ends up in `tmp2`, not just the duplicates.
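The effect is easy to see in isolation; this is a minimal sketch of that check order on a small sample list:

```python
tmp = []
tmp2 = []
for j in ["Tik", "eev", "Tik"]:
    tmp.append(j)   # j is added to tmp first...
    if j in tmp:    # ...so this membership test is always True
        tmp2.append(j)

# tmp2 ends up identical to tmp instead of holding
# only the repeated "Tik".
print(tmp2)
```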

I changed the function a little to work for your specific example:

    def get_synonyms():
        test_list = []
        ingredients_access = read_data['test']
        used_values = []
        for x in ingredients_access:
            inner_tmp = []
            for j in x['name2']:
                if j not in used_values:
                    inner_tmp.append(j)
                    used_values.append(j)
            test_list.append({'name2': inner_tmp})
        return {'test': test_list}

    result = get_synonyms()
    print(result)

Output:

    {'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']},
              {'name2': ['123', 'r45', '676', '121']}]}



Here's a little hackish answer:

    d = {'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']},
                  {'name2': ['Tik', 'eev', '123', 'r45', '676', '121']}]}
    s = set()
    for l in d['test']:
        l['name2'] = [(v, s.add(v))[0] for v in l['name2'] if v not in s]

Output:

    {'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']},
              {'name2': ['123', 'r45', '676', '121']}]}

This uses a set `s` to track the values already seen. The tuple trick `(v, s.add(v))[0]` adds each new value to the set as a side effect while evaluating to the value itself, so the whole dedup fits in a list comprehension.
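For readability, the same logic can be written as an explicit loop (a sketch using the same variable names as above):

```python
d = {'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']},
              {'name2': ['Tik', 'eev', '123', 'r45', '676', '121']}]}

s = set()
for l in d['test']:
    kept = []
    for v in l['name2']:
        if v not in s:    # first occurrence anywhere in d wins
            s.add(v)
            kept.append(v)
    l['name2'] = kept

print(d)
```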
