
I have a JSON file that contains an array of objects. The data inside the file looks something like this:

[
 {"name": "A",
 "address": "some address related to A",
 "details": "some details related to A"},
 {"name": "B",
 "address": "some address related to A",
 "details": "some details related to B"},
 {"name": "C",
 "address": "some address related to A",
 "details": "some details related to C"}
]

and I want to remove the redundant key/value pairs, so the output should be something like this:

  [
   {"name": "A",
   "address": "some address related to A",
   "details": "some details related to A"},
   {"name": "B",
   "details": "some details related to B"},
   {"name": "C",
   "details": "some details related to C"}
  ]

So I've tried this code, which I found at this link:

import json

with open('./myfile.json') as fp:
    data = fp.read()

unique = []
for n in data:
    if all(unique_data["address"] != data for unique_data["address"] in unique):
        unique.append(n)

#print(unique)
with open("./cleanedRedundancy.json", 'w') as f:
    f.write(unique)

but it gives me this error:

TypeError: string indices must be integers
  • for n in data actually iterates through each character of the text data, so on each iteration n is a single character. Is that what you really wanted? Commented Oct 7, 2020 at 13:37
  • You have to parse the JSON. See How to parse JSON in Python? Commented Oct 7, 2020 at 13:37
  • Also, for unique_data["address"] in unique should really be for unique_data in unique. Commented Oct 7, 2020 at 13:37
  • @Arty, thanks for your reply, but can you please clarify more? I didn't really get what you said! Commented Oct 7, 2020 at 13:45
  • @n_dev Can you describe the algorithm for removing redundant entries in more detail? Then we can create working code that implements it. Commented Oct 7, 2020 at 13:49
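Putting the comments' suggestions together, a minimal sketch of a corrected script might look like this (assuming the file is valid JSON and, as in the expected output above, only the repeated address value should be dropped):

import json

# Parse the file as JSON instead of iterating over the raw text
with open('./myfile.json') as fp:
    data = json.load(fp)

seen = set()
for entry in data:
    addr = entry.get('address')
    if addr in seen:
        del entry['address']  # drop the repeated key but keep the object
    elif addr is not None:
        seen.add(addr)

with open('./cleanedRedundancy.json', 'w') as f:
    json.dump(data, f, indent=4)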

1 Answer


I wrote a solution that works with or without file support; without is the default. For your case, to read from and write to files, change use_files = False to use_files = True inside my script.

I assumed that you want to remove duplicates that have the same (key, value) pair.

Try it online!

import json

use_files = False
# Only duplicate values under the following keys will be deleted
only_keys = {'address', 'complex'}

if not use_files:
    fdata = """
    [
     {
       "name": "A",
       "address": "some address related to A",
       "details": "some details related to A"
     },
     {
       "name": "B",
       "address": "some address related to A",
       "details": "some details related to B",
       "complex": ["x", {"y": "z", "p": "q"}],
       "dont_remove": "test"
     },
     {
       "name": "C",
       "address": "some address related to A",
       "details": "some details related to C",
       "complex": ["x", {"p": "q", "y": "z"}],
       "dont_remove": "test"
     }
    ]
    """

if use_files:
    with open("./myfile.json", 'r', encoding = 'utf-8') as fp:
        data = fp.read()
else:
    data = fdata

entries = json.loads(data)

unique = set()
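# Track every (key, canonicalized value) pair; a key is dropped if its pair was already seen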
for e in entries:
    for k, v in list(e.items()):
        if k not in only_keys:
            continue
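        # Serialize the value to a string so lists and dicts become hashable and comparable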
        v = json.dumps(v, sort_keys = True)
        if (k, v) in unique:
            del e[k]
        else:
            unique.add((k, v))

if use_files:
    with open("./cleanedRedundancy.json", "w", encoding = 'utf-8') as f:
        f.write(json.dumps(entries, indent = 4, ensure_ascii = False))
else:
    print(json.dumps(entries, indent = 4, ensure_ascii = False))

Output:

[
    {
        "name": "A",
        "address": "some address related to A",
        "details": "some details related to A"
    },
    {
        "name": "B",
        "details": "some details related to B",
        "complex": [
            "x",
            {
                "y": "z",
                "p": "q"
            }
        ],
        "dont_remove": "test"
    },
    {
        "name": "C",
        "details": "some details related to C",
        "dont_remove": "test"
    }
]

10 Comments

I'd suggest a small change to avoid modifying entries while iterating. I know you created a copy of the sub-entry's items, but canonically wouldn't we just create a new, cleaned copy?
@KennyOstrom Why? I read the object in, process it, and write it out, transforming it only once for the given task, so why can't I delete entries? I also iterate over list(e.items()): list() makes a copy of the dictionary's items, so the loop runs fine even if the entry changes on the fly.
@n_dev Fixed my answer to support lists and any complex value types! I used a nice trick: converting each value to a JSON string for comparison, since equal values produce equal strings.
@n_dev Also notice that in my example the nested dictionary appears in two places with the "y" and "p" keys in different order; I still consider such dictionaries equal as long as they match when keys are sorted, which is what the sort_keys = True argument does. If such dictionaries should be considered unequal, replace it with sort_keys = False (see the small demonstration after these comments).
@n_dev At the beginning of the script I added a new constant, only_keys = ..., which lists only the keys that may be deleted. In your case just set only_keys = {'address'}; this solves your last requested task of deleting only addresses.
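To illustrate the sort_keys trick described in the comments above, a small standalone check (the dict names here are just for illustration):

import json

a = {"y": "z", "p": "q"}
b = {"p": "q", "y": "z"}

# With sort_keys=True, equal dicts serialize to identical strings
print(json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True))  # True

# Without sorting, insertion order leaks into the string
print(json.dumps(a) == json.dumps(b))  # False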