
I have a JSON file that contains an array of objects. The data inside the file looks something like this:

[
 {"name": "A",
 "address": "some address related to A",
 "details": "some details related to A"},
 {"name": "B",
 "address": "some address related to A",
 "details": "some details related to B"},
 {"name": "C",
 "address": "some address related to A",
 "details": "some details related to C"}
]

and I want to remove the redundant key/value pairs, so the output should be something like this:

  [
   {"name": "A",
   "address": "some address related to A",
   "details": "some details related to A"},
   {"name": "B",
   "details": "some details related to B"},
   {"name": "C",
   "details": "some details related to C"}
  ]

So I've tried this code, which I found at this link:

import json

with open('./myfile.json') as fp:
    data = fp.read()

unique = []
for n in data:
    if all(unique_data["address"] != data for unique_data["address"] in unique):
        unique.append(n)

#print(unique)
with open("./cleanedRedundancy.json", 'w') as f:
    f.write(unique)

but it gives me this error:

TypeError: string indices must be integers
  • for n in data actually iterates through each character of the text data, so on each iteration n is a single character. Is that what you really wanted? Commented Oct 7, 2020 at 13:37
  • You have to parse the JSON. See How to parse JSON in Python? Commented Oct 7, 2020 at 13:37
  • Also, for unique_data["address"] in unique should really be for unique_data in unique. Commented Oct 7, 2020 at 13:37
  • @Arty, thanks for your reply, but can you please clarify more? I didn't really get what you said! Commented Oct 7, 2020 at 13:45
  • @n_dev Can you describe the algorithm for removing redundant entries in more detail? Then we can create working code that implements it. Commented Oct 7, 2020 at 13:49
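Putting the comments' suggestions together, a minimal sketch of a corrected script might look like this (assuming the file is valid JSON and, as in the expected output above, only the repeated address value should be dropped):

import json

# Parse the file as JSON instead of iterating over the raw text
with open('./myfile.json') as fp:
    data = json.load(fp)

seen = set()
for entry in data:
    addr = entry.get('address')
    if addr in seen:
        del entry['address']  # drop the repeated key but keep the object
    elif addr is not None:
        seen.add(addr)

with open('./cleanedRedundancy.json', 'w') as f:
    json.dump(data, f, indent=4)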

1 Answer


I wrote a solution that works with or without file support; without is the default. For your case, to read from and write to files, change use_files = False to use_files = True inside my script.

I assumed that you want to remove duplicates that have the same (key, value) pair.

Try it online!

import json

use_files = False
# Only duplicate values under the following keys will be deleted
only_keys = {'address', 'complex'}

if not use_files:
    fdata = """
    [
     {
       "name": "A",
       "address": "some address related to A",
       "details": "some details related to A"
     },
     {
       "name": "B",
       "address": "some address related to A",
       "details": "some details related to B",
       "complex": ["x", {"y": "z", "p": "q"}],
       "dont_remove": "test"
     },
     {
       "name": "C",
       "address": "some address related to A",
       "details": "some details related to C",
       "complex": ["x", {"p": "q", "y": "z"}],
       "dont_remove": "test"
     }
    ]
    """

if use_files:
    with open("./myfile.json", 'r', encoding = 'utf-8') as fp:
        data = fp.read()
else:
    data = fdata

entries = json.loads(data)

unique = set()
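# Track every (key, canonicalized value) pair; a key is dropped if its pair was already seen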
for e in entries:
    for k, v in list(e.items()):
        if k not in only_keys:
            continue
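        # Serialize the value to a string so lists and dicts become hashable and comparable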
        v = json.dumps(v, sort_keys = True)
        if (k, v) in unique:
            del e[k]
        else:
            unique.add((k, v))

if use_files:
    with open("./cleanedRedundancy.json", "w", encoding = 'utf-8') as f:
        f.write(json.dumps(entries, indent = 4, ensure_ascii = False))
else:
    print(json.dumps(entries, indent = 4, ensure_ascii = False))

Output:

[
    {
        "name": "A",
        "address": "some address related to A",
        "details": "some details related to A"
    },
    {
        "name": "B",
        "details": "some details related to B",
        "complex": [
            "x",
            {
                "y": "z",
                "p": "q"
            }
        ],
        "dont_remove": "test"
    },
    {
        "name": "C",
        "details": "some details related to C",
        "dont_remove": "test"
    }
]

10 Comments

I'd suggest a small change to avoid modifying entries while iterating. I know you created a copy of the sub-entry's items, but canonically wouldn't we just create a new, cleaned copy?
@KennyOstrom Why? I read the object in, process it, and write it out, transforming it only once for the given task, so why can't I delete entries? I also iterate over list(e.items()): list() makes a copy of the dictionary's items, so the loop runs fine even if the entry changes on the fly.
@n_dev Fixed my answer to support lists and any complex value types! I used a nice trick: converting each value to a JSON string for comparison, since equal values produce equal strings.
@n_dev Also notice that in my example the nested dictionary appears in two places with the "y" and "p" keys in different order; I still consider such dictionaries equal as long as they match when keys are sorted, which is what the sort_keys = True argument does. If such dictionaries should be considered unequal, replace it with sort_keys = False (see the small demonstration after these comments).
@n_dev At the beginning of the script I added a new constant, only_keys = ..., which lists only the keys that may be deleted. In your case just set only_keys = {'address'}; this solves your last requested task of deleting only addresses.
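To illustrate the sort_keys trick described in the comments above, a small standalone check (the dict names here are just for illustration):

import json

a = {"y": "z", "p": "q"}
b = {"p": "q", "y": "z"}

# With sort_keys=True, equal dicts serialize to identical strings
print(json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True))  # True

# Without sorting, insertion order leaks into the string
print(json.dumps(a) == json.dumps(b))  # False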