flatten nested dictionary with dictionary embedded in lists (functional python)

Question

This questions has been asked many times - but only once with this special case and I could partially find an answer here but it flattens down to every object.

I have this dictionary:

{'address': {'address_line_1': 'Floor Dekk House',
  'address_line_2': 'Zippora Street Providence Industrial Estate',
  'country': 'Seychelles',
  'locality': 'Mahe',
  'premises': '1st'},
 'address_snippet': '1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles',
 'appointment_count': 1,
 'description': 'Total number of appointments 1',
 'description_identifiers': ['appointment-count'],
 'kind': 'searchresults#officer',
 'links': {'self': '/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments'},
 'matches': {'snippet': [], 'title': [1, 8, 10, 11]},
 'snippet': '',
 'title': 'ASTROCOM AG '}

As you can see "description_identifiers" and "matches.snippet" and "matches.title" have a list as value. I'd like to edit my code below to flatten my dictionary so that the json is flattened in a{key:value, key:value, key:value}` pair - but if the value is a list of atomic objects (not a list of lists or a list of dictionaries), the value is maintained as a list.

The objective is so be able to upload then this json to postgresql.

Here's some code i found online:

def flatten_json(dictionary):
    """Flatten a nested json file"""

    def unpack(parent_key, parent_value):
        """Unpack one level of nesting in json file"""
        # Unpack one level only!!!

        if isinstance(parent_value, dict):
            for key, value in parent_value.items():
                temp1 = parent_key + '_' + key
                yield temp1, value
        elif isinstance(parent_value, list):
            i = 0 
            for value in parent_value:
                temp2 = parent_key + '_' +str(i) 
                i += 1
                yield temp2, value
        else:
            yield parent_key, parent_value    


    # Keep iterating until the termination condition is satisfied
    while True:
        # Keep unpacking the json file until all values are atomic elements (not dictionary or list)
        dictionary = dict(chain.from_iterable(starmap(unpack, dictionary.items())))
        # Terminate condition: not any value in the json file is dictionary or list
        if not any(isinstance(value, dict) for value in dictionary.values()) and \
           not any(isinstance(value, list) for value in dictionary.values()):
            break

    return dictionary

Desired output:

And to test, this dict: Should not be (which is what I get now):

{'address_address_line_1': 'Floor Dekk House',
 'address_address_line_2': 'Zippora Street Providence Industrial Estate',
 'address_country': 'Seychelles',
 'address_locality': 'Mahe',
 'address_premises': '1st',
 'address_snippet': '1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles',
 'appointment_count': 1,
 'description': 'Total number of appointments 1',
 'description_identifiers_0': 'appointment-count',
 'kind': 'searchresults#officer',
 'links_self': '/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments',
 'matches_title_0': 1,
 'matches_title_1': 8,
 'matches_title_2': 10,
 'matches_title_3': 11,
 'snippet': '',
 'title': 'ASTROCOM AG '}

But rather

{'address_address_line_1': 'Floor Dekk House',
 'address_address_line_2': 'Zippora Street Providence Industrial Estate',
 'address_country': 'Seychelles',
 'address_locality': 'Mahe',
 'address_premises': '1st',
 'address_snippet': '1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles',
 'appointment_count': 1,
 'description': 'Total number of appointments 1',
 'description_identifiers_0': 'appointment-count',
 'kind': 'searchresults#officer',
 'links_self': '/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments',
 'matches_title': [1, 8, 10, 11]
 'snippet': '',
 'title': 'ASTROCOM AG '}

Not sure where you got your example from (posting it helps a lot btw, lots of posters do not give input data), but is it supposed to act like a tree data-structure? . — Elmex80s
– Elmex80s, Commented Apr 4, 2019 at 9:03
Yes, it is. The JSON i posted is basically the fullest version of an API response. Some fields are optional however, so I am trying to get function to flatten any input and then use that flattened dict to insert the data in postgresql. The issue is that some lists contain integers/strings (in which case i want these list to remain lists and become arrays in postgresql) whereas if the list contains another dict, recursively unpack that too. Any solution so far? — Tytire Recubans
– Tytire Recubans, Commented Apr 4, 2019 at 9:50
I found this, but it doesn't handle the hedge case, lists are unpacked irrespectively of the data type they contain. — Tytire Recubans
– Tytire Recubans, Commented Apr 4, 2019 at 9:50
Not really, I'm sure it can be done with comprehension too - — Tytire Recubans
– Tytire Recubans, Commented Apr 4, 2019 at 10:07

adrtam · Accepted Answer · 2019-04-04 14:37:23Z

1

You are almost done, except you need a little more check on the condition:

def flatten(dict_, prefix):
    for k, v in dict_.items():
        if isinstance(v, list) and len(v)==1:
            if isinstance(v[0], dict):
                for key, value in flatten(v[0], prefix+k+"_"):
                    yield key, value
            else:
                yield prefix+k+"_0", v[0]
        elif isinstance(v, dict):
            for key, value in flatten(v, prefix+k+"_"):
                yield key, value
        else:
            yield prefix+k, v

Usage:

dict_ = {'address': {'address_line_1': 'Floor Dekk House',
  'address_line_2': 'Zippora Street Providence Industrial Estate',
  'country': 'Seychelles',
  'locality': 'Mahe',
  'premises': '1st'},
 'address_snippet': '1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles',
 'appointment_count': 1,
 'description': 'Total number of appointments 1',
 'description_identifiers': ['appointment-count'],
 'kind': 'searchresults#officer',
 'links': {'self': '/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments'},
 'matches': {'snippet': [], 'title': [1, 8, 10, 11]},
 'snippet': '',
 'title': 'ASTROCOM AG '}

import json
print(json.dumps(dict(list(flatten(dict_, ""))), indent=4))

Output:

{
    "address_address_line_1": "Floor Dekk House",
    "address_address_line_2": "Zippora Street Providence Industrial Estate",
    "address_country": "Seychelles",
    "address_locality": "Mahe",
    "address_premises": "1st",
    "address_snippet": "1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles",
    "appointment_count": 1,
    "description": "Total number of appointments 1",
    "description_identifiers_0": "appointment-count",
    "kind": "searchresults#officer",
    "links_self": "/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments",
    "matches_snippet": [],
    "matches_title": [
        1,
        8,
        10,
        11
    ],
    "snippet": "",
    "title": "ASTROCOM AG "
}

edited Apr 4, 2019 at 14:37

answered Apr 4, 2019 at 13:53

adrtam

7,2712 gold badges15 silver badges28 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Tytire Recubans Over a year ago

This is beautiful! I realised i had left a weird 0 after "description_identifiers" - so i removed the +"_0" from the second yield.

Tytire Recubans Over a year ago

any way we can get the output without those leading _ ?

Collectives™ on Stack Overflow

flatten nested dictionary with dictionary embedded in lists (functional python)

1 Answer 1

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related