Structure JSON format to a specified data structure

Question

Basically, I have a nested structure like this:

data_list = {
    "__att_names": [
        ["id", "name"],                    # "__t_idx": 0
        ["location", "address"],           # "__t_idx": 1
        ["random_key1", "random_key2"],    # "__t_idx": 2
        ["random_key3", "random_key4"],    # "__t_idx": 3
    ],
    "__root": {
        "comparables": [
            {
                "__g_id": "153564396",
                "__atts": [
                    1,                     # this would be __att_names[0][0]
                    "somerandomname",      # this would be __att_names[0][1]
                    {
                        "__atts": [
                            "location_value",  # this would be __att_names[1][0]
                            "address_value",   # this would be __att_names[1][1]
                            {"__atts": [], "__t_idx": 1},  # it can keep getting nested, further and further
                        ],
                        "__t_idx": 1,
                    },
                    {
                        "__atts": ["random_key3value", "random_key4value"],
                        "__t_idx": 3,
                    },
                    {
                        "__atts": ["random_key1value", "random_key2value"],
                        "__t_idx": 2,
                    },
                ],
                "__t_idx": 0,  # this maps to the first item in __att_names
            }
        ]
    },
}

My desired output in this case would be

[
    {
        'id': 1,
        'name': 'somerandomname',
        'location': 'address_value',
        'random_key1': 'random_key1value',
        'random_key2': 'random_key2value',
        'random_key3': 'random_key3value',
        'random_key4': 'random_key4value',
    }
]

I was able to get it working for the first few nested fields of __att_names, but my code got really long and wonky once I handled the nesting, and it felt really repetitive. I feel like there is a neater, recursive way to solve this.

This is my current approach. As of now, the following code takes care of the very first nested object.

payload_names =  data_list['__att_names']
comparable_data = data_list['__root']['comparables']
output_arr = []
for items in comparable_data[:1]:
        output = {}
        index_number = items.get('__t_idx')
        attributes = items.get('__atts')
        if attributes:
            recursive_function(index_number, attributes, payload_names, output)
        output_arr.append(output)


def recursive_function(index, attributes, payload_names, output):
    category_location = payload_names[index]
    for index, categories in enumerate(category_location):
        output[categories] = attributes[index]
        if type(attributes[index]) == dict:
            has_nested_index = attributes[index].get('__t_idx')
            has_nested_attributes = attributes[index].get('__atts')
            if has_nested_attributes and has_nested_index:
                recursive_function(has_nested_index, has_nested_attributes, payload_names, output)
            else:
                continue

To further explain the given example:

[ {
            'id': 1,
            'name': 'somerandomname',
            'location': 'address_value',
            'random_key1': 'random_key1value',
            'random_key2': 'random_key2value',
            'random_key3': 'random_key3value',
            'random_key4': 'random_key4value',
        }
    ]

Specifically, take 'location': 'address_value'. The value 'address_value' was derived from the comparables key, which holds an array of dictionaries with the keys __g_id, __atts, and __t_idx. Note that some of them might not have __g_id, but whenever there is an __atts key there is also a __t_idx, which maps to the index of an array in __att_names.

Overall, __att_names holds all the different key names, and the items within comparables -> __atts are the values for those key names.

__t_idx helps us map the __atts array items to __att_names and create the key-value pairs of the resulting dictionary.
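As a small illustration of that mapping for a single level only, here is a minimal sketch. It assumes that scalar items in __atts pair up, in order, with the names at __att_names[__t_idx], and that dictionaries carrying their own __atts are nested records to be handled separately. The helper names is_nested and map_one_level are hypothetical, not part of the original code.

```python
att_names = [["id", "name"], ["location", "address"]]

node = {
    "__atts": [
        1,
        "somerandomname",
        {"__atts": ["location_value", "address_value"], "__t_idx": 1},
    ],
    "__t_idx": 0,
}


def is_nested(item):
    # Assumption: a nested record is a dict that has its own '__atts' key.
    return isinstance(item, dict) and "__atts" in item


def map_one_level(node, att_names):
    # Look up the key names for this node, then pair them with its
    # non-nested values in order.
    names = att_names[node["__t_idx"]]
    values = [item for item in node["__atts"] if not is_nested(item)]
    return dict(zip(names, values))


print(map_one_level(node, att_names))  # {'id': 1, 'name': 'somerandomname'}
```

The nested dictionary is skipped here on purpose; applying the same function to it would give the 'location' and 'address' pairs.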

Wrap your mapping code in a function, and call it recursively if nested_attribute is true.
– alexis, Commented Apr 21, 2021 at 5:38
Thank you for the suggestion. Following your approach, I attempted to create a recursive function; however, not all fields are properly mapped using __t_idx. Can you validate the code for me? I updated the main thread.
– HOOOPLA, Commented Apr 21, 2021 at 5:56
I won't try to figure it out, sorry. Maybe somebody else will. But it's not clear to me how your payload definition leads you to get the key-value pair 'location': 'address_value' out of your inputs; maybe clearing that up will help. But keep at it, test, and single-step your code in a debugger so you can watch it (you do have access to a debugger, I hope!)
– alexis, Commented Apr 21, 2021 at 6:08
No problem, I've updated the main post and added more context; hopefully it's more understandable.
– HOOOPLA, Commented Apr 21, 2021 at 6:18
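The recursive idea discussed in these comments could look roughly like the sketch below. It is one possible reading of the format, not a confirmed solution: it assumes every node with __atts also has __t_idx, that dictionaries with their own __atts are nested records, and that all other items are values paired in order with the names at that index. The function name flatten is hypothetical. Two things it does differently from the code in the question: it walks over the attributes rather than over the names, so nested dictionaries beyond the first two positions are visited, and it never tests __t_idx for truthiness, since an index of 0 is falsy in Python.

```python
def flatten(node, att_names, out=None):
    # Collect name/value pairs from this node and all nested records
    # into a single flat dictionary.
    if out is None:
        out = {}
    names = iter(att_names[node["__t_idx"]])
    for item in node.get("__atts", []):
        if isinstance(item, dict) and "__atts" in item:
            # Nested record: recurse, using the nested node's own '__t_idx'.
            flatten(item, att_names, out)
        else:
            # Plain value: pair it with the next unused name, if any is left.
            name = next(names, None)
            if name is not None:
                out[name] = item
    return out


data_list = {
    "__att_names": [
        ["id", "name"],
        ["location", "address"],
        ["random_key1", "random_key2"],
        ["random_key3", "random_key4"],
    ],
    "__root": {
        "comparables": [
            {
                "__g_id": "153564396",
                "__atts": [
                    1,
                    "somerandomname",
                    {
                        "__atts": [
                            "location_value",
                            "address_value",
                            {"__atts": [], "__t_idx": 1},
                        ],
                        "__t_idx": 1,
                    },
                    {"__atts": ["random_key3value", "random_key4value"], "__t_idx": 3},
                    {"__atts": ["random_key1value", "random_key2value"], "__t_idx": 2},
                ],
                "__t_idx": 0,
            }
        ]
    },
}

# One flat dictionary per item in 'comparables'.
result = [
    flatten(comparable, data_list["__att_names"])
    for comparable in data_list["__root"]["comparables"]
]
print(result)
```

If two nodes in the same comparable share a __t_idx and both carry values, the later one overwrites the earlier one in this sketch; whether that is the desired behavior depends on the real data.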

Miguel · Accepted Answer · 2021-04-21 07:55:15Z

1

If you want to restructure a complex JSON object, my recommendation is to use jq.

The data you present is really confusing and obfuscated, so I'm not sure what exact filtering your case would require. But your problem involves indefinitely nested data, from what I understand. So instead of a recursive function, you could write a loop that unnests the data into the plain structure that you desire. There's already a question on that topic.
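A loop of that kind can be written in Python with an explicit stack instead of recursion. The following is only a sketch under assumptions about the format: dictionaries that carry __atts are treated as nested records, every such record has a __t_idx, and the remaining items are values paired in order with the names at that index. The function name flatten_iterative and the sample data are hypothetical.

```python
def flatten_iterative(root, att_names):
    out = {}
    stack = [root]  # nodes still waiting to be processed
    while stack:
        node = stack.pop()
        names = iter(att_names[node["__t_idx"]])
        nested = []
        for item in node.get("__atts", []):
            if isinstance(item, dict) and "__atts" in item:
                nested.append(item)  # defer nested records
            else:
                name = next(names, None)
                if name is not None:
                    out[name] = item
        # Push in reverse so the first nested record is popped first.
        stack.extend(reversed(nested))
    return out


att_names = [["id", "name"], ["location", "address"]]
comparables = [
    {
        "__atts": [1, "first", {"__atts": ["loc1", "addr1"], "__t_idx": 1}],
        "__t_idx": 0,
    },
    {
        "__atts": [2, "second", {"__atts": ["loc2", "addr2"], "__t_idx": 1}],
        "__t_idx": 0,
    },
]

# One flat dictionary per comparable.
rows = [flatten_iterative(c, att_names) for c in comparables]
print(rows)
```

The explicit stack avoids Python's recursion limit for very deep nesting, at the cost of being slightly less direct to read than a recursive function.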

answered Apr 21, 2021 at 7:55

Miguel


3 Comments

alexis Over a year ago

Thanks for the pointer to jq! I couldn't say if this tool is the best fit for the OP (I suspect that a bit of hands-on wrangling in python might benefit them more), but I am definitely adding it to my toolbox.

alexis Over a year ago

PS could you add a jq script that can handle the nested data in the question (as far as we can understand what the OP is after, of course)?

Miguel Over a year ago

I'm afraid I don't have enough experience with jq to write such a filter. I discovered it myself only a short time ago, and although I'm 100% sure that it can be a solution, I'm not able to write it right now. However, the structure would be the one I described: a loop or a recursive function that, while there is nested data, filters it and adds it to a separate array.

Ajax1234 · Accepted Answer · 2021-05-02 16:01:03Z

1

You can traverse the structure while tracking the __t_idx key values that correspond to list elements that are not dictionaries:

data_list = {'__att_names': [['id', 'name'], ['location', 'address'], ['random_key1', 'random_key2'], ['random_key3', 'random_key4']], '__root': {'comparables': [{'__g_id': '153564396', '__atts': [1, 'somerandomname', {'__atts': ['location_value', 'address_value', {'__atts': [], '__t_idx': 1}], '__t_idx': 1}, {'__atts': ['random_key3value', 'random_key4value'], '__t_idx': 3}, {'__atts': ['random_key1value', 'random_key2value'], '__t_idx': 2}], '__t_idx': 0}]}}
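def get_vals(d, f=False, t_idx=None):
    # Yield (value, t_idx) pairs for list elements that are not '__atts' dictionaries.
    if isinstance(d, dict) and '__atts' in d:
        # Visit every value of the dictionary, passing down its own '__t_idx'.
        yield from [i for a, b in d.items() for i in get_vals(b, t_idx=d.get('__t_idx'))]
    elif isinstance(d, list):
        # f=True marks each element as a member of a list.
        yield from [i for b in d for i in get_vals(b, f=True, t_idx=t_idx)]
    elif f and t_idx is not None:
        yield (d, t_idx)

result = []
for i in data_list['__root']['comparables']:
    new_d = {}
    for a, b in get_vals(i):
        # Rebuild the iterator for this '__t_idx' with the new value appended.
        new_d[b] = iter([*new_d.get(b, []), a])
    # Pair each name in '__att_names' with the next value collected for its index.
    result.append({j: next(new_d[i]) for i, a in enumerate(data_list['__att_names']) for j in a})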

print(result)

Output:

[
   {'id': 1, 
    'name': 'somerandomname', 
    'location': 'location_value', 
    'address': 'address_value', 
    'random_key1': 'random_key1value', 
    'random_key2': 'random_key2value', 
    'random_key3': 'random_key3value', 
    'random_key4': 'random_key4value'
    }
]

edited May 2, 2021 at 16:01

answered Apr 21, 2021 at 15:15

Ajax1234


8 Comments

HOOOPLA Over a year ago

I was able to get it working, and the dictionaries do seem to be mapped properly; however, this only gives me one dictionary back. I wanted to create a dictionary for EACH item within the ['comparables'] array.

HOOOPLA Over a year ago

for a, b in get_vals(data_list['__root']): I tried changing this to data_list['__root']['comparables'], with no luck.

HOOOPLA Over a year ago

What I am saying is that there could be a list of comparables; as of now it only seems to handle the first item in the comparables list.

HOOOPLA Over a year ago

After spending some time with the code snippet you sent me, I see one issue. The code properly assigns t_index to the list items that are not dictionaries, but some items that should get a t_index are themselves dictionaries; they just don't have an __atts key. I tried fixing it with the existing code you sent me, with no luck. The issue right now is that the names in __att_names don't exactly match the data retrieved by get_vals: the lengths are different, and that is because we skip those dictionaries with no __att_names key. @Ajax1234

Ajax1234 Over a year ago

@HOOOPLA To clarify, __t_idx can take a dictionary, as opposed to an int? Also, __att_names and __atts might not appear in a given dictionary?
