How to flatten nested JSON or lists inside a DataFrame?

Question

I have a set of responses from Elasticsearch, produced by an aggregation query. The response looks like this:

'aggregations': {'group': {'doc_count_error_upper_bound': 0,
   'sum_other_doc_count': 0,
   'buckets': [{'key': 1365,
     'doc_count': 518,
     'group_docs': {'hits': {'total': {'value': 518, 'relation': 'eq'},
       'max_score': None,
       'hits': [{'_index': 'mdata',
         '_type': 'ter',
         '_id': 'n1X04XYBlaUrIoJskq9q',
         '_score': None,
         '_source': {'hId': 1365,
          'Id': 5348,
          'type': 'data'},
         'sort': [1610108665027]}]}}},
    {'key': 1372,
     'doc_count': 517,
     'group_docs': {'hits': {'total': {'value': 517, 'relation': 'eq'},
       'max_score': None,
       'hits': [{'_index': 'mdata',
         '_type': 'ter',
         '_id': 'qFUw4nYBlaUrIoJs6rdz',
         '_score': None,
         '_source': {'hId': 1372,
          'Id': 5348,
          'type': 'data'},
         'sort': [1610112617581]}]}}},
    {'key': 1392,
     'doc_count': 491,
     'group_docs': {'hits': {'total': {'value': 491, 'relation': 'eq'},
       'max_score': None,
       'hits': [{'_index': 'mdata',
         '_type': 'ter',
         '_id': '8VXR4XYBlaUrIoJsYKrS',
         '_score': None,
         '_source': {'hId': 1392,
          'Id': 5348,
          'type': 'data'},
         'sort': [1610106358393]}]}}}]},
  'bucketcount': {'count': 3,
   'min': 491.0,
   'max': 518.0,
   'avg': 508.6666666666667,
   'sum': 1526.0}}}

So I tried to build the DataFrame with:

df= pd.json_normalize(result['aggregations']['group']['buckets'])

key doc_count   group_docs.hits.total.value group_docs.hits.total.relation  group_docs.hits.max_score   group_docs.hits.hits
0   1365    518 518 eq  None    [{'_index': 'mdata', '_type': 'ter', '_...
1   1372    517 517 eq  None    [{'_index': 'mdata', '_type': 'ter', '_...
2   1392    491 491 eq  None    [{'_index': 'mdata', '_type': 'ter', '_...

I have tried the methods suggested in another post.

forreal = pd.DataFrame(result.get('group_docs.hits.hits')) did not work for me; it returned an empty DataFrame.

works_data = pd.json_normalize(df, record_path='group_docs.hits.hits') raised "TypeError: string indices must be integers".

A slow method I have tried is:

df= pd.json_normalize(result['aggregations']['group']['buckets'])
df_1 = (df.hits[0]['hits'])

and then appending the resulting DataFrames. However, this is slow for me because I have a lot of DataFrames to concat or append. Is there a better way to do this?

Rob Raymond · Accepted Answer · 2021-01-08 20:48:38Z


You don't specify what you are trying to achieve. The following will fully expand the sample JSON in your question:

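import pandas as pd

# Flatten the buckets, emit one row per hit, then normalize again to expand each hit dict
pd.json_normalize(
    pd.json_normalize(result['aggregations']['group']['buckets'])
    .explode("group_docs.hits.hits")
    .to_dict(orient="records")
).explode("group_docs.hits.hits.sort")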
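
For comparison, the record_path route attempted in the question can also work when it is applied to the raw bucket list rather than to the DataFrame, with the path given as a list of keys instead of a dotted string. The following is a minimal sketch, not part of the original answer. The `buckets` literal is a trimmed copy of the sample response, and the choice of `meta` columns is an assumption about what should be carried over from each bucket.

```python
import pandas as pd

# Trimmed copy of result['aggregations']['group']['buckets'] from the question
buckets = [
    {"key": 1365, "doc_count": 518,
     "group_docs": {"hits": {"hits": [
         {"_index": "mdata", "_id": "n1X04XYBlaUrIoJskq9q",
          "_source": {"hId": 1365, "Id": 5348, "type": "data"},
          "sort": [1610108665027]}]}}},
    {"key": 1372, "doc_count": 517,
     "group_docs": {"hits": {"hits": [
         {"_index": "mdata", "_id": "qFUw4nYBlaUrIoJs6rdz",
          "_source": {"hId": 1372, "Id": 5348, "type": "data"},
          "sort": [1610112617581]}]}}},
]

# record_path is a list of keys leading to the list of hits;
# meta copies bucket-level fields onto every hit row (assumed to be wanted).
hits = pd.json_normalize(
    buckets,
    record_path=["group_docs", "hits", "hits"],
    meta=["key", "doc_count"],
)
print(hits.columns.tolist())
```

The TypeError in the question most likely arises because iterating over a DataFrame yields its column names, so json_normalize ends up indexing plain strings. Note that the `sort` column still holds lists here; a further `.explode("sort")` would unpack it.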


3 Comments

Man Man Yu Over a year ago

Sorry, I would like to flatten the whole JSON into a DataFrame.

Man Man Yu Over a year ago

Thanks! You solved my problem. Can you explain how it works? I struggled with this for an hour and could not find it in pandas.pydata.org/pandas-docs/stable/reference/api/…

Rob Raymond Over a year ago

There are 3 nested lists: (1) aggregations.group.buckets, (2) aggregations.group.buckets.group_docs.hits.hits, and (3) aggregations.group.buckets.group_docs.hits.hits.sort. List 1 is dealt with by a straight json_normalize(). List 2 is dealt with by an explode() of the result of 1. For list 3, convert the result of 1 and 2 back to JSON-like records, call json_normalize() again to expand the dicts, then finally do another explode() on sort.
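
The three stages described in the comment above can be run one at a time to see what each contributes. This is a sketch on a trimmed two-bucket copy of the sample response; the intermediate names step1, step2, step3 and flat are illustrative only.

```python
import pandas as pd

# Trimmed copy of the Elasticsearch response from the question
result = {"aggregations": {"group": {"buckets": [
    {"key": 1365, "doc_count": 518,
     "group_docs": {"hits": {
         "total": {"value": 518, "relation": "eq"},
         "hits": [{"_id": "n1X04XYBlaUrIoJskq9q",
                   "_source": {"hId": 1365, "Id": 5348, "type": "data"},
                   "sort": [1610108665027]}]}}},
    {"key": 1372, "doc_count": 517,
     "group_docs": {"hits": {
         "total": {"value": 517, "relation": "eq"},
         "hits": [{"_id": "qFUw4nYBlaUrIoJs6rdz",
                   "_source": {"hId": 1372, "Id": 5348, "type": "data"},
                   "sort": [1610112617581]}]}}},
]}}}

# 1. Flatten the bucket dicts; the list of hits stays as a list in one column.
step1 = pd.json_normalize(result["aggregations"]["group"]["buckets"])

# 2. One row per hit; each cell of the exploded column is now a single dict.
step2 = step1.explode("group_docs.hits.hits")

# 3. Round-trip through records so json_normalize can expand those hit dicts
#    into columns such as group_docs.hits.hits._source.hId.
step3 = pd.json_normalize(step2.to_dict(orient="records"))

# 4. The innermost list (sort) is still a list, so explode it as well.
flat = step3.explode("group_docs.hits.hits.sort")
print(flat.columns.tolist())
```

Printing the columns after each step shows the dotted column names growing as each nesting level is unpacked.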
