0

I have set of repsonse from Elasticsearch using aggregation query , and the response is like

'aggregations': {'group': {'doc_count_error_upper_bound': 0,
   'sum_other_doc_count': 0,
   'buckets': [{'key': 1365,
     'doc_count': 518,
     'group_docs': {'hits': {'total': {'value': 518, 'relation': 'eq'},
       'max_score': None,
       'hits': [{'_index': 'mdata',
         '_type': 'ter',
         '_id': 'n1X04XYBlaUrIoJskq9q',
         '_score': None,
         '_source': {'hId': 1365,
          'Id': 5348,
          'type': 'data'},
         'sort': [1610108665027]}]}}},
    {'key': 1372,
     'doc_count': 517,
     'group_docs': {'hits': {'total': {'value': 517, 'relation': 'eq'},
       'max_score': None,
       'hits': [{'_index': 'mdata',
         '_type': 'ter',
         '_id': 'qFUw4nYBlaUrIoJs6rdz',
         '_score': None,
         '_source': {'hId': 1372,
          'Id': 5348,
          'type': 'data'},
         'sort': [1610112617581]}]}}},
    {'key': 1392,
     'doc_count': 491,
     'group_docs': {'hits': {'total': {'value': 491, 'relation': 'eq'},
       'max_score': None,
       'hits': [{'_index': 'mdata',
         '_type': 'ter',
         '_id': '8VXR4XYBlaUrIoJsYKrS',
         '_score': None,
         '_source': {'hId': 1392,
          'Id': 5348,
          'type': 'data'},
         'sort': [1610106358393]}]}}}]},
  'bucketcount': {'count': 3,
   'min': 491.0,
   'max': 518.0,
   'avg': 508.6666666666667,
   'sum': 1526.0}}}

so i try to get the dataframe using

df= pd.json_normalize(result['aggregations']['group']['buckets'])

key doc_count   group_docs.hits.total.value group_docs.hits.total.relation  group_docs.hits.max_score   group_docs.hits.hits
0   1365    518 518 eq  None    [{'_index': 'mdata', '_type': 'ter', '_...
1   1372    517 517 eq  None    [{'_index': 'mdata', '_type': 'ter', '_...
2   1392    491 491 eq  None    [{'_index': 'mdata', '_type': 'ter', '_...

i have apply method in here enter link description here

using forreal = pd.DataFrame(result.get('group_docs.hits.hits')) didnt work for me with empty return

and

works_data = pd.json_normalize(df,record_path ='group_docs.hits.hits') returning Error "TypeError: string indices must be integers"

a Slow method i have try out is using

df= pd.json_normalize(result['aggregations']['group']['buckets'])
df_1 = (df.hits[0]['hits'])

and then append the Dataframe , however it;s slow for me as i have lot of DF to concat or append, i would like if there;s much better method to do ?

1 Answer 1

1

You don't specify what you are trying to achieve. The following will fully expand the sample JSON in your question

pd.json_normalize(
pd.json_normalize(results['aggregations']['group']['buckets']).explode("group_docs.hits.hits")
    .to_dict(orient="records")
).explode("group_docs.hits.hits.sort")
Sign up to request clarification or add additional context in comments.

3 Comments

sorry, i would like to flattern the whole json into a daraframe
thanks ! you solve my problem ~ Can you explain how it did ? since ii struggle for 1 hour in that ~ and i cannot find that in here pandas.pydata.org/pandas-docs/stable/reference/api/…
there are 3 nested lists, 1. aggregations.group.buckets 2. aggregations.group.buckets.group_docs.hits.hits 3. aggregations.group.buckets.group_docs.hits.hits.sort. 1. is dealt with by straight json_normalize(), 2. with explode() of 1, 3. convert 1&2 back to JSON, json_normalize() again to expand dicts then finally another explode() of sort

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.