I have a column in a pandas DataFrame that stores JSON-like strings. The format is shown below.

"{'kookooOutboundResponse': {'NewCall': {'event': 'NewCall', 'cid': '09528005139', 'called_number': '914071326527', 'sid': '7919156078536741', 'outbound_sid': '7919156078536741', 'circle': 'UTTAR PRADESH (W) and UTTARAKHAND', 'operator': 'Reliance', 'cid_type': '91', 'cid_e164': '+919528005139', 'request_time': '2019-06-17 20:59:43', 'cid_country': '91', '__proto__': {}}, 'GotDTMF': {'event': 'GotDTMF', 'sid': '7919156078536741', 'data': '1', 'cid': '09528005139', 'called_number': '914071326527', 'request_time': '2019-06-17 21:00:27', '__proto__': {}}, 'Hangup': {'event': 'Hangup', 'sid': '7919156078536741', 'process': 'none', 'total_call_duration': '47', 'cid': '09528005139', 'called_number': '914071326527', 'request_time': '2019-06-17 21:00:30', '__proto__': {}}}}"

I need to flatten the JSON so that every key becomes a column, with each value stored under its corresponding column name.

  • It's not valid JSON. Commented Aug 27, 2019 at 9:52
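To illustrate the point about validity, here is a minimal sketch. The shortened `raw` string is a made-up sample in the same single-quoted style, not the full payload. It shows that `json.loads` rejects such text, while `ast.literal_eval` parses it as a Python dict literal:

```python
import ast
import json

# Hypothetical, shortened sample in the same single-quoted style as the question
raw = "{'kookooOutboundResponse': {'Hangup': {'event': 'Hangup', 'total_call_duration': '47'}}}"

try:
    json.loads(raw)
    parsed_as_json = True
except json.JSONDecodeError:
    # JSON requires double-quoted strings, so this branch is taken
    parsed_as_json = False

# ast.literal_eval only evaluates Python literals (dicts, lists, strings, numbers, ...)
record = ast.literal_eval(raw)

print(parsed_as_json)
print(record['kookooOutboundResponse']['Hangup']['total_call_duration'])
```

Unlike `eval`, `ast.literal_eval` does not execute arbitrary expressions, which makes it the safer choice for strings read from a data column.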

1 Answer

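Because the strings are Python dict literals rather than valid JSON, parse each one with ast.literal_eval, flatten it with json_normalize inside a list comprehension, and combine the resulting frames with concat: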

data = "{'kookooOutboundResponse': {'NewCall': {'event': 'NewCall', 'cid': '09528005139', 'called_number': '914071326527', 'sid': '7919156078536741', 'outbound_sid': '7919156078536741', 'circle': 'UTTAR PRADESH (W) and UTTARAKHAND', 'operator': 'Reliance', 'cid_type': '91', 'cid_e164': '+919528005139', 'request_time': '2019-06-17 20:59:43', 'cid_country': '91', '__proto__': {}}, 'GotDTMF': {'event': 'GotDTMF', 'sid': '7919156078536741', 'data': '1', 'cid': '09528005139', 'called_number': '914071326527', 'request_time': '2019-06-17 21:00:27', '__proto__': {}}, 'Hangup': {'event': 'Hangup', 'sid': '7919156078536741', 'process': 'none', 'total_call_duration': '47', 'cid': '09528005139', 'called_number': '914071326527', 'request_time': '2019-06-17 21:00:30', '__proto__': {}}}}"

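import ast
import pandas as pd
# pandas >= 1.0 exposes json_normalize at the top level;
# on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize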

df = pd.DataFrame({'col':[data, data]})

L = [json_normalize(ast.literal_eval(x)['kookooOutboundResponse']) for x in df['col']]
df1 = pd.concat(L, ignore_index=True)
print (df1)
  NewCall.event  NewCall.cid NewCall.called_number       NewCall.sid  \
0       NewCall  09528005139          914071326527  7919156078536741   
1       NewCall  09528005139          914071326527  7919156078536741   

  NewCall.outbound_sid                     NewCall.circle NewCall.operator  \
0     7919156078536741  UTTAR PRADESH (W) and UTTARAKHAND         Reliance   
1     7919156078536741  UTTAR PRADESH (W) and UTTARAKHAND         Reliance   

  NewCall.cid_type NewCall.cid_e164 NewCall.request_time  ...  GotDTMF.cid  \
0               91    +919528005139  2019-06-17 20:59:43  ...  09528005139   
1               91    +919528005139  2019-06-17 20:59:43  ...  09528005139   

  GotDTMF.called_number GotDTMF.request_time Hangup.event        Hangup.sid  \
0          914071326527  2019-06-17 21:00:27       Hangup  7919156078536741   
1          914071326527  2019-06-17 21:00:27       Hangup  7919156078536741   

  Hangup.process Hangup.total_call_duration   Hangup.cid Hangup.called_number  \
0           none                         47  09528005139         914071326527   
1           none                         47  09528005139         914071326527   

   Hangup.request_time  
0  2019-06-17 21:00:30  
1  2019-06-17 21:00:30  

[2 rows x 24 columns]
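
As a possible variation on the approach above (a sketch, not part of the original answer), `json_normalize` also accepts a list of dicts, so the per-row frames and the `concat` step can be replaced by a single call. The example assumes pandas 1.0 or later for `pd.json_normalize`, and the two shortened `rows` strings are made-up samples in the same format as the question:

```python
import ast
import pandas as pd

# Hypothetical, shortened rows in the same single-quoted format as the question
rows = [
    "{'kookooOutboundResponse': {'NewCall': {'event': 'NewCall', 'operator': 'Reliance'}, "
    "'Hangup': {'event': 'Hangup', 'total_call_duration': '47'}}}",
    "{'kookooOutboundResponse': {'NewCall': {'event': 'NewCall', 'operator': 'Reliance'}, "
    "'Hangup': {'event': 'Hangup', 'total_call_duration': '12'}}}",
]
df = pd.DataFrame({'col': rows})

# Parse every string once and keep only the inner dict
parsed = [ast.literal_eval(s)['kookooOutboundResponse'] for s in df['col']]

# A list of dicts yields one flattened row per dict
flat = pd.json_normalize(parsed)

# Align on the original index, then swap the raw column for the flattened ones
flat.index = df.index
result = df.drop(columns='col').join(flat)

print(result.columns.tolist())
```

Reusing the original index keeps the flattened rows aligned with any other columns in the source frame, which matters when that frame does not have a default 0..n-1 index.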