I have some data from an API that I am trying to convert to a pandas DataFrame. I am struggling to extract the 'station_xyz__cr' id from the list inside a nested dict, where that list can be empty (as in the middle record):
output = {'data': [{'abc_serial_number__c': 'ABC2020-07571',
                    'id': 'V48000000000F79',
                    'modified_date__v': '2020-06-15T05:13:14.000Z',
                    'name__v': 'VVV-001039',
                    'station_xyz__cr': {'data': [{'id': 'V5J000000000B86'}],
                                        'responseDetails': {'limit': 250,
                                                            'offset': 0,
                                                            'size': 1,
                                                            'total': 1}}},
                   {'abc_serial_number__c': 'ABC2020-09952',
                    'id': 'V48000000001B94',
                    'modified_date__v': '2020-06-24T11:30:40.000Z',
                    'name__v': 'VVV-004040',
                    'station_xyz__cr': {'data': [],
                                        'responseDetails': {'limit': 250,
                                                            'offset': 0,
                                                            'size': 1,
                                                            'total': 1}}},
                   {'abc_serial_number__c': 'ABC2020-09196',
                    'id': 'V48000000001B95',
                    'modified_date__v': '2020-06-23T09:38:18.000Z',
                    'name__v': 'VVV-004041',
                    'station_xyz__cr': {'data': [{'id': 'V5J000000000Z10'}],
                                        'responseDetails': {'limit': 250,
                                                            'offset': 0,
                                                            'size': 1,
                                                            'total': 1}}}],
          'responseDetails': {'limit': 1000, 'offset': 0, 'size': 3, 'total': 3},
          'responseStatus': 'SUCCESS'}
I'm trying to get the nested id into a column of the DataFrame, something like this:
   station_xyz__cr.data.id
0          V5J000000000B86
1                     None
2          V5J000000000Z10
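If pandas is not needed for this step, the target column can be built straight from the raw response with a plain-Python comprehension. This is a minimal sketch assuming only the first id in each nested list matters; `first_station_id` is a hypothetical helper name, and `output` here is a trimmed copy of the data above (responseDetails and most fields dropped):

```python
# Trimmed copy of the API response above (only the fields used here).
output = {'data': [
    {'id': 'V48000000000F79',
     'station_xyz__cr': {'data': [{'id': 'V5J000000000B86'}]}},
    {'id': 'V48000000001B94',
     'station_xyz__cr': {'data': []}},
    {'id': 'V48000000001B95',
     'station_xyz__cr': {'data': [{'id': 'V5J000000000Z10'}]}},
]}


def first_station_id(record):
    """Hypothetical helper: first nested station id, or None if the list is empty."""
    items = record.get('station_xyz__cr', {}).get('data', [])
    return items[0]['id'] if items else None


station_ids = [first_station_id(rec) for rec in output['data']]
print(station_ids)  # ['V5J000000000B86', None, 'V5J000000000Z10']
```

The list has one entry per record in the same order as output['data'], so it could be assigned as a column, e.g. df['station_xyz__cr.data.id'] = station_ids, once df has been built from output['data'].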
I've tried converting to a DataFrame with json_normalize (dropping the columns I don't need):
import pandas as pd

df = pd.json_normalize(output['data'])
df = df.loc[:, ~df.columns.str.startswith('station_xyz__cr.responseDetails')]
print(df)
   abc_serial_number__c               id          modified_date__v     name__v  \
0         ABC2020-07571  V48000000000F79  2020-06-15T05:13:14.000Z  VVV-001039
1         ABC2020-09952  V48000000001B94  2020-06-24T11:30:40.000Z  VVV-004040
2         ABC2020-09196  V48000000001B95  2020-06-23T09:38:18.000Z  VVV-004041
          station_xyz__cr.data
0  [{'id': 'V5J000000000B86'}]
1                           []
2  [{'id': 'V5J000000000Z10'}]
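json_normalize can also walk into the nested list itself through record_path. Records whose list is empty produce no rows in that result, so a left merge on the parent 'id' brings them back with NaN. This is a sketch assuming the parent 'id' is unique per record and each list holds at most one entry (otherwise the merge repeats rows); `stations`, `base` and `result` are illustrative names, and `output` is trimmed to the fields used:

```python
import pandas as pd

# Trimmed copy of the API response (only the fields used here).
output = {'data': [
    {'id': 'V48000000000F79',
     'station_xyz__cr': {'data': [{'id': 'V5J000000000B86'}]}},
    {'id': 'V48000000001B94',
     'station_xyz__cr': {'data': []}},
    {'id': 'V48000000001B95',
     'station_xyz__cr': {'data': [{'id': 'V5J000000000Z10'}]}},
]}

# One row per nested station entry; the parent 'id' is carried along as meta.
# record_prefix renames the nested 'id' so it does not clash with the parent 'id'.
stations = pd.json_normalize(
    output['data'],
    record_path=['station_xyz__cr', 'data'],
    meta=['id'],
    record_prefix='station_xyz__cr.data.',
)

# The empty-list record is missing from `stations`; a left merge restores it as NaN.
base = pd.json_normalize(output['data'])[['id']]
result = base.merge(stations, on='id', how='left')
print(result)
```

In the question's setting, `base` would be the normalized df with the list column dropped, so the other fields are kept alongside the extracted id.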
but I'm struggling to convert the 'station_xyz__cr.data' column (a list of dicts per row) into a simple column of ids:
df2 = pd.DataFrame(df['station_xyz__cr.data'].tolist(), index=df.index)
df2 = df2.rename(columns={0: 'station_xyz__cr.data'})
df2
        station_xyz__cr.data
0  {'id': 'V5J000000000B86'}
1                       None
2  {'id': 'V5J000000000Z10'}
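Because each df2 cell is either a dict or None, guarding the lookup with isinstance lets the None rows pass through unchanged, with no fillna step. A minimal sketch on a rebuilt df2-like frame; the data literal and the `ids`/`result` names are illustrative:

```python
import pandas as pd

# Rebuilt df2-like frame: a dict where an id exists, None where the list was empty.
df2 = pd.DataFrame({'station_xyz__cr.data': [
    {'id': 'V5J000000000B86'}, None, {'id': 'V5J000000000Z10'}]})

# Only call .get('id') on real dicts; None (or NaN) becomes None.
ids = df2['station_xyz__cr.data'].map(
    lambda d: d.get('id') if isinstance(d, dict) else None
)
result = ids.to_frame('station_xyz__cr.data.id')
print(result)
```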
The None is causing me problems when I try to extract the id from those dicts. I tried replacing the None, but I could only replace it with 0:
df.fillna(0, inplace=True)
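Mapping over the original list column in df, rather than over df2 (whose cells are dicts, so x[0] raises a KeyError), avoids the problem: an empty list is falsy, so x[0] is only read when the list has an element:
df2['station_xyz__cr.data'] = df['station_xyz__cr.data'].map(lambda x: x[0]['id'] if x else None)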
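Another option is Series.explode, which turns each list element into its own row and an empty list into NaN, so the missing row needs no separate None handling. This sketch assumes each nested list holds at most one id (with more, explode repeats the index label); the rebuilt frame and the `exploded`/`ids` names are illustrative:

```python
import pandas as pd

# Rebuilt df-like frame: each cell is a list of dicts, possibly empty.
df = pd.DataFrame({'station_xyz__cr.data': [
    [{'id': 'V5J000000000B86'}], [], [{'id': 'V5J000000000Z10'}]]})

# explode(): one row per list element; an empty list becomes NaN.
exploded = df['station_xyz__cr.data'].explode()

# Pull 'id' from the dicts; the NaN row is not a dict and becomes None.
ids = exploded.map(lambda d: d['id'] if isinstance(d, dict) else None)
df['station_xyz__cr.data.id'] = ids
print(df[['station_xyz__cr.data.id']])
```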