1

I have a column "data" which has json object as values. I would like to split them up.

source = {'_id':['SE-DATA-BB3A','SE-DATA-BB3E','SE-DATA-BB3F'],  'pi':['BB3A_CAP_BMLS','BB3E_CAP_BMLS','BB3F_CAP_PAS'], 'Datetime':['190725-122500', '190725-122500', '190725-122500'], 'data': [ {'bb3a_bmls':[{'name': 'WAG 01', 'id': '105F', 'state': 'available', 'nodes': 3,'volumes-': [{'state': 'available', 'id': '330172', 'name': 'q_-4144d4e'}, {'state': 'available', 'id': '275192', 'name': 'p_3089d821ae', }]}]}
, {'bb3b_bmls':[{'name': 'FEC 01', 'id': '382E', 'state': 'available', 'nodes': 4,'volumes': [{'state': 'unavailable', 'id': '830172', 'name': 'w_-4144d4e'}, {'state': 'unavailable', 'id': '223192', 'name': 'g_3089d821ae', }]}]}
, {'bb3c_bmls':[{'name': 'ASD 01', 'id': '303F', 'state': 'available', 'nodes': 6,'volumes': [{'state': 'unavailable', 'id': '930172', 'name': 'e_-4144d4e'}, {'state': 'unavailable', 'id': '245192', 'name': 'h_3089d821ae', }]}]}
] }

input_df = pd.DataFrame(source)

My input_df is as below:

enter image description here

I'm expecting the output_df as below:

enter image description here

I could manage to get the columns volume_id volume_name volume_state name id state nodes using the below method.

input_df['data'] = input_df['data'].apply(pd.Series) 

which will result as below enter image description here

Test_df=pd.concat([json_normalize(input_df['bb3a_bmls'][key], 'volumes', ['name','id','state','nodes'], record_prefix='volume_') for key in input_df.index if isinstance(input_df['bb3a_bmls'][key],list)]).reset_index(drop=True)

Which will result for one "SERVER" - bb3a_bmls

enter image description here

Now, I don't have an idea how to get the parent columns "_id", "pi", "Datetime" back.

2

1 Answer 1

1

Idea is loop by each nested lists or by dicts and create list of dictionary for pass to DataFrame constructor:

out = []
zipped = zip(source['_id'], source['pi'], source['Datetime'], source['data'])

for a,b,c,d in zipped:
    for k1, v1 in d.items():
        for e in v1:
            #get all values of dict with exlude volumes
            di = {k2:v2 for k2, v2 in e.items() if k2 != 'volumes'}
            #for each dict in volumes add volume_ to keys
            for f in e['volumes']:
                di1 = {f'volume_{k3}':v3 for k3, v3 in f.items()}
                #create dict from previous values
                di2 = {'_id':a, 'pi':b,'Datetime':c, 'SERVER':k1}
                #add to list merged dictionaries
                out.append({**di2, ** di1, **di})

df = pd.DataFrame(out)
print (df)

            _id             pi       Datetime     SERVER volume_state  \
0  SE-DATA-BB3A  BB3A_CAP_BMLS  190725-122500  bb3a_bmls    available   
1  SE-DATA-BB3A  BB3A_CAP_BMLS  190725-122500  bb3a_bmls    available   
2  SE-DATA-BB3E  BB3E_CAP_BMLS  190725-122500  bb3b_bmls  unavailable   
3  SE-DATA-BB3E  BB3E_CAP_BMLS  190725-122500  bb3b_bmls  unavailable   
4  SE-DATA-BB3F   BB3F_CAP_PAS  190725-122500  bb3c_bmls  unavailable   
5  SE-DATA-BB3F   BB3F_CAP_PAS  190725-122500  bb3c_bmls  unavailable   

  volume_id   volume_name    name    id      state  nodes  
0    330172    q_-4144d4e  WAG 01  105F  available      3  
1    275192  p_3089d821ae  WAG 01  105F  available      3  
2    830172    w_-4144d4e  FEC 01  382E  available      4  
3    223192  g_3089d821ae  FEC 01  382E  available      4  
4    930172    e_-4144d4e  ASD 01  303F  available      6  
5    245192  h_3089d821ae  ASD 01  303F  available      6  
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks jezrael!! , By the way it looks like after processing 1st element of "volumes" , instead of processing 2nd element of "volumes", it is processing 1st element again thus generating duplicates.
@Tad - You are right, answer was a bit edited for correct output.
Excellent!!.. Thanks a lot jezrael!!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.