I have a DataFrame in the format below:

id, data
101, [{"tree":[
               {"Group":"1001","sub-group":3,"Child":"100267","Child_1":"8 cm"},
               {"Group":"1002","sub-group":1,"Child":"102280","Child_1":"4 cm"},
               {"Group":"1003","sub-group":0,"Child":"102579","Child_1":"0.1 cm"}]}]
102, [{"tree":[
               {"Group":"2001","sub-group":3,"Child":"200267","Child_1":"6 cm"},
               {"Group":"2002","sub-group":1,"Child":"202280","Child_1":"4 cm"}]}]
103,  

I am trying to split the data in this one column into multiple columns.

Expected output:

id, Group, sub-group, Child, Child_1, Group, sub-group, Child, Child_1, Group, sub-group, Child, Child_1
101, 1001, 3, 100267, 8 cm, 1002, 1, 102280, 4 cm, 1003, 0, 102579, 0.1 cm
102, 2001, 3, 200267, 6 cm, 2002, 1, 202280, 4 cm
103

Output of df.loc[:15, ['id','data']].to_dict()

{'id': {1: '101',
        4: '102',
        11: '103',
        15: '104',
        16: '105'},
 'data': {1: '[{"tree":[{"Group":"","sub-group":"3","Child":"100267","Child_1":"8 cm"}]}]',
          4: '[{"tree":[{"sub-group":"0.01","Child_1":"4 cm"}]}]',
          11: '[{"tree":[{"sub-group":null,"Child_1":null}]}]',
          15: '[{"tree":[{"Group":"1003","sub-group":15,"Child":"child_","Child_1":"41 cm"}]}]',
          16: '[{"tree":[{"sub-group":"0.00","Child_1":"0"}]}]'}}

1 Answer

You can use explode on the data column, build a DataFrame from the result, and add a cumcount column that numbers the entries within each original row. Then reshape with set_index, stack, unstack and droplevel to fit your expected output, and join the result back to the id column.

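import pandas as pd

# .str['tree'] looks up the key 'tree' in each cell, so the cells must hold parsed dicts, not raw strings
s = df['data'].dropna().str['tree'].explode()
df_f = df[['id']].join(pd.DataFrame(s.tolist(), s.index)
                         .assign(cc=lambda x: x.groupby(level=0).cumcount()+1)  # number entries within each original row
                         .set_index('cc', append=True)
                         .stack()
                         .unstack(level=[-2,-1])  # one column per (cc, field) pair
                         .droplevel(0, axis=1),   # drop cc, leaving repeated field names
                       how='left')
print(df_f)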
    id Group sub-group   Child Child_1 Group sub-group   Child Child_1 Group  \
0  101  1001         3  100267    8 cm  1002         1  102280    4 cm  1003   
1  102  2001         3  200267    6 cm  2002         1  202280    4 cm   NaN   
2  103   NaN       NaN     NaN     NaN   NaN       NaN     NaN     NaN   NaN   

  sub-group   Child Child_1  
0         0  102579  0.1 cm  
1       NaN     NaN     NaN  
2       NaN     NaN     NaN  
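
To see what the chain is doing, it may help to look at the intermediate long table before the reshape. This is only a sketch on a small made-up frame (the sample rows and the name long are my own, and it assumes each cell is already a parsed dict with a 'tree' key):

```python
import pandas as pd

# Hypothetical sample: each cell is an already-parsed dict with a "tree" key.
df = pd.DataFrame({
    "id": ["101", "102", "103"],
    "data": [
        {"tree": [{"Group": "1001", "sub-group": 3},
                  {"Group": "1002", "sub-group": 1}]},
        {"tree": [{"Group": "2001", "sub-group": 3}]},
        None,
    ],
})

# One row per dict; the original row label is repeated in the index.
s = df["data"].dropna().str["tree"].explode()

# Expand the dicts into columns, then number them within each original row.
long = pd.DataFrame(s.tolist(), index=s.index)
long["cc"] = long.groupby(level=0).cumcount() + 1
print(long)
```

The cc column is what later becomes the outer column level after unstack, so each original row ends up with one block of fields per entry in its tree list.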

Note: while this does fit your expected output, repeating the same column name several times is not really good practice. I would rather remove the droplevel call and flatten the MultiIndex columns instead.
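
A minimal sketch of that flattening, assuming the same already-parsed dict cells as above. The sample frame, the name wide and the "field_number" naming scheme (Group_1, Group_2, ...) are just my choices for illustration, not something the question requires:

```python
import pandas as pd

# Hypothetical sample: each cell is an already-parsed dict with a "tree" key.
df = pd.DataFrame({
    "id": ["101", "102", "103"],
    "data": [
        {"tree": [{"Group": "1001", "sub-group": 3},
                  {"Group": "1002", "sub-group": 1}]},
        {"tree": [{"Group": "2001", "sub-group": 3}]},
        None,
    ],
})

s = df["data"].dropna().str["tree"].explode()
wide = (
    pd.DataFrame(s.tolist(), index=s.index)
    .assign(cc=lambda x: x.groupby(level=0).cumcount() + 1)
    .set_index("cc", append=True)
    .stack()
    .unstack(level=[-2, -1])  # columns are now (cc, field) pairs
)

# Flatten each (cc, field) pair into a single unique name such as "Group_1".
wide.columns = [f"{field}_{cc}" for cc, field in wide.columns]

df_f = df[["id"]].join(wide, how="left")
print(df_f)
```

With unique names you can select a column directly, for example df_f['Group_2'], which is ambiguous when several columns are all called Group.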

Edit: After some comments, I guess one way to actually go through the whole column, given its inconsistent string format, is:

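import ast
def f(x):
    try:
        # literal_eval cannot parse the JSON literal null, so swap it for the string 'nan' first
        return ast.literal_eval(x.replace('null', "'nan'"))[0]['tree']
    except Exception:
        # missing or unparsable cell: return one empty record so the row becomes all NaN
        return [{}]
# then create s with
s = df['data'].apply(f).explode()
# then create df_f like above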
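
If the strings are valid JSON, which the posted sample suggests but I cannot confirm for the whole column, json.loads may be a simpler alternative to ast.literal_eval because it understands null directly. A sketch under that assumption (parse_tree, parsed and the id 999 row are names and data I made up for the example):

```python
import json
import pandas as pd

def parse_tree(x):
    """Return the list stored under 'tree', or [{}] if the cell cannot be parsed."""
    try:
        return json.loads(x)[0]["tree"]
    except (TypeError, ValueError, IndexError, KeyError):
        # None/NaN cells raise TypeError, malformed JSON raises ValueError.
        return [{}]

# Two strings taken from the question's to_dict output, plus a hypothetical missing cell.
df = pd.DataFrame({
    "id": ["101", "103", "999"],
    "data": [
        '[{"tree":[{"Group":"","sub-group":"3","Child":"100267","Child_1":"8 cm"}]}]',
        '[{"tree":[{"sub-group":null,"Child_1":null}]}]',
        None,
    ],
})

s = df["data"].apply(parse_tree).explode()
parsed = pd.DataFrame(s.tolist(), index=s.index)
print(parsed)
```

From here s can be fed into the same cumcount and reshape steps as before. Note that this version keeps null as a missing value instead of turning it into the string 'nan'.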
9 Comments

nice one, I couldn't get the indices to align in mine
@Ben.T thanks for the reply. I however just get the id column returned back when I try the above code.. using pandas version 1.0.1
@Ben.T type(df['data'].iloc[0]) returns str
@Ben.T, after importing ast module, it gave me this message ValueError: malformed node or string: <_ast.Name object at 0x12208b8d0>. Have edited my post with the output of df.loc[:3, ['id','data']].to_dict()
@KevinNash see my edit to create s, without all the data and seeing all the exception possible in the format, it is the only thing I can think of.