Good afternoon, I am trying to split the text in a column into a specific format. Here is my table:

UserId  Application
1       Grey Blue::Black Orange;White:Green
2       Yellow Purple::Orange Grey;Blue Pink::Red

I would like it to read the following:

UserId     Application          Role
    1       Grey Blue           Black Orange
    1       White               Green
    2       Yellow Purple       Orange Grey 
    2       Blue Pink           Red

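So far my code is:

import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Repeat each index label once per element of the list in that row
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten each list column into one long column
    df1 = pd.concat([pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    # Re-attach the remaining columns
    return df1.join(df.drop(columns=explode), how='left')

# Split on ';', '::' or ':' and keep every other piece (the applications)
df['Application'] = df.Roles.str.split(';|::|:').map(lambda x: x[0::2])

unnesting(df.drop(columns='Roles'), ['Application'])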

This produces the following output:

UserId     Application
    1       Grey Blue
    1       White
    2       Yellow Purple
    2       Blue Pink

I do not know how to extend this code to add the second column (Role), which holds the text after the :: separator.

1 Answer

Given this dataframe:

   UserId                                Application
0       1       Grey Blue::Black Orange;White::Green
1       2  Yellow Purple::Orange Grey;Blue Pink::Red

you could at least achieve the last two columns directly via

df.Application.str.split(';', expand=True).stack().str.split('::', expand=True).reset_index().drop(columns=['level_0', 'level_1'])

which results in

               0             1
0      Grey Blue  Black Orange
1          White         Green
2  Yellow Purple   Orange Grey
3      Blue Pink           Red

However, setting UserId as the index first also carries the proper UserId column through to the result:

result = df.set_index('UserId').Application.str.split(';', expand=True).stack().str.split('::', expand=True).reset_index().drop(columns=['level_1'])
result.columns = ['UserId', 'Application', 'Role']

   UserId    Application          Role
0       1      Grey Blue  Black Orange
1       1          White         Green
2       2  Yellow Purple   Orange Grey
3       2      Blue Pink           Red
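
As an alternative to the stack-based chain above, here is a minimal sketch using DataFrame.explode. It is not part of the original answer. It assumes UserId is a regular column, and it uses the sample data from the question, including the single ':' in 'White:Green', so the pair separator is matched with the pattern '::?' (one or two colons). The names pairs and parts are illustrative only.

```python
import pandas as pd

# Sample data copied from the question (note the single ':' in 'White:Green').
df = pd.DataFrame({
    "UserId": [1, 2],
    "Application": [
        "Grey Blue::Black Orange;White:Green",
        "Yellow Purple::Orange Grey;Blue Pink::Red",
    ],
})

# One row per 'Application::Role' pair: split on ';' into lists, then explode.
pairs = df.assign(Application=df["Application"].str.split(";")).explode("Application")

# Split each pair once on one or two colons. Patterns longer than one
# character are treated as regular expressions by str.split by default.
parts = pairs["Application"].str.split("::?", n=1, expand=True)

# Assemble the final frame with a fresh 0..n-1 index.
result = pd.DataFrame({
    "UserId": pairs["UserId"].to_numpy(),
    "Application": parts[0].str.strip().to_numpy(),
    "Role": parts[1].str.strip().to_numpy(),
})
print(result)
```

Because explode repeats the other columns for every list element, UserId stays aligned with each pair without having to be moved into the index first.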

2 Comments

Thank you for responding! I don't know why, but setting UserId as the index is not working. Any thoughts? The error is a KeyError: 'UserId' @spghttcd
OK, that means your dataframe has an additional index and UserId is a normal column - I thought UserId was your index. I'll edit...
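
The asker's actual dataframe is not shown in this thread, so the cause of the KeyError can only be guessed. In general, set_index('UserId') raises KeyError when no column has exactly that label, for example because UserId is already the index or because the label contains stray whitespace. The sketch below checks both cases; ensure_column is a hypothetical helper written for this illustration, not a pandas function.

```python
import pandas as pd

def ensure_column(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Return a frame in which `name` is a regular column (hypothetical helper)."""
    # Strip stray whitespace from string column labels.
    out = df.rename(columns=lambda c: c.strip() if isinstance(c, str) else c)
    if name in out.columns:
        return out
    # If the label names an index level instead, move it back into the columns.
    if name in out.index.names:
        return out.reset_index(level=name)
    raise KeyError(f"{name!r} not found; columns are {list(out.columns)}")

# Example: 'UserId' is the index here, so df.set_index('UserId') would raise KeyError.
df = pd.DataFrame(
    {"Application": ["Grey Blue::Black Orange;White::Green"]},
    index=pd.Index([1], name="UserId"),
)
fixed = ensure_column(df, "UserId")
print(fixed.columns.tolist())
```

Printing df.columns and df.index.names on the real data is usually the quickest way to see which of these cases applies.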
