
I know there have been many questions on this topic, but still:
My input: as dataframe

 task                                            m_label
0  S101-10061  [Cecum Landmark, ICV, Comment, Appendiceal ori...
1  S101-10069  [Rectum RF, ICV, Cecum Landmark, TI, Comment, ...
2  S101-10078  [Appendiceal orifice, ICV, Cecum Landmark, Com...
3  S101-10088  [Cecum Landmark, ICV, Comment, Appendiceal ori...
4  S101-10100  [Transverse, Appendiceal orifice, ICV, Cecum L...
5  S101-10102  [Rectum RF, ICV, Cecum Landmark, Comment, TI, ...
6  S101-10133  [Rectum RF, Transverse, ICV, Cecum Landmark, C...
7  S101YGBgZ2                                          [Comment]

I want to split it like df.m_label.str.split("", expand=True), but it returns NaN. Maybe the problem is with df? I get it from a pandas Series via m_lab_task=data.groupby(['task'])['m_label'].unique(). So maybe it can be corrected in a previous step?
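The NaN result can be reproduced with a minimal sketch (column names taken from the question, data invented for illustration): pandas `.str` methods return NaN for any element that is not a string, and `groupby(...).unique()` produces arrays of labels, not strings.

```python
import pandas as pd

# a minimal frame whose m_label column holds lists/arrays,
# as groupby(...)['m_label'].unique() would produce
df = pd.DataFrame({
    'task': ['S101-10061', 'S101YGBgZ2'],
    'm_label': [['Cecum Landmark', 'ICV'], ['Comment']],
})

# .str methods silently yield NaN for non-string elements
split = df['m_label'].str.split(',', expand=True)
print(split)  # every cell is NaN
```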

Required output:

      task        m_label1             m_label2  m_label3        m_label4             m_label5    m_label6
0   S101-10061    Cecum Landmark       ICV       Comment         Appendiceal orifice
1   S101-10069    Rectum RF            ICV       Cecum Landmark  TI                   Comment     Transverse
2   S101-10078    Appendiceal orifice  ICV       Cecum Landmark  Comment              Transverse  Rectum RF
   
4 Comments

  • Why are some values enclosed in double quotes " and some not? Is m_label a string column, or does it hold Python lists? Commented Aug 12, 2021 at 7:02
  • Sorry for the confusion. Commented Aug 12, 2021 at 7:11
  • Do you want the values inside single quotes '' to be in a separate column? Commented Aug 12, 2021 at 7:13
  • Yes, I want to split the object in column m_label, which was created by pandas.groupby(), into separate "labels". Commented Aug 12, 2021 at 7:14

4 Answers


Use str.findall and pass a regex that captures everything enclosed in single quotes '', then apply pd.Series to convert the matches to columns:

# capture every label between single quotes; one column per match
df = df.set_index('task')['m_label'].str.findall(r"'(.*?)'").apply(pd.Series)
df.columns = [f'm_label{i + 1}' for i in df.columns]  # 1-based column names

OUTPUT:

                       m_label1             m_label2        m_label3               m_label4    m_label5    m_label6             m_label7  
task                                                                                                                                       
S101-10061       Cecum Landmark                  ICV         Comment    Appendiceal orifice         NaN         NaN                  NaN   
S101-10069            Rectum RF                  ICV  Cecum Landmark                     TI     Comment  Transverse                  NaN   
S101-10078  Appendiceal orifice                  ICV  Cecum Landmark                Comment  Transverse   Rectum RF                  NaN   
S101-10088       Cecum Landmark                  ICV         Comment    Appendiceal orifice         NaN         NaN                  NaN   
S101-10100           Transverse  Appendiceal orifice             ICV         Cecum Landmark     Comment         NaN                  NaN   
S101-10102            Rectum RF                  ICV  Cecum Landmark                Comment          TI  Transverse  Appendiceal orifice   
S101-10133            Rectum RF           Transverse             ICV         Cecum Landmark     Comment         NaN                  NaN   
S101YGBgZ2              Comment                  NaN             NaN                    NaN         NaN         NaN                  NaN   
                

If needed, you can reset the index later, and fillna('').
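Putting it all together — a minimal sketch, assuming m_label is a string column that looks like a printed list (the data below is invented for illustration):

```python
import pandas as pd

# m_label stored as strings containing single-quoted labels
df = pd.DataFrame({
    'task': ['S101-10061', 'S101YGBgZ2'],
    'm_label': ["['Cecum Landmark', 'ICV', 'Comment']", "['Comment']"],
})

out = (df.set_index('task')['m_label']
         .str.findall(r"'(.*?)'")   # capture text between single quotes
         .apply(pd.Series))         # one column per captured label
out.columns = [f'm_label{i + 1}' for i in range(out.shape[1])]
out = out.reset_index().fillna('')  # flat frame, blanks instead of NaN
print(out)
```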


6 Comments

I don't understand why it returns NaN for me. Maybe something in the preceding steps?
Now you have updated the question, and it looks like a Python list. Try df['m_label'].apply(type); if it's a list, then you don't need split/findall, just use df.set_index('task').apply(pd.Series)
task S101-10061 <class 'numpy.ndarray'> Name: m_label, dtype: object
You have a numpy array, so just skip the findall part and it should work fine, i.e. df.set_index('task').apply(pd.Series). findall is for string-type columns only, but you have a numpy array.
The command just returns me the same df, with 'task' as index and 'm_label' as a column containing everything. What's wrong? I followed it step by step. Is the problem in the previous steps? To get the pandas Series and convert it to a df I use these commands: df=data.groupby(['task'])['m_label'].unique() and df=pd.DataFrame(mltask); df=mltask.reset_index()
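The snag in this thread may be that df.set_index('task').apply(pd.Series) operates on whole columns and leaves the frame looking unchanged; selecting the m_label Series first is what expands each array into columns. A minimal sketch, assuming m_label holds numpy arrays (data invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'task': ['S101-10061', 'S101YGBgZ2'],
    'm_label': [np.array(['Cecum Landmark', 'ICV']), np.array(['Comment'])],
})

# select the Series first, THEN expand each array into columns
out = (df.set_index('task')['m_label']
         .apply(pd.Series)
         .add_prefix('m_label'))
print(out)
```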

When you convert the list into a dataframe, string data without separators will combine into a single value. To overcome this, insert the commas before converting into a dataframe, like this:

import pandas as pd
data={"task":["S101-10061","S101-10069","S101-10078","S101-10088","S101-10100","S101-10102","S101-10133","S101YGBgZ2"],
     "m_label":[['Cecum Landmark','ICV' ,'Comment' ,'Appendiceal orifice'],['Rectum RF','ICV','Cecum Landmark','TI','Comment','Transverse']
               ,['Appendiceal orifice' ,'ICV' ,'Cecum Landmark', 'Comment', 'Transverse','Rectum RF'],['Cecum Landmark', 'ICV', 'Comment', 'Appendiceal orifice'],
               ['Transverse' ,'Appendiceal orifice', 'ICV', 'Cecum Landmark', 'Comment'],['Rectum RF' ,'ICV' ,'Cecum Landmark', 'Comment' ,'TI' ,'Transverse','Appendiceal orifice'],
               ['Rectum RF', 'Transverse' ,'ICV' ,'Cecum Landmark', 'Comment'],['Comment']]}
data=pd.DataFrame(data)

The dataframe should look like this:

        task    m_label
0   S101-10061  [Cecum Landmark, ICV, Comment, Appendiceal ori...
1   S101-10069  [Rectum RF, ICV, Cecum Landmark, TI, Comment, ...
2   S101-10078  [Appendiceal orifice, ICV, Cecum Landmark, Com...
3   S101-10088  [Cecum Landmark, ICV, Comment, Appendiceal ori...
4   S101-10100  [Transverse, Appendiceal orifice, ICV, Cecum L...
5   S101-10102  [Rectum RF, ICV, Cecum Landmark, Comment, TI, ...
6   S101-10133  [Rectum RF, Transverse, ICV, Cecum Landmark, C...
7   S101YGBgZ2  [Comment]

Output code:

import numpy as np
data=pd.concat([data["task"],data["m_label"].apply(lambda x:pd.Series(x).add_prefix("m_label"))],axis=1).replace(np.nan," ")

   task        m_label0             m_label1             m_label2        m_label3             m_label4    m_label5    m_label6
0  S101-10061  Cecum Landmark       ICV                  Comment         Appendiceal orifice
1  S101-10069  Rectum RF            ICV                  Cecum Landmark  TI                   Comment     Transverse
2  S101-10078  Appendiceal orifice  ICV                  Cecum Landmark  Comment              Transverse  Rectum RF
3  S101-10088  Cecum Landmark       ICV                  Comment         Appendiceal orifice
4  S101-10100  Transverse           Appendiceal orifice  ICV             Cecum Landmark       Comment
5  S101-10102  Rectum RF            ICV                  Cecum Landmark  Comment              TI          Transverse  Appendiceal orifice
6  S101-10133  Rectum RF            Transverse           ICV             Cecum Landmark       Comment
7  S101YGBgZ2  Comment



Just to add something to ThePyGuy's answer: if you want to rename the columns "on the fly", you can use add_prefix().

df.set_index('task')['m_label'].str.findall(r"'(.*?)'").apply(pd.Series).add_prefix('m_label')

output:

Out[27]: 
                  m_label0 m_label1  ... m_label4    m_label5
task                                 ...                     
S101-10061  Cecum Landmark      ICV  ...      NaN         NaN
S101-10069       Rectum RF      ICV  ...  Comment  Transverse



Building on Mr. Brarath Narayan's code, you can shorten it as follows, without using numpy:

df = pd.concat([df['task'],df['m_label'].apply(pd.Series).add_prefix("m_label").fillna("")], axis = 1)

Output

   task        m_label0             m_label1             m_label2        m_label3             m_label4    m_label5    m_label6
0  S101-10061  Cecum Landmark       ICV                  Comment         Appendiceal orifice
1  S101-10069  Rectum RF            ICV                  Cecum Landmark  TI                   Comment     Transverse
2  S101-10078  Appendiceal orifice  ICV                  Cecum Landmark  Comment              Transverse  Rectum RF
3  S101-10088  Cecum Landmark       ICV                  Comment         Appendiceal orifice
4  S101-10100  Transverse           Appendiceal orifice  ICV             Cecum Landmark       Comment
5  S101-10102  Rectum RF            ICV                  Cecum Landmark  Comment              TI          Transverse  Appendiceal orifice
6  S101-10133  Rectum RF            Transverse           ICV             Cecum Landmark       Comment
7  S101YGBgZ2  Comment
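As a side note (not from the answer above): apply(pd.Series) is known to be slow on large frames. A commonly faster equivalent, assuming m_label holds plain lists (the data below is invented for illustration), builds the wide part directly from a list of lists:

```python
import pandas as pd

df = pd.DataFrame({
    'task': ['S101-10061', 'S101YGBgZ2'],
    'm_label': [['Cecum Landmark', 'ICV'], ['Comment']],
})

# pd.DataFrame on a list of lists pads short rows with NaN automatically
wide = pd.DataFrame(df['m_label'].tolist()).add_prefix('m_label').fillna('')
out = pd.concat([df['task'], wide], axis=1)
print(out)
```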

