Splitting column in Python Pandas dataframe

Question

How can I split a column in a pandas DataFrame into several columns, one for each distinct value in another column? I have the DataFrame below:

     ID FEATURE  PARAM  VALUE
0  A101      U1  ITEM1     10
1  A101      U1  ITEM2     11
2  A101      U2  ITEM1     12
3  A101      U2  ITEM2     13
4  A102      U1  ITEM1     14
5  A102      U1  ITEM2     15
6  A102      U2  ITEM1     16
7  A102      U2  ITEM2     17

I want to split it as below:

     ID FEATURE  ITEM1  ITEM2
0  A101      U1     10     11
1  A101      U2     12     13
2  A102      U1     14     15
3  A102      U2     16     17

I tried one of the answers and it works, but only partially:

Select_Data.groupby('PARAM')['VALUE'].apply(list).apply(pd.Series).T

PARAM   ITEM1   ITEM2
0   10  11
1   12  13
2   14  15
3   16  17

But I lost my ID & FEATURE columns and I want to keep them in the table. I will greatly appreciate any suggestions.

Hi Bharath, yes, your answer worked for me. Thank you very much. I really appreciate it.
– rverma, Commented Aug 8, 2017 at 1:31

Bharath M Shetty · Accepted Answer · 2017-08-07 10:25:52Z

You can also use pivot_table with ID and FEATURE as the index, then reset the index:

ndf = pd.pivot_table(df, columns='PARAM', values='VALUE', index=['ID', 'FEATURE']).reset_index()

If the same ID, FEATURE, PARAM combination occurs more than once, pivot_table aggregates the duplicate values. The mean is the default aggfunc, and you can also pass it explicitly:

ndf = pd.pivot_table(df, columns='PARAM', values='VALUE', index=['ID', 'FEATURE'], aggfunc='mean').reset_index()

Output:

PARAM    ID FEATURE  ITEM1  ITEM2
0      A101      U1     10     11
1      A101      U2     12     13
2      A102      U1     14     15
3      A102      U2     16     17
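
For readers who want to run this end to end, here is a minimal sketch of the pivot_table approach. The DataFrame is rebuilt by hand from the table in the question, and the final rename_axis call is an optional addition (not part of the answer above) that drops the leftover PARAM label from the column index.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["A101"] * 4 + ["A102"] * 4,
    "FEATURE": ["U1", "U1", "U2", "U2"] * 2,
    "PARAM": ["ITEM1", "ITEM2"] * 4,
    "VALUE": [10, 11, 12, 13, 14, 15, 16, 17],
})

# One row per (ID, FEATURE) pair, one column per distinct PARAM value.
ndf = (
    pd.pivot_table(df, index=["ID", "FEATURE"], columns="PARAM", values="VALUE")
      .reset_index()
      .rename_axis(None, axis=1)  # optional: remove the "PARAM" columns label
)
print(ndf)
```

Because pivot_table always aggregates, it does not raise an error when a key combination is repeated; it silently averages the repeated values.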

edited Aug 7, 2017 at 10:25

answered Aug 6, 2017 at 8:53

Bharath M Shetty

Zero · Accepted Answer · 2017-08-06 07:38:42Z

Using groupby, you can collect the values of each group into a list and then expand the lists into columns:

In [566]: df.groupby('c1')['c2'].apply(list).apply(pd.Series).T
Out[566]:
c1  A  B  C
0   1  2  3
1   4  5  6
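
The one-liner above can be broken into two steps, as in the sketch below. The sample data is hypothetical, chosen so that the result has the same shape as the output shown. Building the frame from a dict stands in for `.apply(pd.Series).T` and assumes every group has the same number of rows.

```python
import pandas as pd

# Hypothetical sample data: three groups (A, B, C) with two values each.
df = pd.DataFrame({"c1": list("ABCABC"), "c2": [1, 2, 3, 4, 5, 6]})

# Step 1: collect the c2 values of each c1 group into a list.
lists = df.groupby("c1")["c2"].apply(list)

# Step 2: turn each list into a column (assumes equal-length groups).
wide = pd.DataFrame(lists.to_dict())
print(wide)
```

Note that this pairs values purely by their position within each group, so any other key columns (such as ID and FEATURE in the question) are not carried into the result.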

answered Aug 6, 2017 at 7:38

Zero

2 Comments

Bharath M Shetty Over a year ago

Awesome and simple

rverma Over a year ago

Awesome. Thank you so much. I really appreciate it.

jezrael · Accepted Answer · 2017-08-07 10:25:55Z

You can use set_index and unstack:

df = (df.set_index(['ID','FEATURE','PARAM'])['VALUE']
        .unstack()
        .reset_index()
        .rename_axis(None, axis=1))
print(df)
     ID FEATURE  ITEM1  ITEM2
0  A101      U1     10     11
1  A101      U2     12     13
2  A102      U1     14     15
3  A102      U2     16     17
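
A self-contained version of the same chain, with the DataFrame rebuilt by hand from the question, might look like this. The variable name `wide` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["A101"] * 4 + ["A102"] * 4,
    "FEATURE": ["U1", "U1", "U2", "U2"] * 2,
    "PARAM": ["ITEM1", "ITEM2"] * 4,
    "VALUE": [10, 11, 12, 13, 14, 15, 16, 17],
})

wide = (
    df.set_index(["ID", "FEATURE", "PARAM"])["VALUE"]
      .unstack()                  # innermost index level (PARAM) becomes the columns
      .reset_index()              # ID and FEATURE return as ordinary columns
      .rename_axis(None, axis=1)  # remove the "PARAM" columns label
)
print(wide)
```

Unlike pivot_table, unstack does no aggregation, so it only works when each ID, FEATURE, PARAM combination appears at most once.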

But if you get:

ValueError: Index contains duplicate entries, cannot reshape

then the ID, FEATURE, PARAM triples contain duplicates. In that case use Bharath Shetty's solution, or groupby and aggregate with mean:

print(df)
     ID FEATURE  PARAM  VALUE
0  A101      U2  ITEM1     50  <- same A101,U2,ITEM1
1  A101      U1  ITEM2     11
2  A101      U2  ITEM1     12  <- same A101,U2,ITEM1
3  A101      U2  ITEM2     13
4  A102      U1  ITEM1     14
5  A102      U1  ITEM2     15
6  A102      U2  ITEM1     16
7  A102      U2  ITEM2     17

df = (df.groupby(['ID','FEATURE','PARAM'])['VALUE'].mean()
        .unstack().reset_index().rename_axis(None, axis=1))
print(df)
     ID FEATURE  ITEM1  ITEM2
0  A101      U1    NaN   11.0
1  A101      U2   31.0   13.0  <- (50+12)/2=31
2  A102      U1   14.0   15.0
3  A102      U2   16.0   17.0
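
The sketch below puts the duplicate case together, using the data from the example above. The `duplicated` check is an addition for illustration (it is not part of the answer) and simply lists the rows that share a key before anything is reshaped.

```python
import pandas as pd

# Data from the example above: rows 0 and 2 share the key (A101, U2, ITEM1).
df = pd.DataFrame({
    "ID": ["A101"] * 4 + ["A102"] * 4,
    "FEATURE": ["U2", "U1", "U2", "U2", "U1", "U1", "U2", "U2"],
    "PARAM": ["ITEM1", "ITEM2"] * 4,
    "VALUE": [50, 11, 12, 13, 14, 15, 16, 17],
})
keys = ["ID", "FEATURE", "PARAM"]

# Rows whose key triple occurs more than once (keep=False marks every copy).
dupes = df[df.duplicated(keys, keep=False)]
print(dupes)

# Plain unstack cannot decide which of the duplicate values to keep.
try:
    df.set_index(keys)["VALUE"].unstack()
except ValueError as exc:
    print("unstack failed:", exc)

# Aggregating first leaves exactly one value per key, so unstack succeeds.
wide = (
    df.groupby(keys)["VALUE"].mean()
      .unstack()
      .reset_index()
      .rename_axis(None, axis=1)
)
print(wide)
```

Whether the mean is the right aggregate depends on the data; `first`, `sum`, or `max` can be substituted in the same place if they suit the duplicates better.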
