How can I split a column in a pandas DataFrame into separate columns, based on the names in another column? I have the DataFrame below:

    ID  FEATURE PARAM   VALUE
0   A101    U1  ITEM1   10
1   A101    U1  ITEM2   11
2   A101    U2  ITEM1   12
3   A101    U2  ITEM2   13
4   A102    U1  ITEM1   14
5   A102    U1  ITEM2   15
6   A102    U2  ITEM1   16
7   A102    U2  ITEM2   17

I want to reshape it as follows:

    ID  FEATURE ITEM1   ITEM2
0   A101    U1  10  11
1   A101    U2  12  13
2   A102    U1  14  15
3   A102    U2  16  17

I tried one of the answers, and it works, but only partially:

Select_Data.groupby('PARAM')['VALUE'].apply(list).apply(pd.Series).T

PARAM   ITEM1   ITEM2
0   10  11
1   12  13
2   14  15
3   16  17

But I lost my ID & FEATURE columns and I want to keep them in the table. I will greatly appreciate any suggestions.

  • Is my answer what you are looking for? Commented Aug 7, 2017 at 9:37
  • Hi Bharath, yes your answer worked for me. Thank you very much. I really appreciate it. Commented Aug 8, 2017 at 1:31

3 Answers

You can also use pivot_table with ID and FEATURE as the index, and then reset the index:

ndf =  pd.pivot_table(df,columns='PARAM', values='VALUE',index=['ID','FEATURE']).reset_index()

In case there are duplicate values to aggregate, you can pass aggfunc explicitly, e.g. the mean (which is also the default aggfunc of pivot_table):

ndf =  pd.pivot_table(df,columns='PARAM', values='VALUE',index=['ID','FEATURE'],aggfunc='mean').reset_index()

Output:

PARAM    ID FEATURE  ITEM1  ITEM2
0      A101      U1     10     11
1      A101      U2     12     13
2      A102      U1     14     15
3      A102      U2     16     17
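
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's DataFrame and applies the same pivot_table call. The variable names (df, ndf) follow the answer; the rename_axis step is an optional extra that drops the leftover PARAM label on the column axis. Depending on the pandas version, the aggregated values may print as floats (10.0) rather than integers.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "ID": ["A101"] * 4 + ["A102"] * 4,
    "FEATURE": ["U1", "U1", "U2", "U2"] * 2,
    "PARAM": ["ITEM1", "ITEM2"] * 4,
    "VALUE": [10, 11, 12, 13, 14, 15, 16, 17],
})

# One row per (ID, FEATURE) pair, one column per distinct PARAM value.
ndf = pd.pivot_table(
    df,
    index=["ID", "FEATURE"],
    columns="PARAM",
    values="VALUE",
    aggfunc="mean",
).reset_index()

# Optional: remove the "PARAM" name that pivot_table leaves on the columns.
ndf = ndf.rename_axis(None, axis=1)

print(ndf)
```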

Using groupby, you can collect each group's values into a list and then expand the lists into columns:

In [566]: df.groupby('c1')['c2'].apply(list).apply(pd.Series).T
Out[566]:
c1  A  B  C
0   1  2  3
1   4  5  6

2 Comments

Awesome and simple
Awesome. Thank you so much. I really appreciate it.

You can use set_index and unstack:

# Parentheses let the method chain span several lines.
df = (df.set_index(['ID','FEATURE','PARAM'])['VALUE']
        .unstack()
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
     ID FEATURE  ITEM1  ITEM2
0  A101      U1     10     11
1  A101      U2     12     13
2  A102      U1     14     15
3  A102      U2     16     17
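
To make the chain easier to follow, here is a sketch of the same idea split into separate steps on the question's data. The intermediate names (s, wide) are only for illustration and are not part of the answer above.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "ID": ["A101"] * 4 + ["A102"] * 4,
    "FEATURE": ["U1", "U1", "U2", "U2"] * 2,
    "PARAM": ["ITEM1", "ITEM2"] * 4,
    "VALUE": [10, 11, 12, 13, 14, 15, 16, 17],
})

# Step 1: a Series of VALUE indexed by the three key columns.
s = df.set_index(["ID", "FEATURE", "PARAM"])["VALUE"]

# Step 2: unstack() moves the innermost index level (PARAM) into the columns.
wide = s.unstack()

# Step 3: turn ID and FEATURE back into ordinary columns and drop the
# "PARAM" label left on the column axis.
wide = wide.reset_index().rename_axis(None, axis=1)

print(wide)
```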

But if you get:

ValueError: Index contains duplicate entries, cannot reshape

then some (ID, FEATURE, PARAM) triples are duplicated. In that case, use Bharath shetty's pivot_table solution, or groupby and aggregate with mean:

print (df)
     ID FEATURE  PARAM  VALUE
0  A101      U2  ITEM1     50  <- same A101,U2,ITEM1
1  A101      U1  ITEM2     11
2  A101      U2  ITEM1     12  <- same A101,U2,ITEM1
3  A101      U2  ITEM2     13
4  A102      U1  ITEM1     14
5  A102      U1  ITEM2     15
6  A102      U2  ITEM1     16
7  A102      U2  ITEM2     17


# Average duplicated (ID, FEATURE, PARAM) triples before reshaping.
df = (df.groupby(['ID','FEATURE','PARAM'])['VALUE'].mean()
        .unstack().reset_index().rename_axis(None, axis=1))
print (df)
     ID FEATURE  ITEM1  ITEM2
0  A101      U1    NaN   11.0
1  A101      U2   31.0   13.0  <- (50+12)/2=31
2  A102      U1   14.0   15.0
3  A102      U2   16.0   17.0
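
If you are not sure whether your data has such duplicates, you can check before reshaping. Below is a minimal sketch on the same sample data. The names keys, dupes, unstack_failed and wide are only for illustration, and mean is just one possible way to combine duplicates.

```python
import pandas as pd

keys = ["ID", "FEATURE", "PARAM"]

# Sample data with a duplicated (A101, U2, ITEM1) triple, as shown above.
df = pd.DataFrame({
    "ID": ["A101"] * 4 + ["A102"] * 4,
    "FEATURE": ["U2", "U1", "U2", "U2", "U1", "U1", "U2", "U2"],
    "PARAM": ["ITEM1", "ITEM2"] * 4,
    "VALUE": [50, 11, 12, 13, 14, 15, 16, 17],
})

# keep=False marks every row that belongs to a duplicated triple.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# A plain unstack cannot place two values in one cell, so it raises.
try:
    df.set_index(keys)["VALUE"].unstack()
    unstack_failed = False
except ValueError as err:
    unstack_failed = True
    print("unstack failed:", err)

# Averaging the duplicates first makes every triple unique again.
wide = (df.groupby(keys)["VALUE"].mean()
          .unstack()
          .reset_index()
          .rename_axis(None, axis=1))
print(wide)
```

Combinations that never occur in the data, such as (A101, U1, ITEM1) here, show up as NaN in the result.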
