1

I have a data frame whose header looks like following, 

df.head()
Out[660]:
Samples variable    value   Type
0   PE01I   267N12.3_Beta   0.066517    Beta
1   PE01R   R267N12.3_Beta  0.061617    Beta
2   PE02I   267N12.3_Beta   0.071013    Beta
3   PE02R   267N12.3_Beta   0.056623    Beta
4   PE03I   267N12.3_Beta   0.071633    Beta
5   PE01I   267N12.3_FPKM   0.000000    FPKM
6   PE01R   267N12.3_FPKM   0.003430    FPKM
7   PE02I   267N12.3_FPKM   0.272144    FPKM
8   PE02R   267N12.3_FPKM   0.005753    FPKM
9   PE03I   267N12.3_FPKM   0.078708    FPKM

And I wanted to add new columns with header name as Beta and FPKM by using from column "Type"based on their corresponding values from column "value". So far I tried this via following one-liner,

df['Beta'] = df['Type'].map(lambda x: df.value if x == "Beta" else "FPKM")

and it give sme following output,

Samples variable    value   Type                      Beta
0   PE01I   267N12.3_Beta   0.066517    Beta        0 0.066517 1 0.061617 2 0.07...
1   PE01R   267N12.3_Beta   0.061617    Beta    0 0.066517 1 0.061617 2 0.07...
2   PE02I   267N12.3_Beta   0.071013    Beta    0 0.066517 1 0.061617 2 0.07...
3   PE02R   267N12.3_Beta   0.056623    Beta    0 0.066517 1 0.061617 2 0.07...
4   PE03I   267N12.3_Beta   0.071633    Beta    0 0.066517 1 0.061617 2 0.07...

The column Beta has three values and all column is repeating. What I am aiming is to have a data frame which looks like,

Samples variable    Beta    FPKM
PE01I   267N12.3_Beta   0.066517    0
PE01R   267N12.3_Beta   0.061617    0.00343
PE02I   267N12.3_Beta   0.071013    0.272144
PE02R   267N12.3_Beta   0.056623    0.005753
PE03I   267N12.3_Beta   0.071633    0.078708

Any help would be really great.. Thank you

1
  • use the where function Commented Aug 9, 2016 at 14:02

2 Answers 2

1

I think you need unstack:

df1 = df.set_index(['Samples','Type']).unstack()
print (df1)
               variable                    value          
Type               Beta           FPKM      Beta      FPKM
Samples                                                   
PE01I     267N12.3_Beta  267N12.3_FPKM  0.066517  0.000000
PE01R    R267N12.3_Beta  267N12.3_FPKM  0.061617  0.003430
PE02I     267N12.3_Beta  267N12.3_FPKM  0.071013  0.272144
PE02R     267N12.3_Beta  267N12.3_FPKM  0.056623  0.005753
PE03I     267N12.3_Beta  267N12.3_FPKM  0.071633  0.078708

#remove Multiindex in columns
df1.columns = ['_'.join(col) for col in df1.columns]
df1.reset_index(inplace=True)
print (df1)
  Samples   variable_Beta  variable_FPKM  value_Beta  value_FPKM
0   PE01I   267N12.3_Beta  267N12.3_FPKM    0.066517    0.000000
1   PE01R  R267N12.3_Beta  267N12.3_FPKM    0.061617    0.003430
2   PE02I   267N12.3_Beta  267N12.3_FPKM    0.071013    0.272144
3   PE02R   267N12.3_Beta  267N12.3_FPKM    0.056623    0.005753
4   PE03I   267N12.3_Beta  267N12.3_FPKM    0.071633    0.078708

#if need remove column
print (df1.drop('variable_FPKM', axis=1))
  Samples   variable_Beta  value_Beta  value_FPKM
0   PE01I   267N12.3_Beta    0.066517    0.000000
1   PE01R  R267N12.3_Beta    0.061617    0.003430
2   PE02I   267N12.3_Beta    0.071013    0.272144
3   PE02R   267N12.3_Beta    0.056623    0.005753
4   PE03I   267N12.3_Beta    0.071633    0.078708

EDIT by comment:

If get error:

ValueError: Index contains duplicate entries, cannot reshape

it means you have duplicates values in index and aggragating is necessary.

You need pivot_table and if aggfunc is np.sum or np.mean (working with numeric), string columns are omited and function ''.join works only with string values and numeric are omited.

Call function twice with different aggfunc and then use concat:

import pandas as pd

df = pd.DataFrame({'Type': {0: 'Beta', 1: 'Beta', 2: 'Beta', 3: 'Beta', 4: 'Beta', 5: 'FPKM', 6: 'FPKM', 7: 'FPKM', 8: 'FPKM', 9: 'FPKM'}, 'value': {0: 0.066516999999999993, 1: 0.061616999999999998, 2: 0.071012999999999993, 3: 0.056623, 4: 0.071633000000000002, 5: 0.0, 6: 0.0034299999999999999, 7: 0.272144, 8: 0.0057530000000000003, 9: 0.078708}, 'variable': {0: '267N12.3_Beta', 1: 'R267N12.3_Beta', 2: '267N12.3_Beta', 3: '267N12.3_Beta', 4: '267N12.3_Beta', 5: '267N12.3_FPKM', 6: '267N12.3_FPKM', 7: '267N12.3_FPKM', 8: '267N12.3_FPKM', 9: '267N12.3_FPKM'}, 'Samples': {0: 'PE01I', 1: 'PE01I', 2: 'PE02I', 3: 'PE02R', 4: 'PE03I', 5: 'PE01I', 6: 'PE01R', 7: 'PE02I', 8: 'PE02R', 9: 'PE03I'}})

#changed value in second row in column Samples
print (df)
  Samples  Type     value        variable
0   PE01I  Beta  0.066517   267N12.3_Beta
1   PE01I  Beta  0.061617  R267N12.3_Beta
2   PE02I  Beta  0.071013   267N12.3_Beta
3   PE02R  Beta  0.056623   267N12.3_Beta
4   PE03I  Beta  0.071633   267N12.3_Beta
5   PE01I  FPKM  0.000000   267N12.3_FPKM
6   PE01R  FPKM  0.003430   267N12.3_FPKM
7   PE02I  FPKM  0.272144   267N12.3_FPKM
8   PE02R  FPKM  0.005753   267N12.3_FPKM
9   PE03I  FPKM  0.078708   267N12.3_FPKM
df1 = df.pivot_table(index='Samples', columns=['Type'], aggfunc=','.join)
print (df1)
                             variable               
Type                             Beta           FPKM
Samples                                             
PE01I    267N12.3_Beta,R267N12.3_Beta  267N12.3_FPKM
PE01R                            None  267N12.3_FPKM
PE02I                   267N12.3_Beta  267N12.3_FPKM
PE02R                   267N12.3_Beta  267N12.3_FPKM
PE03I                   267N12.3_Beta  267N12.3_FPKM

df2 = df.pivot_table(index='Samples', columns=['Type'], aggfunc=np.mean)
print (df2)
            value          
Type         Beta      FPKM
Samples                    
PE01I    0.064067  0.000000
PE01R         NaN  0.003430
PE02I    0.071013  0.272144
PE02R    0.056623  0.005753
PE03I    0.071633  0.078708

df3 = pd.concat([df1, df2], axis=1)
df3.columns = ['_'.join(col) for col in df3.columns]
df3.reset_index(inplace=True)
print (df3)
  Samples                 variable_Beta  variable_FPKM  value_Beta  value_FPKM
0   PE01I  267N12.3_Beta,R267N12.3_Beta  267N12.3_FPKM    0.064067    0.000000
1   PE01R                          None  267N12.3_FPKM         NaN    0.003430
2   PE02I                 267N12.3_Beta  267N12.3_FPKM    0.071013    0.272144
3   PE02R                 267N12.3_Beta  267N12.3_FPKM    0.056623    0.005753
4   PE03I                 267N12.3_Beta  267N12.3_FPKM    0.071633    0.078708
Sign up to request clarification or add additional context in comments.

2 Comments

The column sample has duplicate values and its throwing error as, ValueError: Index contains duplicate entries, cannot reshape
I add solution for problem with duplicates in index, see edit.
1

You could use merge after separating them into 2 dataframes based on their Type column.

In [14]: df_1 = df.loc[(df['Type'] == "Beta"), ['Samples', 'variable', 'value']]

In [15]: df_2 = df.loc[(df['Type'] == "FPKM"), ['Samples', 'value']]

In [16]: df_1['Beta'] = df_1['value']

In [17]: df_2['FPKM'] = df_2['value']

In [18]: df_1[['Samples', 'variable', 'Beta']].merge(df_2[['Samples', 'FPKM']], on="Samples")
Out[18]: 
  Samples        variable      Beta      FPKM
0   PE01I   267N12.3_Beta  0.066517  0.000000
1   PE01R  R267N12.3_Beta  0.061617  0.003430
2   PE02I   267N12.3_Beta  0.071013  0.272144
3   PE02R   267N12.3_Beta  0.056623  0.005753
4   PE03I   267N12.3_Beta  0.071633  0.078708

1 Comment

Thank you actually this df I am showing is from a bigger dataframe , and I melt it to that structure ..

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.