I have a pandas dataframe as below:
import pandas as pd
import numpy as np
df = pd.DataFrame({'ORDER': ["A", "A", "A", "A", "B", "B"],
                   'A': [80, 23, np.nan, 60, 1, 22],
                   'B': [80, 55, 5, 76, 67, np.nan]})
df
  ORDER     A     B
0     A  80.0  80.0
1     A  23.0  55.0
2     A   NaN   5.0
3     A  60.0  76.0
4     B   1.0  67.0
5     B  22.0   NaN
I want to create a column "new" as follows:
If ORDER == 'A', then new = df['A']
If ORDER == 'B', then new = df['B']
This can be achieved with the code below:
df['new'] = np.where(df['ORDER'] == 'A', df['A'], np.nan)
df['new'] = np.where(df['ORDER'] == 'B', df['B'], df['new'])
The catch is that if ORDER does not contain the value "B", then column B will not be present in the dataframe, so the dataframe might look like the one below. If we run the above code on this dataframe, it raises a KeyError because column "B" is missing.
  ORDER     A
0     A  80.0
1     A  23.0
2     A   NaN
3     A  60.0
4     A   1.0
5     A  22.0
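One possible way to handle the missing column, offered as a minimal sketch rather than a definitive answer: reindex the source columns so that any absent one is filled with NaN before calling np.where. The helper name add_new_column and the variable names full, only_a, result_full and result_only_a are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd


def add_new_column(df):
    """Return a copy of df with 'new' taken from column A or B, based on ORDER.

    Hypothetical helper: the name and the reindex approach are assumptions,
    not something taken from the original code.
    """
    out = df.copy()
    # reindex returns a frame with exactly these columns; a column that is
    # missing from df (for example 'B') is created and filled with NaN.
    src = out.reindex(columns=['A', 'B'])
    out['new'] = np.where(out['ORDER'] == 'A', src['A'],
                          np.where(out['ORDER'] == 'B', src['B'], np.nan))
    return out


# Dataframe with both columns present
full = pd.DataFrame({'ORDER': ["A", "A", "A", "A", "B", "B"],
                     'A': [80, 23, np.nan, 60, 1, 22],
                     'B': [80, 55, 5, 76, 67, np.nan]})

# Dataframe where ORDER never takes the value "B", so column B is absent
only_a = pd.DataFrame({'ORDER': ["A"] * 6,
                       'A': [80, 23, np.nan, 60, 1, 22]})

result_full = add_new_column(full)
result_only_a = add_new_column(only_a)
print(result_full)
print(result_only_a)
```

Because the reindexed frame is only used as a lookup, column B is not added to the result when it was missing from the input; only "new" is added.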