Create new column If Else based on multiple column conditions

Question

I've looked into similar questions but, as far as I searched, I couldn't find anything that could help.

I have a daily report that I extract from a database, but one piece of information in it is exactly what needs to be delivered. Here's an example of what I extract:

col1           col2
wrongstring    correct
correctstring  correct
correctstring  correct
NaN            correct
NaN            NaN

The info in col2 has already been corrected using a dict and replace. The NaN is a missing value from the database, and I need to replace it with the correct string for missing values. Today this is done in Excel with a VLOOKUP and an IF, and I want to implement it inside the script to save some time.

What I want to do is:

If df['col1'] = wrongstring then new column would use df['col2'] value.

If df['col1'] is NaN then new column use df['col2'] value.

If both columns are NaN then the new column should use newstring.

Else keep df['col1'] value.

So far I've come up with this code, which raises an error (I understand it comes from the .isnull() part, but I couldn't find a way to fix it):

df['newcolumn'] = [x in df['col2'] if x=='wrongstring' else ('newstring' if ((df['col1'].isnull()) and (df['col2'].isnull())) else x in df['col1']) 
                           for x in df['col1']]

Could anyone help me out with this? Maybe the approach I used is not the correct one, or I'm missing something. The results should look like this:

col1           col2     newcolumn
wrongstring    correct  correct
correctstring  correct  correctstring  
correctstring  correct  correctstring  
NaN            correct  correct
NaN            NaN      newstring

Thanks for any help. Cheers.

Erfan · Accepted Answer · 2020-06-08 00:22:18Z

2

Method 1: `np.select`

For a column built from multiple conditions, we can use `np.select`:

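import numpy as np

# Boolean masks, one per condition
m1 = df['col1'].eq('wrongstring')              # col1 holds the wrong value
m2 = df['col1'].eq('correctstring')            # col1 is already correct
m3 = df['col1'].isna() & df['col2'].notna()    # col1 missing, col2 available

# The first matching condition wins; rows matching none get the default
df['newcolumn'] = np.select([m1, m2, m3],
                            [df['col2'], df['col1'], df['col2']],
                            default='newstring')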

            col1     col2      newcolumn
0    wrongstring  correct        correct
1  correctstring  correct  correctstring
2  correctstring  correct  correctstring
3            NaN  correct        correct
4            NaN      NaN      newstring
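The masks above compare col1 against the literal 'correctstring'. If the valid values in col1 vary (an assumption about the real data, which the thread does not show), a variation is to test only for the "bad" cases and let the default keep col1. A minimal, self-contained sketch on the sample data from the question:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "col1": ["wrongstring", "correctstring", "correctstring", np.nan, np.nan],
    "col2": ["correct", "correct", "correct", "correct", np.nan],
})

# Conditions are checked in order, so the most specific one goes first
both_missing = df["col1"].isna() & df["col2"].isna()
use_col2 = df["col1"].eq("wrongstring") | df["col1"].isna()

df["newcolumn"] = np.select(
    [both_missing, use_col2],
    ["newstring", df["col2"]],
    default=df["col1"],  # every other row keeps its col1 value
)

print(df)
```

Because `np.select` takes the first condition that is true for each row, `both_missing` has to come before `use_col2`; otherwise a row with two NaN values would pick up the NaN from col2.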

Method 2: `Series.mask` & `Series.fillna`:

df['newcolumn'] = df['col1'].mask(
    df['col1'].eq('wrongstring')
).fillna(df['col2']).fillna('newstring')

            col1     col2      newcolumn
0    wrongstring  correct        correct
1  correctstring  correct  correctstring
2  correctstring  correct  correctstring
3            NaN  correct        correct
4            NaN      NaN      newstring
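To see what each step of the chain contributes, it can be split into intermediate variables. This is only an unrolled sketch of the same one-liner on the question's sample data; the names `masked` and `filled` are illustrative:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "col1": ["wrongstring", "correctstring", "correctstring", np.nan, np.nan],
    "col2": ["correct", "correct", "correct", "correct", np.nan],
})

# Step 1: turn 'wrongstring' into NaN so it is treated like a missing value
masked = df["col1"].mask(df["col1"].eq("wrongstring"))

# Step 2: fill every NaN in col1 with the value from col2 on the same row
filled = masked.fillna(df["col2"])

# Step 3: whatever is still NaN was missing in both columns
df["newcolumn"] = filled.fillna("newstring")

print(df)
```

`fillna` with a Series aligns on the index, so step 2 relies on `masked` and `df["col2"]` sharing the same index, which holds here because both come from the same DataFrame.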

edited Jun 8, 2020 at 0:22

answered Jun 8, 2020 at 0:15

Erfan


2 Comments

Gustavo Rottgering Over a year ago

Thanks @Erfan. I think I tend to overthink a solution and never get the result.

Gustavo Rottgering Over a year ago

Testing on the previous file, it worked just fine. However, after adding it to the script, I got errors. Method 1: m3 and default do nothing, and cells in the new column stay blank. Method 2: it works properly, but after a few thousand rows it simply starts to skip rows and does not fill them all. I've tried to solve it but keep getting the errors. Maybe you can provide some hints on what may be causing it. Thanks @Erfan

BENY · Accepted Answer · 2020-06-08 00:21:23Z

2

We can do a conditional replace:

df['newcolumns']=df.col1.replace({'wrongstring':np.nan}).fillna(df.col2).fillna('newstring')

df
            col1     col2     newcolumns
0    wrongstring  correct        correct
1  correctstring  correct  correctstring
2  correctstring  correct  correctstring
3            NaN  correct        correct
4            NaN      NaN      newstring
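The same pattern extends to several unwanted values by building the replacement dict from a list. A small sketch, assuming a hypothetical second value 'anotherwrongstring' that does not appear in the question:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "col1": ["wrongstring", "correctstring", "correctstring", np.nan, np.nan],
    "col2": ["correct", "correct", "correct", "correct", np.nan],
})

# Hypothetical list: only 'wrongstring' comes from the question
wrong_values = ["wrongstring", "anotherwrongstring"]

# Map each unwanted value to NaN, then fall back to col2, then to a constant
to_nan = {value: np.nan for value in wrong_values}
df["newcolumns"] = (
    df["col1"]
    .replace(to_nan)
    .fillna(df["col2"])
    .fillna("newstring")
)

print(df)
```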

edited Jun 8, 2020 at 0:21

answered Jun 8, 2020 at 0:08

BENY


2 Comments

Gustavo Rottgering Over a year ago

Interesting @YOBEN_S. I didn't know that I could use fillna() twice like that. Thanks for sharing.

Gustavo Rottgering Over a year ago

I've tried to implement both your answer and @Erfan's in the script that gets the data, and it only worked partially. If the script saves the Excel file and I then read it again and apply the solution, it works correctly. However, when I use it inside the script, some rows are not 'masked' or replaced. The script only queries the database and creates a DataFrame. I then apply this answer and save the result to Excel, but some rows aren't affected. Do you have any idea why? Thanks
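The thread never identifies the cause, so the following is only a guess: the database may return empty or whitespace-padded strings instead of real missing values. Such cells are not NaN, so `isna` and `fillna` skip them, while an empty cell can come back as NaN after a round trip through Excel. A sketch of normalizing both columns first, using made-up sample data and a hypothetical helper `normalize_missing`:

```python
import pandas as pd

# Made-up data imitating what a database query might return (assumption)
df = pd.DataFrame({
    "col1": ["wrongstring ", "", "  ", None, "correctstring"],
    "col2": ["correct", "correct", "", None, "correct"],
})

def normalize_missing(series):
    """Strip surrounding whitespace and turn empty strings into NaN.

    Hypothetical helper; it assumes the column holds strings or missing values.
    """
    stripped = series.str.strip()
    return stripped.mask(stripped.eq(""))

col1 = normalize_missing(df["col1"])
col2 = normalize_missing(df["col2"])

# Same chain as in the answers, applied to the cleaned columns
df["newcolumn"] = (
    col1.mask(col1.eq("wrongstring"))
    .fillna(col2)
    .fillna("newstring")
)

print(df)
```

Checking `df["col1"].map(repr).value_counts()` on the real data before applying any fix would show whether blank or padded strings are actually present.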
