I've looked into similar questions but, as far as I searched, I couldn't find anything that helps.

I have a daily report that I extract from a database, but only one piece of information in it is what needs to be delivered. Here's an example of what I extract:

col1           col2
wrongstring    correct
correctstring  correct
correctstring  correct
NaN            correct
NaN            NaN

The info in col2 is already corrected using a dict and replace. The NaN is a missing value from the database, and I need to replace it with the correct string for missing values. Today this is done in Excel with a VLOOKUP and an IF, and I want to implement it inside the script to save some time.

What I want to do is:

If df['col1'] is wrongstring, the new column uses the df['col2'] value.

If df['col1'] is NaN, the new column uses the df['col2'] value.

If both columns are NaN, the new column uses newstring.

Otherwise, keep the df['col1'] value.

So far I've come up with this code, which raises an error (I understand it comes from the .isnull() part, but I couldn't find a way to fix it):

df['newcolumn'] = [x in df['col2'] if x=='wrongstring' else ('newstring' if ((df['col1'].isnull()) and (df['col2'].isnull())) else x in df['col1']) 
                           for x in df['col1']] 

I'd appreciate any help with this; maybe the approach I used is not the correct one, or I'm missing something. The result should look like this:

col1           col2     newcolumn
wrongstring    correct  correct
correctstring  correct  correctstring  
correctstring  correct  correctstring  
NaN            correct  correct
NaN            NaN      newstring

Thanks for any help. Cheers.

2 Answers

Method 1: np.select

For a column built from multiple conditions, we can use np.select:

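import numpy as np

m1 = df['col1'].eq('wrongstring')             # wrong value: take col2
m2 = df['col1'].eq('correctstring')           # correct value: keep col1
m3 = df['col1'].isna() & df['col2'].notna()   # col1 missing, col2 present: take col2

# rows matching none of the masks get the default (both columns NaN in the sample data)
df['newcolumn'] = np.select([m1, m2, m3],
                            [df['col2'], df['col1'], df['col2']],
                            default='newstring')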

            col1     col2      newcolumn
0    wrongstring  correct        correct
1  correctstring  correct  correctstring
2  correctstring  correct  correctstring
3            NaN  correct        correct
4            NaN      NaN      newstring
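
Note that m2 above only matches the literal 'correctstring'. If col1 can hold other valid values, the question's "else keep col1" rule could be sketched as below. This is a minimal, self-contained example, not a tested fix for the real data; the value 'otherstring' is a hypothetical extra value added only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, plus one hypothetical extra value ('otherstring')
df = pd.DataFrame({
    'col1': ['wrongstring', 'correctstring', 'otherstring', np.nan, np.nan],
    'col2': ['correct', 'correct', 'correct', 'correct', np.nan],
})

m1 = df['col1'].eq('wrongstring')   # known bad value: take col2
m2 = df['col1'].notna()             # any other non-null value: keep col1
m3 = df['col2'].notna()             # reached only when col1 is null: take col2

# np.select uses the first condition that is True for each row,
# so the order of the masks matters here
df['newcolumn'] = np.select([m1, m2, m3],
                            [df['col2'], df['col1'], df['col2']],
                            default='newstring')
print(df)
```

Because m1 comes first in the list, 'wrongstring' rows never reach the broader m2 mask.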

Method 2: Series.mask & Series.fillna:

df['newcolumn'] = df['col1'].mask(
    df['col1'].eq('wrongstring')
).fillna(df['col2']).fillna('newstring')

            col1     col2      newcolumn
0    wrongstring  correct        correct
1  correctstring  correct  correctstring
2  correctstring  correct  correctstring
3            NaN  correct        correct
4            NaN      NaN      newstring
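
To make the chain in Method 2 easier to follow, here is a self-contained sketch that rebuilds the sample frame from the question and splits the one-liner into its three steps. The names step1 and step2 are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'col1': ['wrongstring', 'correctstring', 'correctstring', np.nan, np.nan],
    'col2': ['correct', 'correct', 'correct', 'correct', np.nan],
})

# Step 1: mask() replaces values with NaN wherever the condition is True
step1 = df['col1'].mask(df['col1'].eq('wrongstring'))

# Step 2: fillna(Series) fills each NaN from col2 at the same index label
step2 = step1.fillna(df['col2'])

# Step 3: anything still NaN was missing in both columns
df['newcolumn'] = step2.fillna('newstring')
print(df)
```

Since fillna with a Series aligns on the index, both columns should come from the same frame.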

2 Comments

Thanks @Erfan. I think I tend to overthink a solution and never get the result.
Testing on the previous file, it worked just fine. However, after adding it to the script, I somehow got errors. Method 1: m3 and default do nothing; cells in the new column are left blank. Method 2: it works properly, but after a few thousand lines it simply starts to skip rows and does not fill them all. I've tried to solve it but keep getting the errors. Maybe you can provide some hints on what may be causing it. Thanks @Erfan

We can do a conditional replace:

df['newcolumns'] = df.col1.replace({'wrongstring': np.nan}).fillna(df.col2).fillna('newstring')

df
            col1     col2     newcolumns
0    wrongstring  correct        correct
1  correctstring  correct  correctstring
2  correctstring  correct  correctstring
3            NaN  correct        correct
4            NaN      NaN      newstring
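
If some rows are left untouched when this runs directly on a database result, one possible cause is that missing values arrive as empty or whitespace-only strings rather than NaN, which fillna() does not treat as missing. This is an assumption, not something confirmed for the data in the question. A minimal sketch under that assumption, using a made-up frame in place of the real query result:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking a raw query result:
# None, '' and whitespace-only strings all stand in for missing values
df = pd.DataFrame({
    'col1': ['wrongstring', 'correctstring', '', None, '   '],
    'col2': ['correct', 'correct', 'correct', '  ', None],
})

# Turn empty or whitespace-only strings into NaN so fillna() can see them
cleaned = df[['col1', 'col2']].replace(r'^\s*$', np.nan, regex=True)

# Same chain as above, applied to the cleaned columns
df['newcolumns'] = (cleaned['col1']
                    .replace({'wrongstring': np.nan})
                    .fillna(cleaned['col2'])
                    .fillna('newstring'))
print(df)
```

Checking df['col1'].isna().sum() before and after the cleaning step is a quick way to see whether this is what is happening.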

2 Comments

Interesting @YOBEN_S. I didn't know that I could use fillna() twice like that. Thanks for sharing.
I've tried to implement both your solution and @Erfan's in the script that gets the data, and it only worked partially. If the script saves the Excel file and I then read it back and apply the solution, it works correctly. However, when used inside the script, some rows are not 'masked' or replaced. The script only runs a query against the database and creates a DataFrame. Then I apply this answer and save the result to Excel, but some rows aren't affected. Do you have any idea why? Thanks
