I've looked into similar questions but, as far as I searched, I couldn't find anything that could help.
I have a daily report that I extract from a database, but one piece of information in it is exactly what needs to be delivered. Here's an example of what I extract:
col1 col2
wrongstring correct
correctstring correct
correctstring correct
NaN correct
NaN NaN
The info in col2 has already been corrected using a dict and replace. The NaN values are missing values from the database, and I need to replace them with the correct string for missing values. Today this is done in Excel with a VLOOKUP and an IF, and I want to implement it inside the script so we can save some time.
What I want to do is:
If df['col1'] == 'wrongstring', then the new column should use the df['col2'] value.
If df['col1'] is NaN, then the new column should use the df['col2'] value.
If both columns are NaN, then the new column should use 'newstring'.
Else keep the df['col1'] value.
So far I've come up with this code, which raises an error (I understand it comes from the .isnull() part, but I couldn't find a way to fix it):
df['newcolumn'] = [x in df['col2'] if x=='wrongstring' else ('newstring' if ((df['col1'].isnull()) and (df['col2'].isnull())) else x in df['col1'])
for x in df['col1']]
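For reference, two things go wrong in the comprehension above. `df['col1'].isnull() and df['col2'].isnull()` asks Python for the truth value of a whole Series, which pandas refuses with a ValueError, and `x in df['col2']` tests membership in the index and yields a bool rather than the col2 value for that row. A minimal row-wise sketch of the same idea, pairing the two columns with `zip`, might look like the following. The sample frame is rebuilt from the example above and the helper name `pick` is hypothetical. It also assumes that a 'wrongstring' row whose col2 is NaN should simply take the NaN, since the rules above do not cover that case.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the example in the question.
df = pd.DataFrame({
    "col1": ["wrongstring", "correctstring", "correctstring", np.nan, np.nan],
    "col2": ["correct", "correct", "correct", "correct", np.nan],
})

def pick(c1, c2):
    """Hypothetical helper: apply the four rules to one row's values."""
    # Both values missing: use the placeholder string.
    if pd.isna(c1) and pd.isna(c2):
        return "newstring"
    # col1 missing or known to be wrong: fall back to col2.
    if pd.isna(c1) or c1 == "wrongstring":
        return c2
    # Otherwise col1 is already fine.
    return c1

# zip pairs the values row by row, so each test works on scalars, not Series.
df["newcolumn"] = [pick(c1, c2) for c1, c2 in zip(df["col1"], df["col2"])]
print(df)
```
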
If anyone could help me out with this, I'd appreciate it; maybe the approach I used is not the correct one, or I'm missing something. The result should look like this:
col1 col2 newcolumn
wrongstring correct correct
correctstring correct correctstring
correctstring correct correctstring
NaN correct correct
NaN NaN newstring
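One vectorized way to get a table like the one above, as a sketch rather than a definitive answer, is to build a boolean mask and use `Series.mask` followed by `fillna`. The sample frame is rebuilt from the example, and the variable name `use_col2` is just an illustration. Note one assumption: with this version, a 'wrongstring' row whose col2 is also NaN would end up as 'newstring', a case the rules do not spell out.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the example in the question.
df = pd.DataFrame({
    "col1": ["wrongstring", "correctstring", "correctstring", np.nan, np.nan],
    "col2": ["correct", "correct", "correct", "correct", np.nan],
})

# True where col1 cannot be trusted: it is missing or holds the known bad value.
use_col2 = df["col1"].isna() | df["col1"].eq("wrongstring")

# Take col2 on those rows, keep col1 elsewhere, then fill what is still missing.
df["newcolumn"] = df["col1"].mask(use_col2, df["col2"]).fillna("newstring")
print(df)
```
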
Thanks for any help. Cheers.