I have a DataFrame like this:

Cola    Colb
Mr      Mr..!
Mrs     Mrs.!.
Mr      Tests

I want to compare these two columns while ignoring the . and ! characters in Colb. I could generate a new column with the unwanted characters removed, but is there a better way to do this with a pandas function?

The expected result is True for all three rows.

This is my single line of code for a direct comparison:

temp_result_df[res_col_name] = \
((temp_result_df[primaryreportreqcolname] == temp_result_df[RequiredSecondaryReport_Col_Name])\
& (temp_result_df[RequiredSecondaryReport_Col_Name]!= 'Tests'))

I am new to Python, so I am exploring the different functions and methods for comparing data that contains some noise.

  • What is the expected output? And compare? You mean check if they are equal? Commented Feb 14, 2019 at 17:06
  • @yatu the expected output is true for all the given values. Commented Feb 14, 2019 at 17:09
  • @Sid29 even with Mr and Tests? Commented Feb 14, 2019 at 17:10

1 Answer

IIUC,

df['res_col_name'] = df['Cola'].eq(df['Colb'].replace(r'\W+', '', regex=True)) | df['Colb'].eq('Tests')


    Cola    Colb    res_col_name
0   Mr      Mr..!   True
1   Mrs     Mrs.!.  True
2   Mr      Tests   True
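
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and, as an assumption about what the asker needs, strips only the . and ! characters with the character class [.!]+ instead of the broader \W+. The variable name cleaned is illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Cola": ["Mr", "Mrs", "Mr"],
    "Colb": ["Mr..!", "Mrs.!.", "Tests"],
})

# Remove only '.' and '!' from Colb; this is narrower than \W+,
# which would also strip spaces, hyphens and other punctuation.
cleaned = df["Colb"].str.replace(r"[.!]+", "", regex=True)

# True where the cleaned Colb equals Cola, or where Colb is the literal 'Tests'.
df["res_col_name"] = df["Cola"].eq(cleaned) | df["Colb"].eq("Tests")

print(df)
```

No helper column is added to the frame: cleaned is a temporary Series used only for the comparison.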

4 Comments

what does \W+ actually do here ?
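\W matches any non-word character (anything other than a letter, digit, or underscore), and + matches one or more in a row, so the call removes every run of such characters.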
Thank you. One last question: if I want to replace the two letters "A" and "E" with "", would I do something like .replace("A", "E", "", regex=True)? How can I pass two values here? I used a full stop and an exclamation mark in the question to learn more about this :)
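No, for that use df.Colb.str.replace('A|E', '', regex=True). In pandas 2.0 and later, str.replace treats the pattern as a literal string unless regex=True is passed.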
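
As a rough illustration of the last comment, the sketch below removes "A" and "E" in two equivalent ways: with alternation (A|E) and with a character class ([AE]). The sample Series s is made up for the example and is not from the question.

```python
import pandas as pd

# Hypothetical sample values, chosen only to show the effect.
s = pd.Series(["APPLE", "EAGLE", "Mr"])

# Alternation: match either 'A' or 'E'.
via_alternation = s.str.replace("A|E", "", regex=True)

# Character class: match any single character in the set {A, E}.
via_char_class = s.str.replace("[AE]", "", regex=True)

print(via_alternation.tolist())
print(via_char_class.tolist())
```

For single characters the character class is usually easier to extend; alternation is the form to use when the alternatives are longer than one character.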
