I have a pandas DataFrame like the one shown below:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Adm DateTime': ['02/25/2012', '03/05/1996', '11/12/2010', '31/05/2012', '21/07/2019', '31/10/2020'],
                   's_id': [1, 2, 3, 4, 5, 6],
                   'test_string_1': ['test', 'Thalaivar', 'Superstar', 'God', 'Favorite', 'Rajinikanth'],
                   'test_string_2': ['Rajinikanth', 'God of Cinema', 'Favorite', 'Superstar', 'Rahman', 'ARR']})
df['Adm DateTime'] = pd.to_datetime(df['Adm DateTime'])

I would like to check whether any of several substrings is present in either of two columns (test_string_1 and test_string_2).

I am able to do this for a single column, as shown below:

df['op_flag'] = np.where(df['test_string_1'].str.contains('Rajini|God|Thalaivar',case=False),1, 0)

How can I do this across both columns?

Should I repeat the above code with a different column name?

Is there any way to provide the column names that I would like to check for in the code?

1 Answer

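You can select both columns, apply str.contains to each of them with a lambda function, and then use any(axis=1) to flag rows where at least one column matches: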

In [40]: df[['test_string_1', 'test_string_2']].apply(lambda x: x.str.contains('Rajini|God|Thalaivar',case=False)).any(axis=1).astype(int)
Out[40]:
0    1
1    1
2    0
3    1
4    0
5    1
dtype: int64
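
To address the last part of the question, the column names can be kept in a list and the result assigned back as op_flag. The following is a minimal sketch under a few assumptions: cols_to_check and pattern are illustrative names that do not appear in the question, the date column is left out because it plays no role in the matching, and na=False is an addition so that missing values would count as non-matches instead of propagating NaN.

```python
import pandas as pd

# Same string data as in the question (date column omitted for brevity).
df = pd.DataFrame({
    's_id': [1, 2, 3, 4, 5, 6],
    'test_string_1': ['test', 'Thalaivar', 'Superstar', 'God', 'Favorite', 'Rajinikanth'],
    'test_string_2': ['Rajinikanth', 'God of Cinema', 'Favorite', 'Superstar', 'Rahman', 'ARR'],
})

# Illustrative names: the columns to search and the regex alternation to look for.
cols_to_check = ['test_string_1', 'test_string_2']
pattern = 'Rajini|God|Thalaivar'

# Boolean frame: one column per searched column, True where the pattern matches.
# na=False (an assumption) makes missing values count as "no match".
matches = df[cols_to_check].apply(
    lambda col: col.str.contains(pattern, case=False, na=False)
)

# A row is flagged if at least one of the searched columns matched.
df['op_flag'] = matches.any(axis=1).astype(int)

print(df[['s_id', 'op_flag']])
```

Adding or removing a column from the check then only requires editing cols_to_check, so the np.where line does not need to be repeated per column.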