I am trying to create a binary (yes/no) variable based on what is in a particular text string (in Python).
The data looks something like:
| Person ID | Test Result |
|---|---|
| 87 | No exercise induced ischaemia |
| 88 | Treadmill test induced increased BP |
| 89 | NORMAL test on treadmill |
and so on.
I need to pick out all the people who have "No exercise induced ischaemia". Can anybody shed some light on how to do this, given that the real data set has about 20 columns and about 14,000 rows to search?
Here's an example dataframe for convenience:
import pandas as pd
d = {'ID': [87, 88, 89, 90, 91, 92], 'TestResult': ["No exercise induced ischaemia", "NORMAL test on treadmill", "No exercise induced ischaemia", "treadmill induced ischaemia", "NORMAL test on treadmill", "No exercise induced ischaemia"]}
df = pd.DataFrame(data=d)
I've tried things like
df['NegTest'] = df[df.TestResult.str.contains('No exercise induced ischaemia',case=True)]
with no luck.
Thanks for any help!
`df[]` on the right hand side? Just assign the result of `.str.contains()`. That's a bool Series. `df['NegTest'] = df.TestResult.str.contains('No exercise induced ischaemia', case=True)` will give you a new column of bools.
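For reference, a minimal sketch of that suggestion applied to the example dataframe from the question. The `regex=False` and `na=False` arguments and the `NegTestLabel` column name are my own additions (assumptions, not part of the original question): the first treats the search text as a literal string, the second makes missing values count as `False` instead of propagating `NaN`.

```python
import numpy as np
import pandas as pd

# Example data copied from the question.
d = {'ID': [87, 88, 89, 90, 91, 92],
     'TestResult': ["No exercise induced ischaemia", "NORMAL test on treadmill",
                    "No exercise induced ischaemia", "treadmill induced ischaemia",
                    "NORMAL test on treadmill", "No exercise induced ischaemia"]}
df = pd.DataFrame(data=d)

# Boolean Series: True where the text contains the exact phrase.
# regex=False -> literal match; na=False -> missing text counts as False.
is_negative = df['TestResult'].str.contains(
    'No exercise induced ischaemia', case=True, regex=False, na=False)

# Assign the Series directly as the new column (no df[...] wrapper).
df['NegTest'] = is_negative

# Optional yes/no version; 'NegTestLabel' is a hypothetical column name.
df['NegTestLabel'] = np.where(is_negative, 'yes', 'no')

# To pick out only the matching people, use the Series as a row filter.
negatives = df[is_negative]
print(negatives[['ID', 'TestResult']])
```

The original attempt fails because `df[mask]` returns a filtered DataFrame (several columns, fewer rows), which cannot be assigned to a single column. Filtering and creating the flag column are two separate steps, as shown above.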