0

I am trying to create a binary (yes/no) variable based on what is in a particular text string (in Python).

The data looks something like:

Person ID Test Result
87 No exercise induced ischaemia
88 Treadmill test induced increased BP
89 NORMAL test on treadmill

and so on.

I need to pick out all the people who have "No exercise induced ischaemia". Can anybody shed some light on how to do this, given I have about 20 columns in the real data set and about 14000 rows that need to be searched.

Here's an example dataframe for convenience

d = {'ID': [87, 88, 89, 90, 91, 92], 'TestResult': ["No exercise induced ischaemia", "NORMAL test on treadmill",  "No exercise induced ischaemia", "treadmill induced ischaemia", "NORMAL test on treadmill", "No exercise induced ischaemia"]}
df = pd.DataFrame(data=d)

I've tried things like

df['NegTest'] = df[df.TestResult.str.contains('No exercise induced ischaemia',case=True)]

with no luck.

Thanks for any help!

2
  • Do you just need to drop the outer df[] on the right hand side? Just assign the result of .str.contains(). That's a bool Series. Commented Feb 2, 2022 at 0:31
  • 1
    Just df['NegTest'] = df.TestResult.str.contains('No exercise induced ischaemia',case=True) will give you a new column of bools. Commented Feb 2, 2022 at 0:32

1 Answer 1

1

You're very close. Just use np.where to actually generate the yes/no:

df['NegTest'] = np.where(df.TestResult.str.contains('No exercise induced ischaemia', case=True), 'yes', 'no')

Output:

>>> df
   ID                     TestResult NegTest
0  87  No exercise induced ischaemia     yes
1  88       NORMAL test on treadmill      no
2  89  No exercise induced ischaemia     yes
3  90    treadmill induced ischaemia      no
4  91       NORMAL test on treadmill      no
5  92  No exercise induced ischaemia     yes

If you want it to just be True/False, you can even skip np.where:

df['NegTest'] = df.TestResult.str.contains('No exercise induced ischaemia', case=True)

Output:

>>> df
   ID                     TestResult  NegTest
0  87  No exercise induced ischaemia     True
1  88       NORMAL test on treadmill    False
2  89  No exercise induced ischaemia     True
3  90    treadmill induced ischaemia    False
4  91       NORMAL test on treadmill    False
5  92  No exercise induced ischaemia     True
Sign up to request clarification or add additional context in comments.

1 Comment

Oh brilliant, I knew there had to be something missing!! Thanks :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.