2

I have a pandas DataFrame (DF) that looks like this:

    MED1    MED2    MED3    MED4    MED5
0   60735   24355   33843   16475   9995
1   10126   5789    17165   90000   90000
2   5789    19675   30553   90000   90000
3   60735   17865   34495   90000   90000
4   19675   5810    90000   90000   90000

I want to create a new bool column "med" that is True or False depending on whether 60735 appears in the columns MED1...MED5. I am trying this and am not sure how to make it work.

DF['med'] = (60735 in [DF['MED1'], DF['MED2']])

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

MED1..MED5 represent drugs being taken by a patient at a hospital visit. I have a list of about 20 drugs for which I need to know whether the patient was taking them. Each drug is coded with a number but also has a name. A nice solution would look something like the code below, but how do I do this with pandas?

drugs = {'drug1':60735, 'drug2':5789}  
for n in drugs.keys():
    DF[n] = drugs[n] in DF[['MED1', 'MED2', 'MED3', 'MED4', 'MED5']]

3 Comments

  • Sorry, I am confused. You want to have True when MED1 = 60735 and False otherwise? Commented Jul 29, 2014 at 22:02
  • oops, should be MED1 and MED2, fixed now. Commented Jul 29, 2014 at 22:12
  • just in the first two columns? Commented Jul 29, 2014 at 22:56

3 Answers

4

@Mai's answer will of course work, though it is a bit more idiomatic to combine the two comparisons with the | operator. The parentheses are required because | binds more tightly than ==.

df['med'] = (df['MED1'] == 60735) | (df['MED2'] == 60735)

If you want to check for a value in all (or many) columns, you could also use isin as below. isin checks, cell by cell, whether the cell's value is in the given list, and any(axis=1) returns True for each row in which at least one cell is True. Pass axis by keyword, since newer pandas releases no longer accept it positionally as any(1).

df['med'] = df.isin([60735]).any(axis=1)
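As a minimal sketch of the isin approach, the snippet below rebuilds the sample frame from the question and restricts the check to the MED columns. The name med_cols is only an illustrative helper, not something from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "MED1": [60735, 10126, 5789, 60735, 19675],
    "MED2": [24355, 5789, 19675, 17865, 5810],
    "MED3": [33843, 17165, 30553, 34495, 90000],
    "MED4": [16475, 90000, 90000, 90000, 90000],
    "MED5": [9995, 90000, 90000, 90000, 90000],
})

# Hypothetical helper list: the columns to search.
med_cols = ["MED1", "MED2", "MED3", "MED4", "MED5"]

# isin gives a boolean frame; any(axis=1) collapses it to one bool per row.
df["med"] = df[med_cols].isin([60735]).any(axis=1)

print(df["med"].tolist())  # [True, False, False, True, False]
```

Selecting the MED columns explicitly keeps other columns (including "med" itself, on a re-run) out of the search.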

Edit: Based on your edited question, would this work?

for n, code in drugs.items():
    df[n] = df[['MED1','MED2','MED3','MED4','MED5']].isin([code]).any(axis=1)
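For reference, here is a self-contained sketch of that loop run against the sample data from the question, using the two-entry drugs dict shown there. The variable med_cols is an assumed helper name.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "MED1": [60735, 10126, 5789, 60735, 19675],
    "MED2": [24355, 5789, 19675, 17865, 5810],
    "MED3": [33843, 17165, 30553, 34495, 90000],
    "MED4": [16475, 90000, 90000, 90000, 90000],
    "MED5": [9995, 90000, 90000, 90000, 90000],
})

# Drug name -> numeric code, as in the question.
drugs = {"drug1": 60735, "drug2": 5789}

# Fix the column list before adding new columns, so the new
# boolean columns are never searched themselves.
med_cols = ["MED1", "MED2", "MED3", "MED4", "MED5"]

for name, code in drugs.items():
    # One boolean column per drug: True if the code appears in any MED column.
    df[name] = df[med_cols].isin([code]).any(axis=1)

print(df[["drug1", "drug2"]])
```

With about 20 drugs this adds about 20 boolean columns, one isin pass each, which should be fine for frames of moderate size.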

1 Comment

Is there no way to slice the row based on column names and get the values? I have added to the question to better explain what I am trying to do.
0

I am still confused. But part of what you want may be this:

import numpy as np
DF['med'] = np.logical_or(DF['MED1'] == 60735, DF['MED2'] == 60735)
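np.logical_or takes exactly two inputs, so for more than two columns one option is np.logical_or.reduce over a list of per-column masks. The following is only a sketch under that assumption, using the sample data from the question; cols is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (first three columns are enough here).
DF = pd.DataFrame({
    "MED1": [60735, 10126, 5789, 60735, 19675],
    "MED2": [24355, 5789, 19675, 17865, 5810],
    "MED3": [33843, 17165, 30553, 34495, 90000],
})

# Hypothetical list of columns to test.
cols = ["MED1", "MED2", "MED3"]

# Build one boolean array per column, then OR them all together row-wise.
masks = [(DF[c] == 60735).to_numpy() for c in cols]
DF["med"] = np.logical_or.reduce(masks)

print(DF["med"].tolist())  # [True, False, False, True, False]
```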

0

Here are a few %timeit comparisons of some methods that return bools from a single DataFrame column (MED1 only).

In [2]: %timeit df['med'] = [bool(x) if int(60735) in x else False for x in enumerate(df['MED1'])]
1000 loops, best of 3: 379 µs per loop

In [3]: %timeit df['med'] = (df['MED1'] == 60735)
1000 loops, best of 3: 649 µs per loop

In [4]: %timeit df['med'] = df['MED1'].isin([60735])
1000 loops, best of 3: 404 µs per loop
