So I have a function that sets a value in a column of a dataframe based on whether or not some string in the dataframe contains values from a list. I then want to get a count of how many rows in the dataframe have that value, but I am getting an error.

If certain conditions are met, the 'tag' column is set to a list, ['date','must','glucose']. Not all of the rows meet the condition for this to happen. I want to find the number of rows where it IS met by analyzing the dataframe.

I have tried this:

df = data[data['tag'] == ['date','must','glucose']]
print(df)

...but that yields:

ValueError: Lengths must match to compare

I also tried this but that yields the same error:

df = data.tag == ['date','must','glucose']

If I were just comparing single values, that would work, but having a list in the cell instead of a single value breaks it. For example, if the value were just 'four', this would not give me an error:

df = data[data.tag=='four']

Is there a way to accomplish this? Thank you!

  • Can you paste a sample of the data? Commented Sep 10, 2019 at 17:40

2 Answers

You can use the apply function for this:

df = df[df['tag'].apply(lambda x : x == ['date','must','glucose'])]
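As a rough, self-contained sketch of why this works: comparing a column to a list with == makes pandas attempt an element-wise comparison against the list's items, which is why the lengths have to match, whereas apply compares each cell to the whole list. Summing the resulting boolean mask gives the row count the question asks for. The sample frame and the names target, mask, and count are made up for illustration and are not the asker's actual data.

```python
import pandas as pd

# Hypothetical sample data: some cells hold the target list, others do not.
target = ['date', 'must', 'glucose']
data = pd.DataFrame({
    'text': ['a', 'b', 'c', 'd'],
    'tag': [target.copy(), 'four', ['date'], target.copy()],
})

# Compare each cell to the whole list, one row at a time.
mask = data['tag'].apply(lambda x: x == target)

matches = data[mask]     # the matching rows
count = int(mask.sum())  # True counts as 1, so the sum is the row count
print(count)
```

If only the count is needed, the filtered frame can be skipped and mask.sum() used on its own.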

You can also convert each list to a tuple and compare.

source: Pandas: compare list objects in Series
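A minimal sketch of the tuple variant, assuming the non-list cells should be left untouched: tuples compare by value just like lists, and they are also hashable, so they can be tallied with collections.Counter. The sample frame and the names as_tuples, mask, and tallies are illustrative only, and the comparison is done cell by cell with map rather than relying on how pandas treats a tuple on the right-hand side of ==.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data, not the asker's real frame.
target = ('date', 'must', 'glucose')
data = pd.DataFrame({
    'tag': [['date', 'must', 'glucose'], 'four', ['date', 'must', 'glucose'], ['date']],
})

# Convert list cells to tuples; leave every other cell as it is.
as_tuples = data['tag'].map(lambda x: tuple(x) if isinstance(x, list) else x)

# Cell-by-cell comparison against the target tuple.
mask = as_tuples.map(lambda t: t == target)
count = int(mask.sum())

# Tuples are hashable, so every distinct tag value can be tallied at once.
tallies = Counter(as_tuples)
print(count, tallies[target])
```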

1 Comment

This worked perfectly, thank you! Nice and simple, and replicable for similar situations. Much appreciated.

EDITING ANSWER

You need to use isin() to accomplish that. Consider:

>>> import pandas as pd
>>> data = pd.DataFrame({'sample col1': [1,2,3,4,5], 'sample col2': ['a','b','c','d','e'], 'tag': ['some text', 'some value','date','must','glucose']})

>>> data
   sample col1 sample col2         tag
0            1           a   some text
1            2           b  some value
2            3           c        date
3            4           d        must
4            5           e     glucose
>>> df = data[~data['tag'].isin(['date','must','glucose'])]
>>> df
   sample col1 sample col2         tag
0            1           a   some text
1            2           b  some value

In your case:

>>> df = df.reset_index(drop=True)  # new frame with a fresh 0..n-1 index
>>> df['map'] = 'True'

>>> df
   sample col1 sample col2         tag   map
0            1           a   some text  True
1            2           b  some value  True

>>> map_dict = dict(zip(df['tag'], df['map']))
>>> data['Not in your list?'] = data['tag'].map(map_dict).fillna(value = 'False')

>>> data
   sample col1 sample col2         tag Not in your list?
0            1           a   some text              True
1            2           b  some value              True
2            3           c        date             False
3            4           d        must             False
4            5           e     glucose             False
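A shorter route to the same column, sketched under the assumption that each 'tag' cell holds a single string (not a list): isin() already returns a boolean mask, so it can be negated and assigned directly, and summing it counts the rows. The names wanted and in_list are made up for this sketch, and the column ends up holding real booleans rather than the strings 'True' and 'False'.

```python
import pandas as pd

data = pd.DataFrame({
    'sample col1': [1, 2, 3, 4, 5],
    'sample col2': ['a', 'b', 'c', 'd', 'e'],
    'tag': ['some text', 'some value', 'date', 'must', 'glucose'],
})

wanted = ['date', 'must', 'glucose']

# True where the single tag value is one of the wanted strings.
in_list = data['tag'].isin(wanted)

# Negate the mask to flag the rows that are NOT in the list.
data['Not in your list?'] = ~in_list

# Number of rows whose tag is in the list.
n_in_list = int(in_list.sum())
print(n_in_list)
```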

Hope this helps :D

2 Comments

Thanks for this, but I think there is a misunderstanding. In the 'tag' column, for some rows, the value is ['date','must','glucose']. I am looking to find those rows. The answer provided by vbrises worked for this purpose.
I see, you want "not in" logic. Simply place a ~ (tilde) in front of the condition. I'll update my answer.
