
I want to filter a DataFrame in which a column holds lists of values. I want to filter on multiple conditions. How can I do that?

>>> my_df
        col   values
0         c1  [1,2,3]
1         c2  ['a', 'b', 'c']
2         c3  [11,12,13]


I tried the following query, but it does not return the expected row:

>>> my_df.query(df.query(" `c2` in '['a','b','c']' "))

I expect the output to be

       col   values
1         c2  ['a', 'b', 'c']
  • my_df represents a DataFrame which consists of lists of distinct values? And you want to filter for an exact match? Why do you want to use 'in' at all? Commented Jan 2, 2023 at 9:30
  • @Jan I even tried using ==, but I'm still not getting the expected output. Commented Jan 2, 2023 at 9:53
  • Interesting behaviour. I will keep that in mind. Commented Jan 2, 2023 at 10:04
  • Indeed @Jan: at first I thought this question was a no-brainer, then I discovered it had unsuspected depths! I found this interesting article on the subject: towardsdatascience.com/… Commented Jan 2, 2023 at 10:07
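A minimal sketch of one likely reason == does not select the row, assuming a reasonably recent pandas: a list on the right-hand side is treated as one value per row, so with this three-row frame row 0 is compared with 'a', row 1 with 'b' and row 2 with 'c'. The frame is rebuilt from the question; target is a name chosen for illustration.

```python
import pandas as pd

# Rebuild the frame from the question: each cell of "values" holds a Python list.
my_df = pd.DataFrame({
    "col": ["c1", "c2", "c3"],
    "values": [[1, 2, 3], ["a", "b", "c"], [11, 12, 13]],
})

target = ["a", "b", "c"]  # illustrative name for the list we want to match

# A list on the right-hand side is read as one value per row:
# row 0 is compared with "a", row 1 with "b", row 2 with "c".
naive_mask = my_df["values"] == target
print(naive_mask.tolist())  # [False, False, False]

# Comparing every cell with the whole list gives the intended row-wise test.
row_mask = my_df["values"].apply(lambda cell: cell == target)
print(my_df[row_mask])
```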

2 Answers


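This is another workaround: compare each list with the target list using apply(), store the result in a boolean helper column, and query that column.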

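def list_equals(lst, element):
    # True when the cell's list equals the target list, compared as whole lists
    return lst == element

# Series.apply forwards the extra keyword argument `element` to list_equals
my_df["equals"] = my_df["values"].apply(list_equals, element=['a', 'b', 'c'])

# Filter on the boolean helper column, then drop it again
filtered_df = my_df.query("equals == True")
filtered_df = filtered_df.drop("equals", axis=1)

print(filtered_df)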

Output:

   col     values
1   c2  [a, b, c]

To display the values column with quotes, as in the question, you can use apply() once more to replace each list with its string representation:

filtered_df["values"] = filtered_df["values"].apply(str)
print(filtered_df)

Output:

  col           values
1  c2  ['a', 'b', 'c']
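A variant of the helper-column workaround, sketched under the assumption that my_df should stay unchanged (the version above adds an equals column to my_df itself). DataFrame.assign returns a new frame, so the boolean column only exists inside the chain; is_match and target are hypothetical names.

```python
import pandas as pd

my_df = pd.DataFrame({
    "col": ["c1", "c2", "c3"],
    "values": [[1, 2, 3], ["a", "b", "c"], [11, 12, 13]],
})

target = ["a", "b", "c"]

filtered_df = (
    my_df
    # Temporary boolean column ("is_match" is a hypothetical name): True where the
    # whole list in the cell equals the target list.
    .assign(is_match=my_df["values"].apply(lambda cell: cell == target))
    # The query expression only touches the boolean column, never a list.
    .query("is_match")
    # Remove the helper column so the result has the original columns.
    .drop(columns="is_match")
)

print(filtered_df)
print(my_df.columns.tolist())  # ['col', 'values']: my_df itself is unchanged
```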

1 Comment

Nice; it seems the original problem comes from the way pandas stores lists. Based on your approach, I tried df['clone'] = [[1,2,3]]*3 and then df[df['values'] == df['clone']], which returned the first row as expected.
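The comment's idea, sketched with the question's target list instead of [1, 2, 3]: comparing two Series is done row by row, so a Series that repeats the target list once per row lets == compare whole lists. target and target_series are illustrative names.

```python
import pandas as pd

my_df = pd.DataFrame({
    "col": ["c1", "c2", "c3"],
    "values": [[1, 2, 3], ["a", "b", "c"], [11, 12, 13]],
})

target = ["a", "b", "c"]

# Repeat the target list once per row, using my_df's index so the two Series align.
target_series = pd.Series([target] * len(my_df), index=my_df.index)

# Series == Series compares row by row, so each cell list is compared with a whole list.
mask = my_df["values"] == target_series
print(my_df[mask])
```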

You should convert the lists to tuples:

print(my_df[my_df['values'].apply(tuple) == ('a', 'b', 'c')])

To use query(), convert the values column to tuples first:

my_df['values'] = my_df['values'].apply(tuple)
t = ('a', 'b', 'c')
print(my_df.query('values == @t'))

Output of the query() version (the values column now holds tuples):

   col     values
1   c2  (a, b, c)
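A sketch of the tuple approach that keeps the original lists intact by querying a temporary column instead of overwriting values. Like the answer, it relies on pandas comparing a tuple on the right-hand side with each cell as a single value; values_tuple is a hypothetical helper column name.

```python
import pandas as pd

my_df = pd.DataFrame({
    "col": ["c1", "c2", "c3"],
    "values": [[1, 2, 3], ["a", "b", "c"], [11, 12, 13]],
})

t = ("a", "b", "c")

result = (
    my_df
    # "values_tuple" is a hypothetical helper column holding tuple copies of the lists.
    .assign(values_tuple=my_df["values"].apply(tuple))
    # As in the answer, the tuple @t is compared with each cell as one value.
    .query("values_tuple == @t")
    # Drop the helper so the original list-valued column is returned unchanged.
    .drop(columns="values_tuple")
)

print(result)
```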

3 Comments

Nice one; I made several attempts to filter on actual lists, or stringified lists, but all failed.
I tried with strings, based on your method, and at last this one worked (note the spaces after the commas, and the mandatory single quotes inside): df['values'].apply(str) == "['a', 'b', 'c']"
@Guy Thanks for your response, though I need to filter the DataFrame using query().
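The string comparison from the comment can also be written with query(), which the asker needs. A minimal sketch, assuming the lists render exactly as str() prints them; values_str is a hypothetical helper column, and the match stays sensitive to quoting and spacing, as the comment points out, so the tuple approach is the sturdier option.

```python
import pandas as pd

my_df = pd.DataFrame({
    "col": ["c1", "c2", "c3"],
    "values": [[1, 2, 3], ["a", "b", "c"], [11, 12, 13]],
})

# str() renders a list with single quotes and a space after each comma,
# which is the exact text the comment had to match.
target = str(["a", "b", "c"])
print(target)  # ['a', 'b', 'c']

result = (
    my_df
    # "values_str" is a hypothetical helper column with the string form of each list.
    .assign(values_str=my_df["values"].apply(str))
    .query("values_str == @target")
    .drop(columns="values_str")
)

print(result)
```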
