19

In my dataframe a column is made up of lists, for example:

df = pd.DataFrame({'A':[[1,2],[2,4],[3,1]]})

I need to find out the location of list [1,2] in this dataframe. I tried:

df.loc[df['A'] == [1,2]]

and

df.loc[df['A'] == [[1,2]]]

but both attempts failed. The comparison looks very simple, yet it just doesn't work. Am I missing something here?

5
  • The only thing you're "missing" is that data frames aren't really great for storing lists. Any reason you don't want two separate columns? Commented Nov 1, 2018 at 21:18
  • @BallpointBen Thanks for your attention, I've posted a new question to explain the whole question. stackoverflow.com/questions/53115592/… Commented Nov 2, 2018 at 9:11
  • @Luuklag This may be a duplicate, but I don't believe it's a duplicate of the target you suggest. That one seems to be trying to filter based on whether multiple columns are equal to particular values. This one is trying to check if the list is equal to a single column's value, which has a very different answer. Commented Nov 13, 2018 at 22:19
  • Feel free to suggest a more appropriate target. Commented Nov 13, 2018 at 23:05
  • @Luuklag, I posted the two questions because I don't think they are the same. As jpmc described, they are connected but also very different. This post is actually a variant of that one: I tried a naive approach to solve that one, and this question grew out of that attempt. But this one still has its distinct value. Can you please remove the duplicate target? Commented Nov 19, 2018 at 2:56

5 Answers

20

Avoid storing lists in cells; it creates a lot of problems for pandas. If you do need an object column, use tuples instead:

df.A.map(tuple).isin([(1,2)])
Out[293]: 
0     True
1    False
2    False
Name: A, dtype: bool
# To select the matching rows:
# df[df.A.map(tuple).isin([(1,2)])]
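A minimal, self-contained sketch of the tuple approach. The second target `(3, 1)` and the variable names are illustrative assumptions, not part of the original question. Tuples are hashable, which hash-based lookups such as `isin` rely on, and `isin` accepts several targets at once.

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 2], [2, 4], [3, 1]]})

# Convert each list to a tuple so the cell values are hashable.
as_tuples = df['A'].map(tuple)

# isin takes a collection of targets, so several lists can be matched in one call.
# (3, 1) is an extra, illustrative target.
targets = [(1, 2), (3, 1)]
mask = as_tuples.isin(targets)

# Index labels of the rows whose list matched one of the targets.
matching_index = df.index[mask].tolist()
print(matching_index)
```

If the tuple form is needed repeatedly, it may be worth storing it as a column once rather than converting on every lookup.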

1 Comment

Could you explain why a tuple is better? I've noticed that Pandas struggles when lists are in a cell, but does it handle tuples better because they are immutable? Would you expect a numpy array to work better than a list?
15

You can use apply to compare each cell with the list:

df['A'].apply(lambda x: x==[1,2])

0     True
1    False
2    False
Name: A, dtype: bool

print(df[df['A'].apply(lambda x: x==[1,2])])

        A
0  [1, 2]
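The same idea can be wrapped in a small helper. This is only a sketch: `rows_equal_to` is a hypothetical name introduced here, not a pandas function.

```python
import pandas as pd


def rows_equal_to(frame, column, target):
    """Return the rows of `frame` whose `column` cell equals `target`."""
    # Plain Python equality is evaluated once per cell.
    mask = frame[column].apply(lambda cell: cell == target)
    return frame[mask]


df = pd.DataFrame({'A': [[1, 2], [2, 4], [3, 1]]})
result = rows_equal_to(df, 'A', [1, 2])

# The index of the result gives the location of the matching list.
positions = result.index.tolist()
print(positions)
```

Because this runs a Python-level comparison per row, it is simple and works for lists of any length, but it is not vectorized.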


9

With NumPy arrays:

df.assign(B=(np.array(df.A.tolist()) == [1, 2]).all(1))

        A      B
0  [1, 2]   True
1  [2, 4]  False
2  [3, 1]  False

4 Comments

This should be the accepted solution! [Or, if possible, just expanding the series of lists to 2 series.]
Won't this run into issues if the lists are differently sized? Though perhaps that's outside the scope of this example.
@ALollz yes and yes
Nice! My only concern is that this solution converts the datatype twice. If my dataframe is very big, will this conversion cost more time?
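On the differently sized lists raised in the comments, one possible workaround is to filter by length first and build the 2-D array only from rows that can match. This is a sketch under that assumption; the extra row `[1, 2, 3]` and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# The last row has a different length from the others (illustrative data).
df = pd.DataFrame({'A': [[1, 2], [2, 4], [3, 1], [1, 2, 3]]})
target = [1, 2]

# Only rows whose list has the same length as the target can be equal to it.
same_len = df['A'].map(len) == len(target)

# Start with an all-False mask and fill in the candidate rows.
mask = pd.Series(False, index=df.index)
if same_len.any():
    # All candidates share one length, so they form a regular 2-D array.
    candidates = np.array(df.loc[same_len, 'A'].tolist())
    mask.loc[same_len] = (candidates == target).all(axis=1)

print(df[mask])
```

The length filter keeps the conversion to a 2-D array from running on ragged data, at the cost of one extra pass over the column.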
6

Using NumPy:

df.A.apply(lambda x: (np.array(x) == np.array([1,2])).all())

0     True
1    False
2    False
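A variant of the per-row comparison using `np.array_equal`, which returns `False` when the shapes differ instead of attempting an element-wise comparison. The ragged row `[1, 2, 3]` is an illustrative assumption, not part of the original data.

```python
import numpy as np
import pandas as pd

# The last row has a different length (illustrative data).
df = pd.DataFrame({'A': [[1, 2], [2, 4], [3, 1], [1, 2, 3]]})

# np.array_equal checks shape first, then values, and returns a single boolean.
mask = df['A'].apply(lambda x: np.array_equal(x, [1, 2]))

print(df[mask])
```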


0

Or:

df['A'].apply(([1,2]).__eq__)

Then:

df[df['A'].apply(([1,2]).__eq__)]
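One caveat with calling `__eq__` directly: when a cell is not a list (a tuple, for example), `[1, 2].__eq__(cell)` returns `NotImplemented` rather than `False`. A sketch of an alternative using `operator.eq`, which goes through the normal `==` machinery; the mixed list/tuple data here is an illustrative assumption.

```python
from functools import partial
from operator import eq

import pandas as pd

# The second cell is a tuple rather than a list (illustrative data).
df = pd.DataFrame({'A': [[1, 2], (1, 2), [3, 1]]})

# Calling the dunder method directly yields NotImplemented for a non-list operand.
direct = [1, 2].__eq__((1, 2))

# operator.eq(a, b) evaluates a == b, which always produces a real boolean here.
is_target = partial(eq, [1, 2])
mask = df['A'].map(is_target)

print(df[mask])
```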

