Select rows of pandas dataframe based on string in nested list

Question

How can I select a subset of a pandas DataFrame based on whether a column whose values are lists contains a given string?

import pandas as pd

df = pd.DataFrame({'id': [12, 34, 43], 'course': ['Mathematics', 'Sport', 'Biology'], 'students': [['John Doe', 'Peter Parker', 'Lois Lane'], ['Bruce Banner', 'Lois Lane'], ['John Doe', 'Bruce Banner']]})

Now I would like to select all rows in which John Doe is among the students.

the_pr0blem · Accepted Answer · 2022-08-02 20:38:43Z

1

df[df.students.apply(lambda row: "John Doe" in row)]
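A self-contained version of the apply approach on the question's data. The has_student helper and its isinstance guard are illustrative assumptions, not part of the answer: the guard makes a missing value (NaN) count as "no match" instead of raising a TypeError, which the plain `"John Doe" in row` would do for a float.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [12, 34, 43],
    'course': ['Mathematics', 'Sport', 'Biology'],
    'students': [['John Doe', 'Peter Parker', 'Lois Lane'],
                 ['Bruce Banner', 'Lois Lane'],
                 ['John Doe', 'Bruce Banner']],
})


def has_student(students, name):
    # Hypothetical helper: True only if `students` is a list containing `name`;
    # non-list values (e.g. NaN for a missing entry) count as no match
    return isinstance(students, list) and name in students


# apply calls the helper once per row and yields a boolean Series (the mask)
mask = df['students'].apply(lambda s: has_student(s, 'John Doe'))
result = df[mask]
print(result)
```

The membership test `name in students` compares whole list elements, so it never matches partial names.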

answered Aug 2, 2022 at 20:38

the_pr0blem

31 · 1 silver badge · 6 bronze badges


sitting_duck · Accepted Answer · 2022-08-02 21:25:28Z

0

Here is a vectorized option:

df[(df['students'].explode() == 'John Doe').groupby(level=0).any()]
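To see what the one-liner does, here it is split into steps on the question's data. explode gives each list element its own row while repeating the owning row's index label, so groupby(level=0).any() folds the element-wise comparisons back into one boolean per original row. This relies on the DataFrame having a unique index. An empty list explodes to a single NaN, which compares unequal to the name, so such rows are simply excluded.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [12, 34, 43],
    'course': ['Mathematics', 'Sport', 'Biology'],
    'students': [['John Doe', 'Peter Parker', 'Lois Lane'],
                 ['Bruce Banner', 'Lois Lane'],
                 ['John Doe', 'Bruce Banner']],
})

# One row per student; the index labels (0, 0, 0, 1, 1, 2, 2) repeat the owning row
exploded = df['students'].explode()

# Element-wise comparison, still indexed by the original row labels
is_john = exploded == 'John Doe'

# Collapse back to one boolean per original row: True if any element matched
mask = is_john.groupby(level=0).any()

result = df[mask]
print(result)
```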

answered Aug 2, 2022 at 21:25

sitting_duck

3,780 · 1 gold badge · 17 silver badges · 20 bronze badges

2 Comments

SomeDude Over a year ago

This is not as performant as apply: the explode followed by the groupby takes time.

sitting_duck Over a year ago

@SomeDude Yep - you are correct. Just timed it myself.

SomeDude · Accepted Answer · 2022-08-02 21:52:14Z

0

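You can use the .str accessor methods: first join each list into a comma-separated string, then search that string for 'John Doe'. Use str.contains rather than str.match here, because str.match only matches at the start of the string and would miss rows where John Doe is not the first student:

df[df['students'].str.join(',').str.contains('John Doe', regex=False)]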
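A quick check of why the search method matters, on a hypothetical two-row Series (not from the question) where John Doe is not the first student in one row and a similar name appears in the other. str.match anchors at the start of the joined string; str.contains with regex=False looks for the plain substring anywhere. Both are substring tests on joined text, so a name like 'John Doe Jr.' also matches, which the list-membership check used by apply avoids.

```python
import pandas as pd

# Hypothetical data: John Doe is second in row 0; row 1 has a longer similar name
students = pd.Series([['Lois Lane', 'John Doe'], ['John Doe Jr.']])

# 'Lois Lane,John Doe' and 'John Doe Jr.'
joined = students.str.join(',')

# match anchors at the start of each string, so row 0 is missed
by_match = joined.str.match('John Doe')

# contains searches anywhere; regex=False treats the name as plain text
by_contains = joined.str.contains('John Doe', regex=False)

# exact list membership: no substring false positives
by_membership = students.apply(lambda s: 'John Doe' in s)

print(pd.DataFrame({'match': by_match,
                    'contains': by_contains,
                    'membership': by_membership}))
```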

However, the apply method is actually faster.

Timings on a larger DataFrame of 27 rows (the original df repeated 9 times):

%timeit df[df['students'].str.join(',').str.match('John Doe')]
382 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df[df.students.apply(lambda row: "John Doe" in row)]
271 µs ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
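The %timeit magic only works in IPython/Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard timeit module. The 27-row frame is built by repeating the question's three rows 9 times, and the explode variant from the other answer is included for comparison. The repeat/number settings are arbitrary choices, and absolute numbers depend on the machine and pandas version, so none are shown here.

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    'id': [12, 34, 43],
    'course': ['Mathematics', 'Sport', 'Biology'],
    'students': [['John Doe', 'Peter Parker', 'Lois Lane'],
                 ['Bruce Banner', 'Lois Lane'],
                 ['John Doe', 'Bruce Banner']],
})

# Repeat the 3 rows 9 times; ignore_index gives unique labels 0..26,
# which explode + groupby(level=0) relies on
df = pd.concat([base] * 9, ignore_index=True)

candidates = {
    'apply': lambda: df[df['students'].apply(lambda row: 'John Doe' in row)],
    'str.join + str.contains': lambda: df[
        df['students'].str.join(',').str.contains('John Doe', regex=False)],
    'explode + groupby': lambda: df[
        (df['students'].explode() == 'John Doe').groupby(level=0).any()],
}

# All variants should select the same rows before their speed is compared
reference = candidates['apply']()
for name, fn in candidates.items():
    assert fn().equals(reference), name

for name, fn in candidates.items():
    # Best of 5 repeats of 200 calls each, reported per call
    per_call = min(timeit.repeat(fn, number=200, repeat=5)) / 200
    print(f'{name:>25}: {per_call * 1e6:.1f} µs per call')
```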

Output:

   id       course                             students
0  12  Mathematics  [John Doe, Peter Parker, Lois Lane]
2  43      Biology             [John Doe, Bruce Banner]

edited Aug 2, 2022 at 21:52

answered Aug 2, 2022 at 21:38

SomeDude

14.3k · 5 gold badges · 26 silver badges · 49 bronze badges
