I have this DataFrame:

import pandas as pd

df = pd.DataFrame(
    {'name': ['Adam', 'Adam', 'Adam', 'Bill', 'Bill', 'Charlie', 'Charlie', 'Charlie', 'Charlie'],
     'message': ['start', 'stuck', 'finish', 'start', 'stuck', 'start', 'stuck', 'finish', 'finish']}
)

and I want to drop every row with message "stuck" for each name that also has a "finish" message, giving:

pd.DataFrame(
    {'name': ['Adam', 'Adam', 'Bill', 'Bill', 'Charlie', 'Charlie', 'Charlie'],
     'message': ['start', 'finish', 'start', 'stuck', 'start', 'finish', 'finish']}
)

Bill never "finished", so his "stuck" row remains.

2 Answers

To find out whether each name has finished, group by name and use any. Since we want the result back in the original shape of the DataFrame, we use groupby.transform:

>>> sf = df['message'].eq('finish').groupby(df['name']).transform('any')
>>> sf
0     True
1     True
2     True
3    False
4    False
5     True
6     True
7     True
8     True
Name: message, dtype: bool
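The reason for transform rather than a plain aggregation is the shape of the result. A minimal sketch on the same sample data (the variable names per_name and per_row are illustrative): any() collapses each group to one value, while transform('any') broadcasts each group's result back onto every row, so it can be used directly as a row mask.

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Adam', 'Adam', 'Adam', 'Bill', 'Bill', 'Charlie', 'Charlie', 'Charlie', 'Charlie'],
     'message': ['start', 'stuck', 'finish', 'start', 'stuck', 'start', 'stuck', 'finish', 'finish']}
)

is_finish = df['message'].eq('finish')

# Aggregation: one boolean per group, indexed by name (3 entries)
per_name = is_finish.groupby(df['name']).any()

# Transform: one boolean per original row, aligned with df.index (9 entries)
per_row = is_finish.groupby(df['name']).transform('any')

print(per_name)
print(per_row.index.equals(df.index))  # True
```

Only the transformed Series lines up row-for-row with df, which is what boolean indexing needs.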

From there it’s easy to remove the "stuck" messages of names that have finished: keep a row if its name has not finished or its message is not "stuck":

>>> df[~sf | df['message'].ne('stuck')]
      name message
0     Adam   start
2     Adam  finish
3     Bill   start
4     Bill   stuck
5  Charlie   start
7  Charlie  finish
8  Charlie  finish
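The two steps can be wrapped in a small reusable function. This is a sketch only: the function name drop_stuck_if_finished and its keyword parameters are hypothetical, chosen for illustration. The original index is kept, as in the output above.

```python
import pandas as pd


def drop_stuck_if_finished(df, key='name', col='message',
                           done='finish', stuck='stuck'):
    """Hypothetical helper: drop rows whose `col` equals `stuck`, but only
    for `key` groups that contain at least one `done` row."""
    # True on every row of a group that has a `done` message somewhere
    finished = df[col].eq(done).groupby(df[key]).transform('any')
    # Keep unfinished groups entirely; in finished groups drop only `stuck` rows
    return df[~finished | df[col].ne(stuck)]


df = pd.DataFrame(
    {'name': ['Adam', 'Adam', 'Adam', 'Bill', 'Bill', 'Charlie', 'Charlie', 'Charlie', 'Charlie'],
     'message': ['start', 'stuck', 'finish', 'start', 'stuck', 'start', 'stuck', 'finish', 'finish']}
)
result = drop_stuck_if_finished(df)
print(result)
```

Making the column names and message values parameters lets the same filter run on other status columns without editing the body.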

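Alternatively, use isin: drop a row when its name is among those with a "finish" message and its message is "stuck":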

df[~(df['name'].isin(df.loc[df['message'] == 'finish', 'name']) & (df['message'] == 'stuck'))]

Output:

      name message
0     Adam   start
2     Adam  finish
3     Bill   start
4     Bill   stuck
5  Charlie   start
7  Charlie  finish
8  Charlie  finish
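Spelled out step by step, the isin approach first collects the names that have a "finish" row and then builds the drop mask from them. A minimal sketch on the sample data (the intermediate names finished_names and to_drop are illustrative), which also checks that it agrees with the groupby/transform answer:

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Adam', 'Adam', 'Adam', 'Bill', 'Bill', 'Charlie', 'Charlie', 'Charlie', 'Charlie'],
     'message': ['start', 'stuck', 'finish', 'start', 'stuck', 'start', 'stuck', 'finish', 'finish']}
)

# Names with at least one "finish" message
finished_names = df.loc[df['message'] == 'finish', 'name'].unique()

# A row is dropped only if its name finished AND the row itself is "stuck"
to_drop = df['name'].isin(finished_names) & df['message'].eq('stuck')
result_isin = df[~to_drop]

# Same filter via groupby/transform, for comparison
finished = df['message'].eq('finish').groupby(df['name']).transform('any')
result_transform = df[~finished | df['message'].ne('stuck')]

print(result_isin.equals(result_transform))  # True
```

The two masks are logically the same by De Morgan's law: not (finished and stuck) equals (not finished) or (not stuck).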
