Delete some rows in dataframe based on condition in another column

Question

I have a dataframe as follows:

name	value
aa	0
aa	0
aa	1
aa	0
aa	0
bb	0
bb	0
bb	1
bb	0
bb	0
bb	0

For each name, I want to delete all rows that come after the first 1 in the 'value' column. The expected result is:

name	value
aa	0
aa	0
aa	1
bb	0
bb	0
bb	1

What is the best way to do this? I thought about using groupby with some condition inside it, but I cannot figure out how to make it work.

So you want to delete all rows after the first 1 for each name?
– Tom S, Commented Jul 26, 2021 at 12:20
Are there some edge cases? Is it only 0 or 1? What have you tried?
– Tom S, Commented Jul 26, 2021 at 12:21
No, there are no other tricky cases. Only 0 and 1. I think it can be solved simply, without writing a search algorithm. Maybe by iterating over the rows of the dataframe, stopping at the first 1, and then moving on to the next name.
– Roman Lents, Commented Jul 26, 2021 at 12:25

Tom S · Accepted Answer · 2021-07-26 12:38:17Z

2

Not the most elegant way to do it, but this should work as long as 'value' contains only 0 and 1.

df = df.loc[df['value'].groupby(df['name']).cumsum().groupby(df['name']).cumsum() <= 1]
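To see why the double cumsum works, here is a minimal sketch that splits the one-liner into its intermediate steps, using the sample data from the question. The variable names `seen`, `running`, and `result` are only for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "name": ["aa"] * 5 + ["bb"] * 6,
    "value": [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
})

# First pass: running count of 1s seen so far within each name.
# It is 0 before the first 1 and 1 from that row onwards.
seen = df["value"].groupby(df["name"]).cumsum()

# Second pass: running total of that count within each name.
# It reaches 1 on the row holding the first 1 and exceeds 1 on every later row.
running = seen.groupby(df["name"]).cumsum()

# Keep the rows up to and including the first 1 of each name.
result = df.loc[running <= 1]
print(result)
```

This relies on 'value' holding only 0 and 1, as stated in the comments on the question.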


Roman Lents Over a year ago

Tried your solution and it works. Thank you very much.

George · Accepted Answer · 2021-07-26 12:47:47Z

2

Here's my approach to solving this.

# Imports.
import pandas as pd

# Creating a DataFrame.
df = pd.DataFrame([{'name': 'aa', 'value': 0},
                   {'name': 'aa', 'value': 0},
                   {'name': 'aa', 'value': 1},
                   {'name': 'aa', 'value': 0},
                   {'name': 'aa', 'value': 0},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 1},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 0}])
# Filtering the DataFrame.
df_filtered = df.groupby('name').apply(lambda x: x[x.index <= x['value'].idxmax()]).reset_index(drop=True)
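One caveat worth noting: idxmax returns the label of the first maximum, so for a name whose values are all 0 the filter above would keep only that name's first row. The asker said this case does not occur, but if it might, a variant along the following lines could keep such groups intact. This is only a sketch; the sample data (including the 'bb' group with no 1 and the 'cc' group with two 1s) and the name `ones_before` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: 'bb' never contains a 1, 'cc' contains two.
df = pd.DataFrame({
    "name": ["aa", "aa", "aa", "aa", "bb", "bb", "cc", "cc", "cc"],
    "value": [0, 1, 0, 0, 0, 0, 1, 0, 1],
})

# Number of 1s strictly before each row within its name:
# the per-name running count minus the row's own value.
ones_before = df.groupby("name")["value"].cumsum() - df["value"]

# Keep rows that have no 1 before them, i.e. everything up to and
# including the first 1, or the whole group when it has no 1.
df_filtered = df[ones_before == 0].reset_index(drop=True)
print(df_filtered)
```

Like the other solutions here, it assumes 'value' contains only 0 and 1, and it assumes the dataframe has a unique index (such as the default one) so the subtraction aligns row by row.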


Tom S Over a year ago

This looks much cleaner than what I came up with.
