
I have a dataframe as follows:

name value
aa 0
aa 0
aa 1
aa 0
aa 0
bb 0
bb 0
bb 1
bb 0
bb 0
bb 0

For each name, I want to delete all rows that come after the first 1 in the 'value' column (keeping the row with the 1 itself), so the result should be:

name value
aa 0
aa 0
aa 1
bb 0
bb 0
bb 1

What is the best way to do this? I thought about using the DataFrame.groupby method with some conditions inside, but I can't figure out how to make it work.
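One way to express the groupby idea is sketched below. It assumes each name's rows are already in the desired order. Within each name, it shifts `value` down by one row, takes the running maximum, and keeps only the rows where no earlier 1 has been seen. The column names `name` and `value` come from the question. The helper name `keep_until_first_one` is hypothetical.

```python
import pandas as pd

def keep_until_first_one(df: pd.DataFrame) -> pd.DataFrame:
    """Keep each name's rows up to and including its first 1 (hypothetical helper)."""
    # Value of the previous row within the same name; the first row of each name gets 0.
    prev = df.groupby('name')['value'].shift(fill_value=0)
    # Becomes 1 once any earlier row of that name has contained a 1, otherwise stays 0.
    seen_before = prev.groupby(df['name']).cummax()
    return df[seen_before == 0]

df = pd.DataFrame({
    'name':  ['aa'] * 5 + ['bb'] * 6,
    'value': [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
})
print(keep_until_first_one(df))
```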

  • So you want to delete all rows after the first 1 for each name? Commented Jul 26, 2021 at 12:20
  • Yes, that's right. Commented Jul 26, 2021 at 12:20
  • Are there some edge cases? Is it only 0 or 1? What have you tried? Commented Jul 26, 2021 at 12:21
  • No, there are no other tricky cases; the values are only 0 and 1. I think it should be possible to solve this simply, without writing a search algorithm. Maybe iterate over each row of the dataframe, look for a 1, then stop and move on to the next name. Commented Jul 26, 2021 at 12:25
  • How many names are in your dataset? Commented Jul 26, 2021 at 12:27
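The row-by-row idea from the comments can be written as a plain loop. The loop remembers which names have already produced a 1 and skips every later row for those names. This is a sketch assuming the same `name`/`value` columns. The names `done` and `rows_to_keep` are illustrative only. The loop is easy to follow, but it is typically much slower than the vectorized answers on large frames.

```python
import pandas as pd

df = pd.DataFrame({
    'name':  ['aa'] * 5 + ['bb'] * 6,
    'value': [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
})

done = set()          # names whose first 1 has already been seen
rows_to_keep = []     # index labels of the rows to keep
for idx, row in df.iterrows():
    if row['name'] in done:
        continue      # skip everything after the first 1 for this name
    rows_to_keep.append(idx)
    if row['value'] == 1:
        done.add(row['name'])

result = df.loc[rows_to_keep]
print(result)
```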

2 Answers


Not the most beautiful way to do it, but this should work.

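# Keep rows while the second per-name cumulative sum is <= 1, i.e. up to and including each name's first 1.
df = df.loc[df['value'].groupby(df['name']).cumsum().groupby(df['name']).cumsum() <= 1]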
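Printing the intermediate values can show why the double `cumsum` works. The first cumulative sum per name becomes 1 at the first 1 and stays at least 1 afterwards. The second cumulative sum is therefore exactly 1 on that row and at least 2 on every later row. As a result, `<= 1` keeps the rows up to and including the first 1. The column names `first_cumsum` and `second_cumsum` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name':  ['aa'] * 5 + ['bb'] * 6,
    'value': [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
})

# Running count of 1s seen so far within each name.
first = df['value'].groupby(df['name']).cumsum()
# Running total of that count: 0 before the first 1, 1 on it, >= 2 on later rows.
second = first.groupby(df['name']).cumsum()

debug = df.assign(first_cumsum=first, second_cumsum=second, keep=second <= 1)
print(debug)

result = df.loc[second <= 1]
```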

1 Comment

I tried your solution and it works. Thank you very much.

Here's my approach to solving this.

# Imports.
import pandas as pd

# Creating a DataFrame.
df = pd.DataFrame([{'name': 'aa', 'value': 0},
                   {'name': 'aa', 'value': 0},
                   {'name': 'aa', 'value': 1},
                   {'name': 'aa', 'value': 0},
                   {'name': 'aa', 'value': 0},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 1},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 0}])
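# Filtering the DataFrame: within each name, keep the rows whose index label is
# at or before the first 1 (idxmax returns the label of the first maximum).
df_filtered = (df.groupby('name')
                 .apply(lambda x: x[x.index <= x['value'].idxmax()])
                 .reset_index(drop=True))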
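One caveat follows from how `idxmax` behaves. It returns the label of the first maximum. For a name that contains no 1 at all, that label is the name's first row, so only that row would be kept. The question says every name has a 1, so this does not matter there. If it might, a positional variant like the sketch below keeps the whole group in that case. It also avoids relying on the index labels being in increasing order. `trim_group` is a hypothetical helper name, and the extra `cc` rows are made-up data to show the no-1 case.

```python
import numpy as np
import pandas as pd

def trim_group(g: pd.DataFrame) -> pd.DataFrame:
    """Keep rows up to and including the first 1; keep all rows if there is none."""
    values = g['value'].to_numpy()
    ones = np.flatnonzero(values == 1)   # positions of every 1 in this group
    if ones.size == 0:
        return g                         # no 1 in this name: keep the whole group
    return g.iloc[: ones[0] + 1]         # slice by position, not by index label

df = pd.DataFrame({
    'name':  ['aa'] * 5 + ['bb'] * 6 + ['cc'] * 2,
    'value': [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
})

# Iterate over the groups in order of first appearance and stitch the trimmed pieces back together.
df_filtered = pd.concat(
    [trim_group(g) for _, g in df.groupby('name', sort=False)]
).reset_index(drop=True)
print(df_filtered)
```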

1 Comment

This looks much cleaner than what I came up with.
