Drop rows within dataframe based on condition pandas python

Question

I have a dataframe like this

import pandas as pd
data = {
    'Index Title': ["Company1", "Company1", "Company2", "Company3"],
    'BusinessType': ['Type 1', 'Type 2', 'Type 1', 'Type 2'],
    'ID1': ['123', '456', '789', '012'],
}
df = pd.DataFrame(data)
df.index = df["Index Title"]
del df["Index Title"]
print(df)

Here Index Title is the company name. Company1 has two types, Type 1 and Type 2.

Company2 has only Type 1, and Company3 has only Type 2.

I would like to drop the rows of every company that has only one type (either Type 1 or Type 2).

So in this case the rows for Company2 and Company3 should be dropped.

What is the best way to do that?

cs95 · Accepted Answer · 2020-05-31 20:11:32Z

2

For problems like this, filtering with groupby and transform is the usual choice, as it is pretty fast.

df[df.groupby(level=0)['BusinessType'].transform('nunique') > 1]

            BusinessType  ID1
Index Title                  
Company1          Type 1  123
Company1          Type 2  456

The first step is to determine the groups/rows which are associated with more than one type:

df.groupby(level=0)['BusinessType'].transform('nunique')

Index Title
Company1    2
Company1    2
Company2    1
Company3    1
Name: BusinessType, dtype: int64

From here, we remove every company whose number of unique associated types is 1, i.e. we keep only the rows where this count is greater than 1.
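Putting the two steps together, here is a minimal self-contained sketch. The sample frame is rebuilt from the question; the names `type_counts` and `result` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data from the question, with the company name as the index.
df = pd.DataFrame(
    {
        "Index Title": ["Company1", "Company1", "Company2", "Company3"],
        "BusinessType": ["Type 1", "Type 2", "Type 1", "Type 2"],
        "ID1": ["123", "456", "789", "012"],
    }
).set_index("Index Title")

# Step 1: for every row, count the distinct types within that row's company.
type_counts = df.groupby(level=0)["BusinessType"].transform("nunique")

# Step 2: keep only the rows whose company has more than one distinct type.
result = df[type_counts > 1]
print(result)
```

Because `transform` returns a Series aligned with the original rows, the comparison `type_counts > 1` can be used directly as a boolean mask.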

answered May 31, 2020 at 20:11

cs95


2 Comments

PalimPalim Over a year ago

What's the difference between df.groupby('BusinessType').transform('nunique') & df.groupby(level=0)['BusinessType'].transform('nunique')?

cs95 Over a year ago

@PalimPalim Well, for one, they're doing different things: the level=... argument groups on a given level of the index, whereas passing 'BusinessType' groups on that column. Strictly, level and axis together determine what you're actually grouping on, but axis=0 by default, so it's usually the index.
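The difference discussed in these comments can be seen by inspecting the group keys each call produces. This is only a sketch on the question's sample data; the names `by_type`, `by_company` and `types_per_company` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "BusinessType": ["Type 1", "Type 2", "Type 1", "Type 2"],
        "ID1": ["123", "456", "789", "012"],
    },
    index=pd.Index(
        ["Company1", "Company1", "Company2", "Company3"], name="Index Title"
    ),
)

# Grouping on the column: one group per business type.
by_type = df.groupby("BusinessType")
print(sorted(by_type.groups))

# Grouping on index level 0: one group per company.
by_company = df.groupby(level=0)
print(sorted(by_company.groups))

# Only the per-company grouping answers
# "how many distinct types does each company have?"
types_per_company = by_company["BusinessType"].nunique()
print(types_per_company)
```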

yoskovia · Accepted Answer · 2020-05-31 21:15:36Z

1

This is one way:
- group by Index Title
- keep only the groups that contain at least one Type 1 and one Type 2

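df = (
    # the index is named 'Index Title', so groupby can refer to it by name
    df.groupby('Index Title')
    # keep a company only if it has at least one Type 1 row and one Type 2 row
    .filter(lambda x: (x['BusinessType'] == 'Type 1').any()
                      and (x['BusinessType'] == 'Type 2').any())
    .reset_index()
)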

Update: if you are looking for two or more types, regardless of whether they are Type 1 and Type 2:

df = (
    df.groupby('Index Title')
    .filter(lambda x: x['BusinessType'].nunique() > 1)
    .reset_index()
)

In this case @cs95's answer is the cleaner one, which you should use.
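As a quick sanity check, the two approaches can be compared on the question's sample data. This is a sketch, not a benchmark. It leaves out the trailing `reset_index()` so that both results keep the company index, and the names `via_transform` and `via_filter` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "BusinessType": ["Type 1", "Type 2", "Type 1", "Type 2"],
        "ID1": ["123", "456", "789", "012"],
    },
    index=pd.Index(
        ["Company1", "Company1", "Company2", "Company3"], name="Index Title"
    ),
)

# transform-based boolean mask (the approach from cs95's answer)
via_transform = df[df.groupby(level=0)["BusinessType"].transform("nunique") > 1]

# filter-based approach: the lambda is called once per company
via_filter = df.groupby(level=0).filter(
    lambda g: g["BusinessType"].nunique() > 1
)

# Both should select the same rows in the same order.
print(via_transform.equals(via_filter))
```

The filter version calls a Python function for each group, while the transform version builds a single row-aligned mask, which is the usual reason the latter is preferred on large frames.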

edited May 31, 2020 at 21:15

yoskovia


answered May 31, 2020 at 20:09

PalimPalim


1 Comment

cs95 Over a year ago

A couple of other points: 1) the way you build the condition with BusinessType is brittle - it means the code has to change if the contents of that column are any different. 2) I don't think this would work if there are more than two types.
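If specific types really are required, one way to make the condition less brittle is to keep them in a single set and test membership per company. This is only a sketch; `required_types` is a hypothetical name that does not appear in the original answers.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "BusinessType": ["Type 1", "Type 2", "Type 1", "Type 2"],
        "ID1": ["123", "456", "789", "012"],
    },
    index=pd.Index(
        ["Company1", "Company1", "Company2", "Company3"], name="Index Title"
    ),
)

# Hypothetical parameter: the types every kept company must have.
required_types = {"Type 1", "Type 2"}

# Keep a company only if all required types occur among its rows.
result = df.groupby(level=0).filter(
    lambda g: required_types.issubset(g["BusinessType"])
)
print(result)
```

Changing the requirement then means editing the set only, not the filter expression.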
