I have a DataFrame like this:

import pandas as pd

# Sample data: one row per (company, business type) pair.
data = {
    'Index Title': ["Company1", "Company1", "Company2", "Company3"],
    'BusinessType': ['Type 1', 'Type 2', 'Type 1', 'Type 2'],
    'ID1': ['123', '456', '789', '012'],
}
df = pd.DataFrame(data)
# Use the company name as the index.
df = df.set_index("Index Title")
print(df)

Here, Index Title is the company name. Company1 has two types, Type 1 and Type 2. Company2 has only Type 1, and Company3 has only Type 2.

I would like to drop the rows of every company that has only one type (either Type 1 or Type 2).

So in this case it should drop Company2 and Company3.

What is the best way to do that?

2 Answers

For problems like this, filtering with groupby and transform is the usual approach, as it is quite fast:

df[df.groupby(level=0)['BusinessType'].transform('nunique') > 1]

            BusinessType  ID1
Index Title                  
Company1          Type 1  123
Company1          Type 2  456

The first step is to determine the groups/rows which are associated with more than one type:

df.groupby(level=0)['BusinessType'].transform('nunique')

Index Title
Company1    2
Company1    2
Company2    1
Company3    1
Name: BusinessType, dtype: int64

From here, we keep only the rows where this count is greater than 1, which removes every company associated with a single unique type.
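The two steps above can be sketched end to end as follows. This is a minimal example that rebuilds the question's sample data; the variable names `n_types`, `mask`, and `result` are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Rebuild the sample frame from the question, with the company as the index.
df = pd.DataFrame(
    {
        "BusinessType": ["Type 1", "Type 2", "Type 1", "Type 2"],
        "ID1": ["123", "456", "789", "012"],
    },
    index=pd.Index(
        ["Company1", "Company1", "Company2", "Company3"], name="Index Title"
    ),
)

# Step 1: number of distinct types per company, broadcast back to every row.
n_types = df.groupby(level=0)["BusinessType"].transform("nunique")

# Step 2: boolean mask that is True for rows whose company has more than one type.
mask = n_types > 1

# Boolean indexing keeps only the rows where the mask is True.
result = df[mask]
print(result)
```

Because `transform` returns a Series aligned with the original rows, the mask can be used directly for boolean indexing without any merge or re-indexing.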

2 Comments

What's the difference between df.groupby('BusinessType').transform('nunique') & df.groupby(level=0)['BusinessType'].transform('nunique')?
@PalimPalim Well, for one they're doing different things - level=... argument groups on a given level of the index. Well, level and axis together determine what you're actually grouping on, but axis=0 by default so it's usually the index.
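To make the difference concrete, here is a small sketch on the question's sample data. Selecting the `ID1` column in the first call is an assumption made here so that both calls return a Series; the comment above did not select a column.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "BusinessType": ["Type 1", "Type 2", "Type 1", "Type 2"],
        "ID1": ["123", "456", "789", "012"],
    },
    index=pd.Index(
        ["Company1", "Company1", "Company2", "Company3"], name="Index Title"
    ),
)

# Groups are the business types; counts distinct ID1 values within each type.
by_type = df.groupby("BusinessType")["ID1"].transform("nunique")

# Groups are the companies (index level 0); counts distinct types per company.
by_company = df.groupby(level=0)["BusinessType"].transform("nunique")

print(by_type.tolist())     # [2, 2, 2, 2]
print(by_company.tolist())  # [2, 2, 1, 1]
```

Only the second call answers the question "how many types does each company have?", which is why the answer groups on the index level.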

This is one way:
- group by Index Title
- keep a group only if it contains at least one Type 1 and at least one Type 2

df = (
    df.groupby('Index Title')
    .filter(lambda x: (x['BusinessType']=='Type 1').any() & 
                      (x['BusinessType']=='Type 2').any())
    .reset_index()
)

Update: if you are looking for two or more types, regardless of whether they are Type 1 and Type 2:

df = (
    df.groupby('Index Title')
    .filter(lambda x: x['BusinessType'].nunique() > 1)
    .reset_index()
)

In this case @cs95's answer is the cleaner one, which you should use.

1 Comment

A couple of other points: 1) the way you build the condition with BusinessType is brittle - it means the code has to change if the contents of that column are any different. 2) I don't think this would work if there are more than two types.
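One possible way to address the brittleness raised in this comment is to keep the required types in a set rather than hard-coding each comparison. The sketch below is only an illustration: the `required` set, the `has_all_required` helper, and the extra Company4 rows with Type 3 are hypothetical and do not appear in the original thread.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical Company4 with a third type.
df = pd.DataFrame(
    {
        "BusinessType": ["Type 1", "Type 2", "Type 1", "Type 2", "Type 1", "Type 3"],
        "ID1": ["123", "456", "789", "012", "345", "678"],
    },
    index=pd.Index(
        ["Company1", "Company1", "Company2", "Company3", "Company4", "Company4"],
        name="Index Title",
    ),
)

# Hypothetical set of types that every kept company must have.
required = {"Type 1", "Type 2"}


def has_all_required(group):
    # True when every required type occurs at least once in the group.
    return required.issubset(set(group["BusinessType"]))


# Keeps companies that have all required types (extra types are allowed).
with_required = df.groupby(level=0).filter(has_all_required)

# Keeps companies with two or more distinct types, whatever they are.
with_several = df.groupby(level=0).filter(lambda g: g["BusinessType"].nunique() > 1)

print(with_required.index.unique().tolist())  # ['Company1']
print(with_several.index.unique().tolist())   # ['Company1', 'Company4']
```

The two filters agree on the question's data and differ only once a third type appears, so which one is appropriate depends on whether specific types or simply "more than one type" is the actual requirement.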
