Using a Pandas loop to create a new dataframe based on combination of conditions in two columns

Question

Quite new to Python/Pandas and trying to perform an operation on my dataframe in Python 3 in iPython Notebook.

I have a df:

df=    Building ID   CorporationName  IndividualName
       1             Sample, LLC      John 
       1             n/a              Sam 
       1             n/a              Nancy 
       2             n/a              Tim
       2             n/a              Larry
       2             n/a              Paul 
       3             n/a              Rachel 
       4             Sample1, LLC     Dan

And I'd like to create a new dataframe, taking only the rows that have 'n/a' as a value under CorporationName for all matching BuildingID values. Normally, this would be easy, but in this case, we have duplicate BuildingID values, even though the entire row is not a duplicate. So ideally, our output would look like this:

  no corp =     Building ID   CorporationName  IndividualName
                2              n/a              Tim
                2              n/a              Larry
                2              n/a              Paul 
                3              n/a              Rachel

My first inclination was to do something like:

nocorp = ownercombo[ownercombo.CorporationName == 'n/a']

But this obviously returns every row where CorporationName is 'n/a', including rows for a BuildingID where only some of the entries are 'n/a' rather than all of them.

To be honest, I really don't know how to do this. I searched everywhere, and the closest I could find was this post, which suggests using groupby. But, I realized if I do it this way it will just return four booleans:

In:    morethanone = ownercombo.groupby((ownercombo['BuildingID'].value_counts() > 1))
Out:                CorporationName

        BuildingID  
             False    True
             True     True

I'm clearly not anywhere near the right track, so any help pointing me in the right direction would be extremely appreciated!

What do you mean by "for all matching BuildingID values"?

Commented Mar 17, 2016 at 19:00 — MaxU - stand with Ukraine

unutbu · Accepted Answer · 2016-03-17 19:02:36Z

1

You could use groupby/filter:

In [118]: df.groupby('Building ID').filter(lambda x: (x['CorporationName']=='n/a').all())
Out[118]: 
   Building ID CorporationName IndividualName
3            2             n/a            Tim
4            2             n/a          Larry
5            2             n/a           Paul
6            3             n/a         Rachel
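
For anyone who wants to run this end to end, here is a minimal sketch. It assumes the column really holds the literal string 'n/a' (not NaN) and rebuilds the sample frame from the question by hand; the names `no_corp`, `is_na`, `all_na` and `no_corp_mask` are just illustrative. The second half shows a transform-based mask that should give the same rows without calling a lambda per group.

```python
import pandas as pd

# Sample data rebuilt from the question (assumes 'n/a' is a plain string).
df = pd.DataFrame({
    "Building ID": [1, 1, 1, 2, 2, 2, 3, 4],
    "CorporationName": ["Sample, LLC", "n/a", "n/a", "n/a",
                        "n/a", "n/a", "n/a", "Sample1, LLC"],
    "IndividualName": ["John", "Sam", "Nancy", "Tim",
                       "Larry", "Paul", "Rachel", "Dan"],
})

# Keep a building only if every one of its rows has CorporationName == 'n/a'.
no_corp = df.groupby("Building ID").filter(
    lambda g: (g["CorporationName"] == "n/a").all()
)

# Alternative: build a row-aligned boolean mask with transform.
is_na = df["CorporationName"].eq("n/a")
all_na = is_na.groupby(df["Building ID"]).transform("all")
no_corp_mask = df[all_na]

print(no_corp)
```

On a large frame the mask version may be faster, since the per-group check is not done in a Python lambda, but that is worth timing on your own data.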


jezrael · Accepted Answer · 2016-03-17 18:59:59Z

0

You can first find the unique values of the Building ID column for the rows where CorporationName does not (~) contain the string n/a. Then you can filter the DataFrame with a mask built by isin:

uni = ownercombo.loc[~ownercombo.CorporationName.str.contains('n/a'), 'Building ID'].unique()
print(uni)
[1 4]
print(ownercombo[~ownercombo['Building ID'].isin(uni)])
   Building ID CorporationName IndividualName
3            2             n/a            Tim
4            2             n/a          Larry
5            2             n/a           Paul
6            3             n/a         Rachel
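
A self-contained Python 3 sketch of the same idea, in case it helps. It rebuilds the sample frame by hand, and it swaps `str.contains('n/a')` for an exact comparison (`!= 'n/a'`), which is my assumption about what is wanted: `str.contains` would also match a name that merely includes 'n/a' somewhere. The names `has_corp` and `nocorp` are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (assumes 'n/a' is a plain string).
ownercombo = pd.DataFrame({
    "Building ID": [1, 1, 1, 2, 2, 2, 3, 4],
    "CorporationName": ["Sample, LLC", "n/a", "n/a", "n/a",
                        "n/a", "n/a", "n/a", "Sample1, LLC"],
    "IndividualName": ["John", "Sam", "Nancy", "Tim",
                       "Larry", "Paul", "Rachel", "Dan"],
})

# Buildings with at least one row whose CorporationName is not exactly 'n/a'.
has_corp = ownercombo.loc[
    ownercombo["CorporationName"] != "n/a", "Building ID"
].unique()

# Keep only the rows whose building is not in that set.
nocorp = ownercombo[~ownercombo["Building ID"].isin(has_corp)]

print(nocorp)
```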


1 Comment

Steven Over a year ago

This also returned the exact same output. Thanks for your help.
