I'm quite new to Python and pandas, and I'm trying to filter a DataFrame in Python 3 in an IPython Notebook.

I have a df:

df=    Building ID   CorporationName  IndividualName
       1             Sample, LLC      John 
       1             n/a              Sam 
       1             n/a              Nancy 
       2             n/a              Tim
       2             n/a              Larry
       2             n/a              Paul 
       3             n/a              Rachel 
       4             Sample1, LLC     Dan 

I'd like to create a new dataframe that keeps only the buildings where every row sharing that Building ID has 'n/a' under CorporationName. Normally this would be easy, but here the Building ID values repeat even though no entire row is a duplicate. Ideally, the output would look like this:

  no corp =     Building ID   CorporationName  IndividualName
                2              n/a              Tim
                2              n/a              Larry
                2              n/a              Paul 
                3              n/a              Rachel 

My first inclination was to do something like:

nocorp = ownercombo[ownercombo.CorporationName == 'n/a']

But this returns every row where CorporationName is 'n/a', including rows for a building (such as Building ID 1) where only some of the entries are 'n/a', not all of them.

To be honest, I really don't know how to do this. I searched everywhere, and the closest I could find was this post, which suggests using groupby. But, I realized if I do it this way it will just return four booleans:

In:    morethanone = ownercombo.groupby((ownercombo['BuildingID'].value_counts() > 1))
Out:                CorporationName

        BuildingID  
             False    True
             True     True

I'm clearly not anywhere near the right track, so any help pointing me in the right direction would be extremely appreciated!

  • What do you mean by "for all matching BuildingID values"? Commented Mar 17, 2016 at 19:00

2 Answers


You could use groupby/filter:

In [118]: df.groupby('Building ID').filter(lambda x: (x['CorporationName']=='n/a').all())
Out[118]: 
   Building ID CorporationName IndividualName
3            2             n/a            Tim
4            2             n/a          Larry
5            2             n/a           Paul
6            3             n/a         Rachel
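
For comparison, the same selection can also be written as a boolean mask with groupby/transform. This is a minimal sketch rather than part of the answer above: it rebuilds the sample frame from the question and assumes the missing corporation names are stored as the literal string 'n/a' (not as NaN). The names is_na, all_na and no_corp are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Building ID": [1, 1, 1, 2, 2, 2, 3, 4],
    "CorporationName": ["Sample, LLC", "n/a", "n/a", "n/a",
                        "n/a", "n/a", "n/a", "Sample1, LLC"],
    "IndividualName": ["John", "Sam", "Nancy", "Tim",
                       "Larry", "Paul", "Rachel", "Dan"],
})

# Row-level flag: True where the name is the literal string 'n/a' (assumption).
is_na = df["CorporationName"].eq("n/a")

# For each building, check whether all of its rows are flagged,
# and broadcast that result back onto every row of the building.
all_na = is_na.groupby(df["Building ID"]).transform("all")

# Keep only rows that belong to buildings with no corporation at all.
no_corp = df[all_na]
print(no_corp)
```

Because the mask has the same index as df, it can be combined with other row-level conditions using & or | before indexing.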

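You can first find the unique Building ID values of the rows whose CorporationName does not (~) contain the string n/a. These are the buildings that have at least one corporation. Then you can filter the DataFrame with an isin mask, keeping only the buildings that are not in that list: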

uni = ownercombo.loc[~ownercombo.CorporationName.str.contains('n/a'), 'Building ID'].unique()
print(uni)
[1 4]
print(ownercombo[~ownercombo['Building ID'].isin(uni)])
   Building ID CorporationName IndividualName
3            2             n/a            Tim
4            2             n/a          Larry
5            2             n/a           Paul
6            3             n/a         Rachel
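
The same steps as a self-contained sketch, with the sample frame rebuilt from the question. It assumes the placeholder is the literal string 'n/a'; the regex=False argument and the names has_corp and no_corp are additions for illustration, so that 'n/a' is matched as plain text and the intermediate mask is readable.

```python
import pandas as pd

# Sample data reconstructed from the question.
ownercombo = pd.DataFrame({
    "Building ID": [1, 1, 1, 2, 2, 2, 3, 4],
    "CorporationName": ["Sample, LLC", "n/a", "n/a", "n/a",
                        "n/a", "n/a", "n/a", "Sample1, LLC"],
    "IndividualName": ["John", "Sam", "Nancy", "Tim",
                       "Larry", "Paul", "Rachel", "Dan"],
})

# Rows that name a real corporation (the name does not contain 'n/a').
has_corp = ~ownercombo["CorporationName"].str.contains("n/a", regex=False)

# Building IDs that have at least one such row.
uni = ownercombo.loc[has_corp, "Building ID"].unique()

# Drop every row belonging to one of those buildings.
no_corp = ownercombo[~ownercombo["Building ID"].isin(uni)]
print(uni)
print(no_corp)
```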

1 Comment

This also returned the exact same output. Thanks for your help.
