Removing rows from pandas dataframe based on several columns

Question

From a pandas DataFrame, I want to remove every "roi" for which half or more of its rows have a value below 50 in any of the columns s, b1 or b2.

Here is an example DataFrame:

roi  s   b1  b2
4    40  60  70
4    60  40  80
4    80  70  60
5    60  40  60
5    60  60  60
5    60  60  60

Only the three rows corresponding to roi 5 should be left over (roi 4 has 2 out of 3 rows where at least one of the values of s, b1, b2 is below 50).
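As a quick check of that expectation, here is a minimal sketch that rebuilds the example table by hand and computes, per roi, the fraction of rows with at least one value below 50. The names `bad` and `bad_fraction` are illustrative only and do not come from the original post.

```python
import pandas as pd

# Example data copied from the table above
data = pd.DataFrame({
    "roi": [4, 4, 4, 5, 5, 5],
    "s":   [40, 60, 80, 60, 60, 60],
    "b1":  [60, 40, 70, 40, 60, 60],
    "b2":  [70, 80, 60, 60, 60, 60],
})

# True for each row where at least one of s, b1, b2 is below 50
bad = (data[["s", "b1", "b2"]] < 50).any(axis=1)

# Fraction of flagged rows per roi
bad_fraction = bad.groupby(data["roi"]).mean()
print(bad_fraction)
```

For roi 4 the fraction is 2/3 (half or more, so it is removed); for roi 5 it is 1/3 (so it is kept).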

I have already implemented this, but I wonder whether there is a shorter (i.e. faster and cleaner) way to do it:

for roi in data.roi.unique():
    subdata = data[data['roi'] == roi]
    subdatas = subdata[subdata['s'] >= 50]
    subdatab1 = subdatas[subdatas['b1'] >= 50]
    subdatab2 = subdatab1[subdatab1['b2'] >= 50]
    # fraction of this roi's rows that pass all three thresholds
    if len(subdatab2) / len(subdata) < 0.5:
        data = data[data['roi'] != roi]

I copied and pasted from a CSV and it was automatically converted to an image..?
– TestGuest, Commented Dec 11, 2019 at 6:42

Quang Hoang · Accepted Answer · 2019-12-11 07:00:03Z

2

You can do this with groupby and transform:

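s = (data.set_index('roi')    # move `roi` to the index so it is left out of the comparison
         .lt(50).any(axis=1)  # True for rows where any of s, b1, b2 is below 50
         .groupby('roi')      # group the row flags by roi
         .transform('mean')   # fraction of flagged rows per roi, repeated on every row
         .lt(0.5)             # keep rois where fewer than half of the rows are flagged
         .values
    )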

data[s]

Output:

   roi   s  b1  b2
3    5  60  40  60
4    5  60  60  60
5    5  60  60  60
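The same selection can also be written with `groupby(...).filter`, which keeps or drops whole groups based on a predicate. This is only a sketch of an alternative, not the method used in this answer; the helper name `mostly_ok` and the list `value_cols` are hypothetical, and the example frame is rebuilt by hand from the question.

```python
import pandas as pd

# Example data copied from the question
data = pd.DataFrame({
    "roi": [4, 4, 4, 5, 5, 5],
    "s":   [40, 60, 80, 60, 60, 60],
    "b1":  [60, 40, 70, 40, 60, 60],
    "b2":  [70, 80, 60, 60, 60, 60],
})

# Assumption: these are the only columns that should be thresholded
value_cols = ["s", "b1", "b2"]

def mostly_ok(group):
    """Return True if fewer than half of the group's rows have a value below 50."""
    bad = (group[value_cols] < 50).any(axis=1)
    return bad.mean() < 0.5

# filter() returns the rows of every group for which the predicate is True
result = data.groupby("roi").filter(mostly_ok)
print(result)
```

The predicate runs as a Python function once per group, so on frames with many rois this is likely slower than the transform chain; it mainly trades speed for readability.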

edited Dec 11, 2019 at 7:00

answered Dec 11, 2019 at 6:41


4 Comments

TestGuest Over a year ago

Thank you - where is the criterion to have s/b1/b2 be below 50 for removal?

Quang Hoang Over a year ago

@jezrael that includes roi into all(1) does it?

oppressionslayer Over a year ago

This changes the roi to 6 on one of the values

Quang Hoang Over a year ago

@TestGuest see updated answer, now show expected output.

Paul Lo · Accepted Answer · 2019-12-11 06:34:50Z

1

You can apply all the filter conditions at once to avoid creating intermediate data frames (which saves memory), for example:

for roi in data.roi.unique():
    subdata2 = data[(data['roi'] == roi) &
                    (data['s'] >= 50) &
                    (data['b1'] >= 50) &
                    (data['b2'] >= 50)]
    if len(subdata2) / len(data[data['roi'] == roi]) < 0.5:
        data = data[data['roi'] != roi]
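If the per-roi loop becomes a bottleneck, the combined condition can be evaluated once for the whole frame and then aggregated per roi. The following is a minimal sketch under that assumption; the names `ok`, `ok_fraction`, `keep_rois` and `filtered` are illustrative and not part of the answer above.

```python
import pandas as pd

# Example data copied from the question
data = pd.DataFrame({
    "roi": [4, 4, 4, 5, 5, 5],
    "s":   [40, 60, 80, 60, 60, 60],
    "b1":  [60, 40, 70, 40, 60, 60],
    "b2":  [70, 80, 60, 60, 60, 60],
})

# True for rows where s, b1 and b2 are all at least 50
ok = (data[["s", "b1", "b2"]] >= 50).all(axis=1)

# Fraction of passing rows per roi
ok_fraction = ok.groupby(data["roi"]).mean()

# Keep only rois where more than half of the rows pass
keep_rois = ok_fraction.index[ok_fraction > 0.5]
filtered = data[data["roi"].isin(keep_rois)]
print(filtered)
```

Note that the loop compares with `< 0.5`, so a roi with exactly half of its rows passing is kept. The sketch uses `> 0.5` instead, following the question's wording that a roi with half or more failing rows should be removed.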

answered Dec 11, 2019 at 6:34

