
From a pandas dataframe, I want to remove every "roi" for which half or more of its rows have a value below 50 in at least one of the columns s, b1 or b2.

Here is an example dataframe:

roi  s   b1  b2
4    40  60  70
4    60  40  80
4    80  70  60
5    60  40  60
5    60  60  60
5    60  60  60

Only the three rows corresponding to roi 5 should be left over (roi 4 has 2 out of 3 rows where at least one of the values of s, b1, b2 is below 50).
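To make the removal rule concrete, here is a minimal sketch that builds the example dataframe by hand and computes, per roi, the share of rows with at least one of s, b1, b2 below 50. The variable names bad and bad_fraction are illustrative only.

```python
import pandas as pd

# Example data from the question, typed in by hand
data = pd.DataFrame({
    "roi": [4, 4, 4, 5, 5, 5],
    "s":   [40, 60, 80, 60, 60, 60],
    "b1":  [60, 40, 70, 40, 60, 60],
    "b2":  [70, 80, 60, 60, 60, 60],
})

# A row is "bad" if any of s, b1, b2 is below 50
bad = data[["s", "b1", "b2"]].lt(50).any(axis=1)

# Share of bad rows per roi: roi 4 -> 2/3, roi 5 -> 1/3
bad_fraction = bad.groupby(data["roi"]).mean()
print(bad_fraction)
```

roi 4 reaches the "half or more" threshold and is removed; roi 5 stays.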

I have already implemented this, but I wonder whether there is a shorter (i.e. faster and cleaner) way to do it:

for roi in data.roi.unique():
    subdata = data[data['roi'] == roi]
    # Successively keep only the rows where s, b1 and b2 are >= 50
    subdatas = subdata[subdata['s'] >= 50]
    subdatab1 = subdatas[subdatas['b1'] >= 50]
    subdatab2 = subdatab1[subdatab1['b2'] >= 50]
    # Drop the roi if fewer than half of its rows pass all three checks
    if len(subdatab2) / len(subdata) < 0.5:
        data = data[data['roi'] != roi]
  • Please provide your data as text. Commented Dec 11, 2019 at 6:37
  • I copied and pasted from a CSV and it was automatically converted to an image? Commented Dec 11, 2019 at 6:42
  • Print data to the console and copy/paste it here. Commented Dec 11, 2019 at 6:42
  • So only rows 0, 1 and 3? Commented Dec 11, 2019 at 6:49
  • No, only the last 3 rows. Commented Dec 11, 2019 at 6:55

2 Answers


You can use groupby with transform:

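s = (data.set_index('roi')          # move `roi` into the index so it is not compared
         .lt(50).any(axis=1)        # True for rows where any of s, b1, b2 is below 50
         .groupby(level='roi')      # group those flags by roi
         .transform('mean')         # share of "bad" rows in each row's roi
         .lt(0.5)                   # keep rois where fewer than half the rows are bad
         .values                    # plain array, so it lines up positionally with data
    )

data[s]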

Output:

   roi   s  b1  b2
3    5  60  40  60
4    5  60  60  60
5    5  60  60  60
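If readability matters more than speed, the same rule can also be written with GroupBy.filter, which keeps whole groups for which a function returns True. This is a swapped-in alternative, not part of the answer above; the helper name mostly_valid is hypothetical. Because filter calls a Python function once per roi, it is likely slower than transform when there are many rois.

```python
import pandas as pd

# Example data from the question
data = pd.DataFrame({
    "roi": [4, 4, 4, 5, 5, 5],
    "s":   [40, 60, 80, 60, 60, 60],
    "b1":  [60, 40, 70, 40, 60, 60],
    "b2":  [70, 80, 60, 60, 60, 60],
})

cols = ["s", "b1", "b2"]  # columns checked against the threshold

def mostly_valid(group: pd.DataFrame) -> bool:
    # Share of rows in this roi where any checked column is below 50
    bad_share = group[cols].lt(50).any(axis=1).mean()
    # Keep the roi only if fewer than half of its rows are bad
    return bool(bad_share < 0.5)

# filter returns the rows of every roi for which mostly_valid is True,
# with the original index preserved
result = data.groupby("roi").filter(mostly_valid)
print(result)
```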

4 Comments

Thank you - where is the criterion to have s/b1/b2 be below 50 for removal?
@jezrael, that includes roi in all(1), doesn't it?
This changes the roi to 6 on one of the values
@TestGuest see updated answer, it now shows the expected output.

You can combine multiple filter conditions into a single mask to avoid creating intermediate data frames (which saves memory), for example:

for roi in data.roi.unique():
    in_roi = data['roi'] == roi
    # One combined mask instead of a chain of intermediate frames
    subdata2 = data[in_roi &
                    (data['s'] >= 50) &
                    (data['b1'] >= 50) &
                    (data['b2'] >= 50)]
    # Drop the roi if fewer than half of its rows pass all three checks
    if len(subdata2) / in_roi.sum() < 0.5:
        data = data[data['roi'] != roi]
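The combined-mask idea can also drop the Python loop entirely: build one "row is valid" mask, then let groupby with transform spread each roi's share of valid rows back onto its rows. This is a minimal sketch; the names ok and ok_share are hypothetical. Keeping rois with a valid share strictly above 0.5 means a roi with exactly half bad rows is removed, matching the question's "half or more" wording.

```python
import pandas as pd

# Example data from the question
data = pd.DataFrame({
    "roi": [4, 4, 4, 5, 5, 5],
    "s":   [40, 60, 80, 60, 60, 60],
    "b1":  [60, 40, 70, 40, 60, 60],
    "b2":  [70, 80, 60, 60, 60, 60],
})

# One combined mask: True where s, b1 and b2 are all >= 50
ok = (data["s"] >= 50) & (data["b1"] >= 50) & (data["b2"] >= 50)

# For every row, the share of valid rows within that row's roi
ok_share = ok.groupby(data["roi"]).transform("mean")

# Keep rois where more than half the rows are valid (fewer than half are bad)
result = data[ok_share > 0.5]
print(result)
```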

