1

I have a dataset that I need to filter once a value has been exceeded but not after. Here is an example of the dataframe:

    Dip    MD
0   70      5000
1   80      6000
2   90      7000
3   80      8000

I want to filter out everything before Dip goes above 85 the first time so the resultant array should look like this:

    Dip     MD
0   90      7000
1   80      8000

2 Answers 2

1

Maybe using cummax

In [71]: df = pd.DataFrame({'Dip': [70, 80, 90, 80], 
    ...:     'MD': [5000, 6000, 7000, 8000]})         

In [72]: df[df.Dip.gt(85).cummax()]                   
Out[72]: 
   Dip    MD
2   90  7000
3   80  8000
Sign up to request clarification or add additional context in comments.

Comments

0

You can first find the positional index of the first value satisfying a condition:

idx = next(iter(np.where(df['Dip'] > 85)[0]), df.shape[0])

Then slice your dataframe by integer position from this value onwards:

res = df.iloc[idx:]

Choosing df.shape[0] as the default if your condition is never satisfied ensures the entire dataframe is returned in this scenario.

Performance note

For larger data sets, you may find integer indexing more efficient than Boolean indexing:

np.random.seed(0)

df = pd.DataFrame({'A': np.random.randint(0, 100, 10**6)})

%timeit df[df['A'].gt(90).cummax()]                                   # 36.1 ms
%timeit df.iloc[next(iter(np.where(df['A'] > 90)[0]), df.shape[0]):]  # 4.04 ms

If efficiency is a primary concern, see Efficiently return the index of the first value satisfying condition in array. The idea is you don't have to traverse your entire series if the condition is satisfied earlier.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.