
From a Pandas DataFrame like the following:

I want to apply a filter that keeps only the rows where either all elements of the `Muon_pt` array lie in the range 10 < Muon_pt < 20, or at least one element of the `Electron_pt` array lies in the range 50 < Electron_pt < 100.

I do so by defining two functions:

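```python
def anyCut(x, minn, maxx):
    # True if at least one element lies strictly between minn and maxx
    for i in x:
        if i > minn and i < maxx:
            return True
    return False

def allCut(x, minn, maxx):
    # True only if no element falls below minn or above maxx
    for i in x:
        if i < minn or i > maxx:
            return False
    return True
```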
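For comparison, the same two checks can be written more compactly with Python's built-in `any()` and `all()`. This is a minimal sketch; the snake_case names `any_cut` and `all_cut` are hypothetical, and the boundary handling mirrors the functions above (strict bounds for the "any" check, inclusive bounds for the "all" check). It still iterates in Python, so it tidies the code rather than vectorizing it.

```python
def any_cut(x, minn, maxx):
    # Hypothetical equivalent of anyCut: some element strictly inside (minn, maxx)
    return any(minn < i < maxx for i in x)

def all_cut(x, minn, maxx):
    # Hypothetical equivalent of allCut: every element inside [minn, maxx]
    return all(minn <= i <= maxx for i in x)

print(any_cut([5.0, 60.0], 50.0, 100.0))  # True: 60.0 is inside (50, 100)
print(all_cut([12.0, 25.0], 10, 20))      # False: 25.0 is above 20
```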

And then I apply them:

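```python
minElectronPt = 50.0
maxElectronPt = 100.0

minMuonPt = 10
maxMuonPt = 20

df[
    # more than one electron and more than one muon
    ((df["nElectron"] > 1) & (df["nMuon"] > 1))
    &
    (
        # every electron has charge -1 (x is a NumPy array, so x == -1 is element-wise)
        (df["Electron_charge"].apply(lambda x: all(x == -1)))
        &
        (
            # some electron pT inside (minElectronPt, maxElectronPt) ...
            df["Electron_pt"].apply(lambda x: anyCut(x, minElectronPt, maxElectronPt))
            |
            # ... or every muon pT inside [minMuonPt, maxMuonPt]
            df["Muon_pt"].apply(lambda x: allCut(x, minMuonPt, maxMuonPt))
        )
    )
].head()
```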
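To make the chained condition easier to read and check, the same filter can be split into named boolean masks that are combined at the end. This is a sketch on a small hypothetical DataFrame (the values are invented for illustration, with each array cell stored as a NumPy array), and the mask names `multiplicity`, `all_negative`, `electron_ok` and `muon_ok` are likewise hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: nElectron / nMuon match the array lengths in each row
df = pd.DataFrame({
    "nElectron": [2, 2, 3],
    "nMuon": [2, 3, 2],
    "Electron_charge": [np.array([-1, -1]), np.array([-1, 1]), np.array([-1, -1, -1])],
    "Electron_pt": [np.array([60.0, 30.0]), np.array([70.0, 80.0]), np.array([20.0, 30.0, 40.0])],
    "Muon_pt": [np.array([5.0, 8.0]), np.array([12.0, 15.0, 18.0]), np.array([11.0, 19.0])],
})

def anyCut(x, minn, maxx):
    # At least one element strictly inside (minn, maxx)
    return any(minn < i < maxx for i in x)

def allCut(x, minn, maxx):
    # Every element inside [minn, maxx]
    return all(minn <= i <= maxx for i in x)

minElectronPt, maxElectronPt = 50.0, 100.0
minMuonPt, maxMuonPt = 10, 20

# Each condition is one boolean Series aligned on df.index
multiplicity = (df["nElectron"] > 1) & (df["nMuon"] > 1)
all_negative = df["Electron_charge"].apply(lambda x: bool(np.all(x == -1)))
electron_ok = df["Electron_pt"].apply(lambda x: anyCut(x, minElectronPt, maxElectronPt))
muon_ok = df["Muon_pt"].apply(lambda x: allCut(x, minMuonPt, maxMuonPt))

selected = df[multiplicity & all_negative & (electron_ok | muon_ok)]
print(selected.index.tolist())  # [0, 2]
```

Row 1 is dropped because one of its electrons has charge +1; row 2 survives through the muon condition even though no electron passes the pT window.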

Getting:

Is there any way to apply this filter without looping through the nested arrays (i.e., to replace the anyCut and allCut functions)?

1 Answer

You can use NumPy arrays to replace the explicit for loops with element-wise comparisons (each function is still called once per row through `.apply`):

```python
import numpy as np

def anyCut(x, minn, maxx):
    x_np = np.asarray(x)
    # True if at least one element lies strictly between minn and maxx
    return bool(((x_np > minn) & (x_np < maxx)).any())

def allCut(x, minn, maxx):
    x_np = np.asarray(x)
    # True if every element lies in [minn, maxx]
    return bool(((x_np >= minn) & (x_np <= maxx)).all())
```
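The NumPy version above still makes one Python call per row. To avoid per-row Python calls altogether, one option (a different technique from the answer, sketched under the assumptions that the DataFrame index is unique and every cell is a list or 1-D array) is to flatten the nested arrays with `Series.explode`, compare all elements in one vectorized step, and reduce back to one value per row with `groupby(level=0)`. The helper names `any_in_range` and `all_in_range` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def any_in_range(s, minn, maxx):
    """Hypothetical row-wise equivalent of anyCut: some element strictly inside (minn, maxx)."""
    flat = s.explode().astype(float)     # one row per element; the row's index label is repeated
    hit = (flat > minn) & (flat < maxx)  # NaN (from empty cells) compares as False
    return hit.groupby(level=0).any().reindex(s.index)

def all_in_range(s, minn, maxx):
    """Hypothetical row-wise equivalent of allCut: every element inside [minn, maxx]."""
    flat = s.explode().astype(float)
    # NaN (empty cells) counts as passing, matching allCut returning True for an empty array
    ok = ((flat >= minn) & (flat <= maxx)) | flat.isna()
    return ok.groupby(level=0).all().reindex(s.index)

# Hypothetical toy data with arrays of varying length, including an empty one
df = pd.DataFrame({
    "Electron_pt": [np.array([60.0, 30.0]), np.array([20.0]), np.array([])],
    "Muon_pt": [np.array([5.0, 8.0]), np.array([11.0, 19.0, 15.0]), np.array([25.0])],
})

mask = any_in_range(df["Electron_pt"], 50.0, 100.0) | all_in_range(df["Muon_pt"], 10, 20)
print(mask.tolist())  # [True, True, False]
```

The resulting boolean Series can be combined with the other conditions (`nElectron`, `nMuon`, `Electron_charge`) using `&` and `|` exactly as in the question. The trade-off is memory: `explode` materializes one row per array element.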