This question has been asked but I didn't find the answers complete. I have a dataframe that has unnecessary values in the first row and I want to find the row index of the animals:

import re
import pandas as pd

df = pd.DataFrame({'a':['apple','rhino','gray','horn'],
                   'b':['honey','elephant', 'gray','trunk'],
                   'c':['cheese','lion', 'beige','mane']})

       a         b       c
0  apple     honey  cheese
1  rhino  elephant    lion
2   gray      gray   beige
3   horn     trunk    mane

ani_pat = r"rhino|zebra|lion"

That means I want to find "1", the index of the row that matches the pattern. One solution I saw here, applied to my problem, looks like this:

def findIdx(df, pattern):
    return df.apply(lambda x: x.str.match(pattern, flags=re.IGNORECASE)).values.nonzero()

animal = findIdx(df, ani_pat)
print(animal)
(array([1, 1], dtype=int64), array([0, 2], dtype=int64))

That output is a tuple of NumPy arrays. I've got the basics of NumPy and Pandas, but I'm not sure what to do with this or how it relates to the df above.

I altered that lambda expression like this:

df.apply(lambda x: x.str.match(ani_pat, flags=re.IGNORECASE))

       a      b      c
0  False  False  False
1   True  False   True
2  False  False  False
3  False  False  False

That makes a little more sense, but I'm still trying to get the row index of the True values. How can I do that?

1 Answer

We can use the boolean result as a filter on the DataFrame index, keeping only the rows that contain at least one True value:

idx = df.index[
    df.apply(lambda x: x.str.match(ani_pat, flags=re.IGNORECASE)).any(axis=1)
]

idx:

Int64Index([1], dtype='int64')

any with axis=1 reduces the boolean DataFrame to a one-dimensional boolean Series with one value per row: True if any column in that row is True, False otherwise.

Before any:

       a      b      c
0  False  False  False
1   True  False   True
2  False  False  False
3  False  False  False

After any:

0    False
1     True
2    False
3    False
dtype: bool

We can then use these boolean values as a mask for the index (selecting the index labels where the value is True):

Int64Index([1], dtype='int64')
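
Note that masking df.index returns index labels. They coincide with row positions here only because the DataFrame uses the default integer index. A minimal sketch of the difference, assuming a hypothetical string index (the names df2, labels and positions are illustrative and not part of the question):

```python
import re

import numpy as np
import pandas as pd

ani_pat = r"rhino|zebra|lion"

# Same data as in the question, but with a hypothetical string index
# so that index labels and row positions differ.
df2 = pd.DataFrame(
    {
        "a": ["apple", "rhino", "gray", "horn"],
        "b": ["honey", "elephant", "gray", "trunk"],
        "c": ["cheese", "lion", "beige", "mane"],
    },
    index=["w", "x", "y", "z"],
)

# Row-wise mask: True where any column in the row matches the pattern.
mask = df2.apply(
    lambda col: col.str.match(ani_pat, flags=re.IGNORECASE)
).any(axis=1)

# Index labels of the matching rows.
labels = df2.index[mask].tolist()

# Integer positions of the matching rows, independent of the labels.
positions = np.flatnonzero(mask.to_numpy()).tolist()

print(labels)
print(positions)
```

With this index, labels holds the label "x" while positions holds the integer 1, so which one to use depends on whether the result is later passed to loc or iloc.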

If needed we can use tolist to get a list instead:

idx = df.index[
    df.apply(lambda x: x.str.match(ani_pat, flags=re.IGNORECASE)).any(axis=1)
].tolist()

idx:

[1]
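
As for the tuple returned by nonzero in the question: it holds one array per axis, the first with row positions and the second with column positions of each True cell. A short sketch of how those arrays could be mapped back to the DataFrame, assuming the df and ani_pat from the question (hits and rows are illustrative names):

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['apple', 'rhino', 'gray', 'horn'],
                   'b': ['honey', 'elephant', 'gray', 'trunk'],
                   'c': ['cheese', 'lion', 'beige', 'mane']})

ani_pat = r"rhino|zebra|lion"

# Boolean DataFrame: True where a cell matches the pattern.
matches = df.apply(lambda col: col.str.match(ani_pat, flags=re.IGNORECASE))

# nonzero() returns one array per axis: row positions, then column positions.
row_pos, col_pos = matches.to_numpy().nonzero()

# Pair each True cell's row label with its column name.
hits = [(df.index[r], df.columns[c]) for r, c in zip(row_pos, col_pos)]

# Unique row labels that contain at least one match.
rows = df.index[np.unique(row_pos)].tolist()

print(hits)
print(rows)
```

Here hits is [(1, 'a'), (1, 'c')], which reads as "row 1 matched in columns a and c", and rows is [1], the same result as the any(axis=1) approach above.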