I have a simple dataset, and I need to extract a subset of it by applying several conditions in order:

import pandas as pd

df = pd.DataFrame({'animal': ['cat', 'cat', 'cat', 'dog', 'bird', 'bird'], 'place': ['A', 'B', 'C', 'A', 'B', 'C']})

  1. A cat or dog has to be located in at least two places. If it is not, delete the rows for any cat or dog that appears only once.

The output:

[image: output after the first condition]

  2. A cat or dog has to be in place A. If it is not, delete its rows. For example, if cat appears only in B or C, delete all cat rows; but if cat appears in A along with B or C (that is, A,B; A,C; or A,B,C), keep all cat rows.

The final output:

[image: final output after both conditions]

I am wondering if there is an efficient way to do this. Thank you so much.

  • Is the list of animals sorted as in the example, or not? This plays a big role in how a solution can be mocked up. Commented Mar 4, 2021 at 3:07
  • It is not sorted; cat, dog, and bird can be in any order. Thank you. Commented Mar 4, 2021 at 3:10

1 Answer

Logic for the first condition:

# True for cat/dog when the animal appears in at least two rows (one row per place)
logic1 = df['animal'].value_counts().loc[['cat', 'dog']] >= 2

Apply it to the df (animals not covered by logic1, such as bird, map to NaN and are kept by fillna(True)):

df = df[df['animal'].map(logic1).fillna(True)]
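
As a possible alternative for the first condition, the count can be computed per row with groupby and transform, which also avoids the KeyError that `.loc[['cat', 'dog']]` raises when one of those animals is missing from the data. This is only a sketch, assuming "at least two places" means at least two distinct places; the names `targets`, `n_places`, and `step1` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'animal': ['cat', 'cat', 'cat', 'dog', 'bird', 'bird'],
                   'place': ['A', 'B', 'C', 'A', 'B', 'C']})

targets = ['cat', 'dog']  # animals the rule applies to (assumed name)

# Number of distinct places per animal, aligned to every row of df
n_places = df.groupby('animal')['place'].transform('nunique')

# Keep every non-target row, plus target animals seen in at least two places
step1 = df[~df['animal'].isin(targets) | (n_places >= 2)]
```

Because the mask is built per row, the order of the animals in the DataFrame does not matter.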

This is one approach for the second condition:

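# True when the animal never appears in place A
logic2cat = df[df['animal'].eq('cat') & df['place'].eq('A')].empty
logic2dog = df[df['animal'].eq('dog') & df['place'].eq('A')].empty

# Check each animal independently; with elif, dog would be skipped whenever cat is dropped
if logic2cat:
    df = df[df['animal'].ne('cat')]
if logic2dog:
    df = df[df['animal'].ne('dog')]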
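
The per-animal checks above can also be folded into a single boolean mask, which may be easier to extend to more animals. The following is a rough sketch rather than a definitive solution: `filter_animals` and its parameters are hypothetical names, not part of pandas, and it assumes both conditions apply only to the listed target animals.

```python
import pandas as pd

def filter_animals(df, targets=('cat', 'dog'), required_place='A', min_places=2):
    """Keep non-target animals; keep a target animal only if it appears in
    at least `min_places` distinct places and one of them is `required_place`."""
    is_target = df['animal'].isin(targets)
    # Condition 1: distinct places per animal, broadcast back to each row
    enough_places = df.groupby('animal')['place'].transform('nunique') >= min_places
    # Condition 2: does any row of this animal sit in the required place?
    has_required = (df['place'].eq(required_place)
                    .groupby(df['animal']).transform('any'))
    return df[~is_target | (enough_places & has_required)]

df = pd.DataFrame({'animal': ['cat', 'cat', 'cat', 'dog', 'bird', 'bird'],
                   'place': ['A', 'B', 'C', 'A', 'B', 'C']})
result = filter_animals(df)

# A second, made-up input: cat is never in A, dog is in A and B
df2 = pd.DataFrame({'animal': ['cat', 'cat', 'dog', 'dog', 'bird'],
                    'place': ['B', 'C', 'A', 'B', 'A']})
result2 = filter_animals(df2)
```

With the sample data from the question, dog is dropped for appearing in only one place; in the second input, cat is dropped for never being in A.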