
I have a dataframe where I take a subset of columns and then want to filter out the rows that conditionally match two criteria.

Here's what the dataframe looks like:

Name     Err1    Err2    Page 
Amazon   404     201     Shopping
Facebook 202             Social
Goku                     Shopping
Ross             203     Shopping
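For reference, a minimal sketch that reconstructs this frame (an assumption: the blanks are NaN and the error codes are stored as strings):

    import numpy as np
    import pandas as pd

    # Hypothetical reconstruction of the sample frame above; blank cells become NaN
    df = pd.DataFrame({
        "Name": ["Amazon", "Facebook", "Goku", "Ross"],
        "Err1": ["404", "202", np.nan, np.nan],
        "Err2": ["201", np.nan, np.nan, "203"],
        "Page": ["Shopping", "Social", "Shopping", "Shopping"],
    })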

I replace the nulls with, say, '-', group the data by Name, Err1, and Err2, and also get the unique count of Err1.

    df['Err1'].fillna("-", inplace=True)
    df['Err2'].fillna("-", inplace=True)
    df.groupby(["Name", "Err1", "Err2"]).agg({"Err1": "count"})

This gives me:

Name     Err1    Err2    Err1 
Amazon   404     201     1
Facebook 202      -      1
Goku      -       -      1
Ross      -      203     1

a) I would like to remove all rows that have both "Err1" and "Err2" == "-", i.e. display a row only if either Err1 or Err2 is not '-'.
b) In the above, how can I get the unique count of both Err1 and Err2 combined, instead of the unique count of just Err1?

I don't want to use for loops and iterate through the data, as the dataset is over 100k lines. Is there an efficient way to achieve this?

  • Why do you fill the nulls with "-"? It would be easier if you left them as is. Commented Aug 4, 2019 at 21:09
  • The groupby seems to be ignoring rows with null values. Correct me if I am doing something wrong. Thanks. Commented Aug 4, 2019 at 21:18
  • > I would like to remove all rows that have both "Err1" and "Err2" == "-", i.e. display a row only if either Err1 or Err2 is not '-'. Don't replace the NaN values with - and just use df.dropna(how='all', subset=['Err1', 'Err2']). Commented Aug 4, 2019 at 21:31
  • df = df.loc[~((df['Err1'] == "-") & (df['Err2'] == "-")), :] if you just have to turn them into dashes. Commented Aug 4, 2019 at 21:33
  • Sweet! Thanks, guys! That's good learning for me. Commented Aug 4, 2019 at 22:29

1 Answer


Here is one way. First you need to drop the rows where both Err columns are null:

    df = df[~df[['Err1', 'Err2']].isnull().all(axis=1)].copy()
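Equivalently (a sketch using the same column names, and what one of the comments above suggests), dropna can express the same filter directly:

    # how='all' drops a row only when every column in the subset is null
    df = df.dropna(how='all', subset=['Err1', 'Err2'])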

About the unique count: when you group by Err1 and Err2 together, the count is already taken over both of them:

    df.fillna('NaN').groupby(["Name", "Err1", "Err2"]).size()
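Putting the two steps together on the sample frame from the question (a sketch; the '-' placeholder comes from the question, and the exact formatting of the printed result may differ):

    out = (
        df.dropna(how='all', subset=['Err1', 'Err2'])  # keep rows with at least one error code
          .fillna('-')                                 # optional: show missing values as '-'
          .groupby(['Name', 'Err1', 'Err2'])
          .size()                                      # count per (Name, Err1, Err2) combination
          .reset_index(name='Count')
    )
    print(out)
    #        Name Err1 Err2  Count
    # 0    Amazon  404  201      1
    # 1  Facebook  202    -      1
    # 2      Ross    -  203      1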