I have a dataframe where I take a subset of columns and then want to filter out the rows that match two criteria.
Here's what the dataframe looks like:
Name      Err1  Err2  Page
Amazon    404   201   Shopping
Facebook  202   NaN   Social
Goku      NaN   NaN   Shopping
Ross      NaN   203   Shopping
I replace the nulls with, say, '-', group the data by Name, Err1, and Err2, and also get the count of Err1.
df['Err1'] = df['Err1'].fillna("-")
df['Err2'] = df['Err2'].fillna("-")
out = df.groupby(["Name", "Err1", "Err2"]).agg({"Err1": "count"})
This gives me:
Name      Err1  Err2   Err1
Amazon    404   201       1
Facebook  202   -         1
Goku      -     -         1
Ross      -     203       1
a) I would like to remove all rows where both Err1 and Err2 are "-", i.e. keep a row only if at least one of Err1 or Err2 is not '-'.
b) In the above, how can I get the unique count of Err1 and Err2 combined, instead of the unique count of just Err1?
I don't want to use for loops and iterate through the data, as the dataset is over 100k rows. Is there an efficient way to achieve this?
It would be easier if you left the NaN values as they are, instead of replacing them with '-', and just used

df.dropna(how='all', subset=['Err1', 'Err2'])

If you do have to turn them into dashes first, filter with

df = df.loc[~((df['Err1'] == "-") & (df['Err2'] == "-")), :]
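For part (b), here is a minimal end-to-end sketch assuming the sample data above; combined_unique and per_name are illustrative names, and pd.concat / melt plus nunique are one vectorized way to count distinct values across both columns:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "Name": ["Amazon", "Facebook", "Goku", "Ross"],
    "Err1": [404, 202, np.nan, np.nan],
    "Err2": [201, np.nan, np.nan, 203],
    "Page": ["Shopping", "Social", "Shopping", "Shopping"],
})

# (a) keep a row only if at least one error column is populated
df = df.dropna(how="all", subset=["Err1", "Err2"])

# (b) distinct count over Err1 and Err2 together, across the whole frame
combined_unique = pd.concat([df["Err1"], df["Err2"]]).nunique(dropna=True)

# or per Name: stack both columns into one, then count distinct per group
per_name = (df.melt(id_vars="Name", value_vars=["Err1", "Err2"])
              .dropna(subset=["value"])
              .groupby("Name")["value"]
              .nunique())

Both operations are vectorized, so they scale to 100k+ rows without any Python-level loops.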