I have a dataframe where I take a subset of columns and then want to filter out the rows that match two criteria.
Here's what the dataframe looks like:
Name      Err1  Err2  Page
Amazon    404   201   Shopping
Facebook  202   NaN   Social
Goku      NaN   NaN   Shopping
Ross      NaN   203   Shopping
I replace the nulls with, say, '-', group the data by Name, Err1, and Err2, and also get the count of Err1.
df['Err1'] = df['Err1'].fillna("-")
df['Err2'] = df['Err2'].fillna("-")
out = df.groupby(["Name", "Err1", "Err2"]).agg({"Err1": "count"})
This gives me:
Name      Err1  Err2   Err1
Amazon    404   201       1
Facebook  202   -         1
Goku      -     -         1
Ross      -     203       1
a) I would like to remove all rows where both Err1 and Err2 are "-", i.e. keep a row only if at least one of Err1 or Err2 is not '-'.
b) In the above, how can I get the unique count of Err1 and Err2 combined, instead of the unique count of just Err1?
I don't want to use for loops and iterate through the data, as the dataset is over 100k rows. Is there an efficient way to achieve this?
It would be easier if you left the NaN values as they are, instead of replacing them with '-', and just used

df.dropna(how='all', subset=['Err1', 'Err2'])

If you do have to turn them into dashes first, filter with

df = df.loc[~((df['Err1'] == "-") & (df['Err2'] == "-")), :]
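For part (b), here is a minimal end-to-end sketch assuming the sample data above; combined_unique and per_name are illustrative names, and pd.concat / melt plus nunique are one vectorized way to count distinct values across both columns:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "Name": ["Amazon", "Facebook", "Goku", "Ross"],
    "Err1": [404, 202, np.nan, np.nan],
    "Err2": [201, np.nan, np.nan, 203],
    "Page": ["Shopping", "Social", "Shopping", "Shopping"],
})

# (a) keep a row only if at least one error column is populated
df = df.dropna(how="all", subset=["Err1", "Err2"])

# (b) distinct count over Err1 and Err2 together, across the whole frame
combined_unique = pd.concat([df["Err1"], df["Err2"]]).nunique(dropna=True)

# or per Name: stack both columns into one, then count distinct per group
per_name = (df.melt(id_vars="Name", value_vars=["Err1", "Err2"])
              .dropna(subset=["value"])
              .groupby("Name")["value"]
              .nunique())

Both operations are vectorized, so they scale to 100k+ rows without any Python-level loops.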