I have a pandas DataFrame and I am trying to remove a row if its LE column is "AAA" and other rows share the same ID/Name. If there is an "AAA" row but no other rows with the same ID/Name, then I want to leave that row alone.
What I have
import pandas as pd
df = pd.DataFrame({'ID': [111, 222, 222, 333, 333, 444, 444, 444, 555, 555, 555, 555],
                   'Name': ['David','Carl','Carl','Jane','Jane','Mike','Mike','Mike','Jake','Jake','Jake','Jake'],
                   'LE': ['AAA','AAA','BBB','BBB','CCC','AAA','BBB','CCC','AAA','BBB','CCC','DDD']})
print(df)
     ID   Name   LE
0   111  David  AAA
1   222   Carl  AAA
2   222   Carl  BBB
3   333   Jane  BBB
4   333   Jane  CCC
5   444   Mike  AAA
6   444   Mike  BBB
7   444   Mike  CCC
8   555   Jake  AAA
9   555   Jake  BBB
10  555   Jake  CCC
11  555   Jake  DDD
What I want
    ID   Name   LE
0  111  David  AAA
1  222   Carl  BBB
2  333   Jane  BBB
3  333   Jane  CCC
4  444   Mike  BBB
5  444   Mike  CCC
6  555   Jake  BBB
7  555   Jake  CCC
8  555   Jake  DDD
In this case, the row with "David" is left alone as there are no other instances of "David."
The rows with "Jane" are left alone as there are no instances of "AAA" under the LE column for that ID/Name.
For the rest, every row with "AAA" under the LE column is deleted because other rows share the same ID and Name.
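The rule described above could be sketched as a boolean mask. This is only a minimal sketch of one possible approach, not necessarily the best one; the names in_dup_group, is_aaa, and result are illustrative and not part of my original code.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [111, 222, 222, 333, 333, 444, 444, 444, 555, 555, 555, 555],
    'Name': ['David', 'Carl', 'Carl', 'Jane', 'Jane', 'Mike', 'Mike', 'Mike',
             'Jake', 'Jake', 'Jake', 'Jake'],
    'LE': ['AAA', 'AAA', 'BBB', 'BBB', 'CCC', 'AAA', 'BBB', 'CCC',
           'AAA', 'BBB', 'CCC', 'DDD'],
})

# True for every row whose ID/Name pair appears more than once
# (keep=False marks all members of a duplicated group, not just the repeats)
in_dup_group = df.duplicated(subset=['ID', 'Name'], keep=False)

# True for rows whose LE value is exactly 'AAA'
is_aaa = df['LE'].eq('AAA')

# Drop a row only when both conditions hold, then renumber the index
result = df[~(is_aaa & in_dup_group)].reset_index(drop=True)
print(result)
```

Here "David" survives because his row is "AAA" but not part of a duplicated ID/Name group, and "Jane" survives because neither of her rows is "AAA".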
I tried using drop_duplicates(), but it doesn't work here: it keeps only one row per set of duplicates (the first or the last) or drops them all, whereas I want to delete only one specific row per set, the one where LE is "AAA".
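To illustrate the problem, here is roughly what that attempt looks like. The subset and keep arguments are my assumption of how the call would be made for this data; first and last are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [111, 222, 222, 333, 333, 444, 444, 444, 555, 555, 555, 555],
    'Name': ['David', 'Carl', 'Carl', 'Jane', 'Jane', 'Mike', 'Mike', 'Mike',
             'Jake', 'Jake', 'Jake', 'Jake'],
    'LE': ['AAA', 'AAA', 'BBB', 'BBB', 'CCC', 'AAA', 'BBB', 'CCC',
           'AAA', 'BBB', 'CCC', 'DDD'],
})

# keep='first' retains only the first row of each ID/Name group,
# which here is the 'AAA' row for Carl, Mike, and Jake
first = df.drop_duplicates(subset=['ID', 'Name'], keep='first')

# keep='last' retains only the last row of each group,
# so the other non-'AAA' rows (e.g. Jane BBB, Mike BBB) are lost too
last = df.drop_duplicates(subset=['ID', 'Name'], keep='last')

print(first)
print(last)
```

Either way only one row per ID/Name is left (5 rows in total), instead of the 9 rows I want.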
tl;dr Delete a row only if its LE column has the value 'AAA' and other rows share its ID/Name