I have a dataframe that looks like this:
```python
import pandas as pd

### create toy data set
data = [[1111, '10/1/2021', 21, 123],
        [1111, '10/1/2021', -21, 123],
        [1111, '10/1/2021', 21, 123],
        [2222, '10/2/2021', 15, 234],
        [2222, '10/2/2021', 15, 234],
        [3333, '10/3/2021', 15, 234],
        [3333, '10/3/2021', 15, 234]]
df = pd.DataFrame(data, columns=['Individual', 'date', 'number', 'cc'])
```
What I want to do is remove pairs of rows where Individual, date, and cc are the same and the number values offset each other: one is negative, the other positive, and both have the same absolute value. For example, among the first three rows I would remove rows 1 and 2 (21 and -21 are equal in absolute value), but I don't want to remove row 3, because the negative value in row 2 has already been offset by removing row 1. I also don't want to remove duplicated rows whose number values are all positive. I have tried a variety of `duplicated()` approaches, but just can't get it right.
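The pairing rule can be written out literally as a loop. This is a minimal sketch using a hypothetical helper, `drop_offsetting_rows_loop`: it buckets row labels by (Individual, date, cc, |number|) and, within each bucket, cancels the first k negatives against the first k positives, where k is the smaller of the two counts. It assumes the earliest positive rows are the ones to drop (which matches the expected output, where row 1 goes and row 3 stays) and that a negative with no matching positive is kept.

```python
from collections import defaultdict

import pandas as pd


def drop_offsetting_rows_loop(df, keys=('Individual', 'date', 'cc')):
    """Hypothetical helper: drop matching +/- pairs within each key group."""
    pos = defaultdict(list)  # (key..., |number|) -> labels of positive rows
    neg = defaultdict(list)  # (key..., |number|) -> labels of negative rows
    for row in df.itertuples():
        k = tuple(getattr(row, c) for c in keys) + (abs(row.number),)
        if row.number > 0:
            pos[k].append(row.Index)
        elif row.number < 0:
            neg[k].append(row.Index)
    to_drop = []
    for k, negs in neg.items():
        # pair the first n negatives with the first n positives, in row order;
        # unmatched negatives or positives are left in place
        n = min(len(negs), len(pos[k]))
        to_drop += negs[:n] + pos[k][:n]
    return df.drop(index=to_drop).reset_index(drop=True)


# toy data from the question
data = [[1111, '10/1/2021', 21, 123],
        [1111, '10/1/2021', -21, 123],
        [1111, '10/1/2021', 21, 123],
        [2222, '10/2/2021', 15, 234],
        [2222, '10/2/2021', 15, 234],
        [3333, '10/3/2021', 15, 234],
        [3333, '10/3/2021', 15, 234]]
df = pd.DataFrame(data, columns=['Individual', 'date', 'number', 'cc'])

result = drop_offsetting_rows_loop(df)
print(result)
```

This visits every row in Python, so it is mainly useful as a readable reference to check a vectorized version against on small inputs.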
Expected results would be:
```
   Individual       date  number   cc
0        1111  10/1/2021      21  123
1        2222  10/2/2021      15  234
2        2222  10/2/2021      15  234
3        3333  10/3/2021      15  234
4        3333  10/3/2021      15  234
```
Thus, the first two rows are removed, but not the third row, since the negative value is already accounted for.
Any assistance would be appreciated. I am trying to do this without a loop, though one may be unavoidable. It seems similar to this question, but I can't figure out how to adapt it to my case without looping.
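One way to express the rule without an explicit Python loop is sketched below, under two assumptions not stated in the question: the earliest positive rows in a group are the ones cancelled (consistent with the expected output), and unmatched negatives are kept. It uses `groupby().cumcount()` to number each row's occurrence among rows with the same (Individual, date, cc, |number|) and the same sign, and `groupby().transform('sum')` to count positives and negatives per group. A row is dropped when its occurrence number is below the number of complete +/- pairs. The helper columns `absval`, `sign`, `rank`, `is_pos`, and `is_neg` are illustrative names only.

```python
import numpy as np
import pandas as pd

# toy data from the question
data = [[1111, '10/1/2021', 21, 123],
        [1111, '10/1/2021', -21, 123],
        [1111, '10/1/2021', 21, 123],
        [2222, '10/2/2021', 15, 234],
        [2222, '10/2/2021', 15, 234],
        [3333, '10/3/2021', 15, 234],
        [3333, '10/3/2021', 15, 234]]
df = pd.DataFrame(data, columns=['Individual', 'date', 'number', 'cc'])

# helper columns (illustrative names): |number| and its sign (-1, 0 or 1)
tmp = df.assign(absval=df['number'].abs(), sign=np.sign(df['number']))
grp = ['Individual', 'date', 'cc', 'absval']

# occurrence number (0, 1, 2, ...) of each row among rows with the same
# key, absolute value and sign, in original row order
tmp['rank'] = tmp.groupby(grp + ['sign']).cumcount()

# count positive and negative rows per (key, |number|) group; the number
# of complete +/- pairs is the smaller of the two counts
tmp['is_pos'] = (tmp['sign'] > 0).astype(int)
tmp['is_neg'] = (tmp['sign'] < 0).astype(int)
g = tmp.groupby(grp)
n_pairs = np.minimum(g['is_pos'].transform('sum'), g['is_neg'].transform('sum'))

# the first n_pairs positives and the first n_pairs negatives cancel out;
# zeros never cancel because their group has no positives or negatives
cancel = tmp['rank'] < n_pairs
result = df.loc[~cancel].reset_index(drop=True)
print(result)
```

Because `cumcount` numbers positives and negatives separately, the first negative in a group is paired with the first positive, the second with the second, and so on, which is what leaves the third row of the example in place.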