I have a dataframe that looks like this:

import pandas as pd

### create toy data set
data = [[1111,'10/1/2021',21,123],
        [1111,'10/1/2021',-21,123],
        [1111,'10/1/2021',21,123],
        [2222,'10/2/2021',15,234],
        [2222,'10/2/2021',15,234],
        [3333,'10/3/2021',15,234],
        [3333,'10/3/2021',15,234]]

df = pd.DataFrame(data,columns = ['Individual','date','number','cc'])

What I want to do is remove rows where Individual, date, and cc are the same, but number is a negative value in one case and a positive in the other case. For example, in the first three rows, I would remove rows 1 and 2 (because 21 and -21 values are equal in absolute terms), but I don't want to remove row 3 (because I have already accounted for the negative value in row 2 by eliminating row 1). Also, I don't want to remove duplicated values if the corresponding number values are positive. I have tried a variety of duplicated() approaches, but just can't get it right.

Expected results would be:

  Individual       date  number   cc
0        1111  10/1/2021      21  123
1        2222  10/2/2021      15  234
2        2222  10/2/2021      15  234
3        3333  10/3/2021      15  234
4        3333  10/3/2021      15  234

Thus, the first two rows are removed, but not the third row, since the negative value is already accounted for.

Any assistance would be appreciated. I am trying to do this without a loop, but it may be unavoidable. It seems similar to this question, but I can't figure out how to make it work in my case, as I am trying to avoid loops.

  • Will the positive and negative values always be equal and zero out as in your example? And will other good rows ever be zero? Commented Oct 21, 2021 at 16:37
  • Yes, I have edited the question to say that the positive and negative values that count as duplicates (and are thus removed) would sum to zero. Commented Oct 21, 2021 at 16:39

1 Answer

I can't be sure since you did not post your expected output, but you could try the below. Create a separate df called n that contains the rows with -ve 'number' and join it to the original with indicator=True.

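# key columns of the groups that contain a negative 'number' (one row per group)
n = df.loc[df.number.lt(0)].drop('number', axis=1).drop_duplicates()
# left join on the shared columns; _merge marks rows whose group has a negative
df = pd.merge(df, n, how='left', indicator=True)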

>>> df

   Individual       date  number   cc     _merge
0        1111  10/1/2021      21  123       both
1        1111  10/1/2021     -21  123       both
2        1111  10/1/2021      21  123       both
3        2222  10/2/2021      15  234  left_only
4        2222  10/2/2021      15  234  left_only
5        3333  10/3/2021      15  234  left_only
6        3333  10/3/2021      15  234  left_only

This will allow us to identify the Individual/date/cc groups that have a -ve 'number' row.


Then you can locate the rows with 'both' in _merge, and only use those to perform a groupby.head(2), concatenating that with the rest of the df:

out = pd.concat([df.loc[df._merge.eq('both')].groupby(['Individual','date','cc']).head(2),
                 df.loc[df._merge.ne('both')]]).drop('_merge',axis=1)

Which prints:

   Individual       date  number   cc
0        1111  10/1/2021      21  123
1        1111  10/1/2021     -21  123
3        2222  10/2/2021      15  234
4        2222  10/2/2021      15  234
5        3333  10/3/2021      15  234
6        3333  10/3/2021      15  234

2 Comments

I added the expected output. What you have does not produce it, but I see what you are doing, and maybe it can be edited to get the expected results.
If you change head(2) to head(1), you get your desired output. However, is that always going to be the case, i.e. keeping a fixed number of leading rows from each group that has a negative number? You will probably need to use sort_values together with head to make sure it works. The data you provided does not cover all the different scenarios, so while head(1) gives your desired outcome here, it might not work on your real data set. Please consider providing more information and more concrete examples.
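For data where a group can hold more than one offsetting pair, one loop-free alternative to head() is to pair the k-th positive row with the k-th negative row of the same absolute value, using groupby().cumcount(). This is only a sketch, not the answer's method. The helper columns abs_number, is_neg and k are names made up for illustration, and it assumes each negative row should cancel exactly one positive row with the same Individual, date, cc and absolute number.

```python
import pandas as pd

# toy data from the question
data = [[1111, '10/1/2021', 21, 123],
        [1111, '10/1/2021', -21, 123],
        [1111, '10/1/2021', 21, 123],
        [2222, '10/2/2021', 15, 234],
        [2222, '10/2/2021', 15, 234],
        [3333, '10/3/2021', 15, 234],
        [3333, '10/3/2021', 15, 234]]
df = pd.DataFrame(data, columns=['Individual', 'date', 'number', 'cc'])

keys = ['Individual', 'date', 'cc']

# hypothetical helper columns: absolute value and sign flag
tmp = df.assign(abs_number=df['number'].abs(), is_neg=df['number'].lt(0))

# number the rows inside each (keys, |number|, sign) bucket: 0, 1, 2, ...
tmp['k'] = tmp.groupby(keys + ['abs_number', 'is_neg']).cumcount()

# the k-th positive and the k-th negative share (keys, |number|, k);
# a bucket that contains both signs is an offsetting pair
signs = tmp.groupby(keys + ['abs_number', 'k'])['is_neg'].transform('nunique')

# keep only the rows that were not paired off
out = df[signs.eq(1)].reset_index(drop=True)
print(out)
```

On the toy data this drops the first 21/-21 pair and keeps the remaining 21, along with all the positive duplicates. Rows with a number of 0 are never paired under this assumption, so they are kept.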