2

I am trying to remove rows from a pandas DataFrame with multiple criteria.

The criteria is if Trader_x buys and sells on the same date, remove both the buy and sell rows for that date.

This is complex because it requires the trader and date to be unique for the lookup. I do not want to remove rows if only the date matches buy and sell. It has to be trader_x and date dependant.

I have tried many methods including pivot tables and merges but I cannot get anything close to working.

The input data looks like this:

Trader_ID   Date    Type
Trader_1    5/4/11  Sell
Trader_1    13/4/11 Sell
Trader_1    4/4/12  Sell
Trader_1    3/4/13  Sell
Trader_1    2/5/13  Sell
Trader_1    9/7/13  Sell
Trader_1    5/5/14  Buy
Trader_1    5/5/14  Sell
Trader_1    5/5/15  Buy
Trader_2    5/5/15  Sell
Trader_2    15/3/13 Buy
Trader_2    15/3/13 Sell
Trader_2    1/4/13  Buy
Trader_2    1/4/13  Sell
Trader_2    15/4/13 Buy
Trader_2    15/4/13 Sell
Trader_2    29/4/13 Buy
Trader_2    30/4/13 Sell
Trader_2    22/5/13 Sell

Example DataFrame below if anyone is willing to try.

data = {'trader_id': ['Trader_1','Trader_1','Trader_1','Trader_1','Trader_1','Trader_1','Trader_1',
      'Trader_1','Trader_1','Trader_2','Trader_2','Trader_2','Trader_2','Trader_2','Trader_2',
      'Trader_2','Trader_2','Trader_2','Trader_2'],
     'date': ['1/4/13', '13/4/11', '4/4/12', '3/4/13', '2/5/13', '9/7/13', '5/5/14', '5/5/14', '5/5/15',
      '5/5/15', '15/3/13', '15/3/13', '1/4/13', '1/4/13', '15/4/13', '15/4/13', '29/4/13', '30/4/13', '22/5/13'],
       'type': ['Sell', 'Sell', 'Sell', 'Sell', 'Sell', 'Sell', 'Buy', 'Sell', 'Buy', 'Sell', 'Buy', 
                'Sell', 'Buy', 'Sell', 'Buy', 'Sell', 'Buy', 'Sell', 'Sell']}

df = pd.DataFrame(data)

The (example) dataset has a total input of 19 rows. The output should only have 11 rows.

The output data should look like this:

Trader_ID   Date    Type
Trader_1    5/4/11  Sell
Trader_1    13/4/11 Sell
Trader_1    4/4/12  Sell
Trader_1    3/4/13  Sell
Trader_1    2/5/13  Sell
Trader_1    9/7/13  Sell
Trader_1    5/5/15  Buy
Trader_2    5/5/15  Sell
Trader_2    29/4/13 Buy
Trader_2    30/4/13 Sell
Trader_2    22/5/13 Sell

Would love some guidance on this one - I'm stuck. Thanks.

2 Answers 2

3

Your logic can be boiled down to keeping date for a trader if it contains only one type of trading, either Buy or Sell, you could use groupby.filter:

df.groupby(['date', 'trader_id']).filter(lambda g: (g.type == "Sell").all() or (g.type == "Buy").all())

enter image description here

Note the data used is what is provided in the dictionary which is slightly different from the data in the code block.

Sign up to request clarification or add additional context in comments.

Comments

1

It seems you never have a single trader with multiple sells or buys in a single day. So, we can filter on number of trades being equal to 1

df.groupby(['trader_id', 'date']).filter(lambda x: len(x) == 1)

       date trader_id  type
0    1/4/13  Trader_1  Sell
1   13/4/11  Trader_1  Sell
2    4/4/12  Trader_1  Sell
3    3/4/13  Trader_1  Sell
4    2/5/13  Trader_1  Sell
5    9/7/13  Trader_1  Sell
8    5/5/15  Trader_1   Buy
9    5/5/15  Trader_2  Sell
16  29/4/13  Trader_2   Buy
17  30/4/13  Trader_2  Sell
18  22/5/13  Trader_2  Sell

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.