
I have one Pandas dataframe that contains information thus:

index       year  month  day  symbol  transaction  nr_shares
2011-01-10  2011      1   10  AAPL    Buy               1500
2011-01-13  2011      1   13  GOOG    Sell              1000

and I would like to fill a second, zero-filled Pandas dataframe

index        AAPL  GOOG
2011-01-10     0     0
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0     0

using the information from the first dataframe so I get

index        AAPL   GOOG
2011-01-10   1500      0
2011-01-11      0      0
2011-01-12      0      0
2011-01-13      0  -1000

Here the buy and sell transactions have been entered on the relevant dates in the appropriate column, with a positive number of shares for a buy order and a negative number for a sell order.

How can I accomplish this? Will I have to loop over the first dataframe index and check the symbol and transaction columns using nested "if" statements and then write to the second dataframe, or is there a more elegant dataframe method that I could use?

2 Answers


You could use pivot_table. Starting from (edited to be slightly more complicated):

>>> df1
        index  year  month  day symbol transaction  nr_shares
0  2011-01-10  2011      1   10   AAPL         Buy       1500
1  2011-01-10  2011      1   10   AAPL        Sell        200
2  2011-01-10  2011      1   10   GOOG        Sell        500
3  2011-01-10  2011      1   10   GOOG         Buy        600
4  2011-01-13  2011      1   13   GOOG        Sell       1000
>>> df2
        index  AAPL  GOOG
0  2011-01-10     0     0
1  2011-01-11     0     0
2  2011-01-12     0     0
3  2011-01-13     0     0

We can sign the shares:

>>> df1["nr_shares"] = df1.apply(lambda row: row["nr_shares"] * (-1 if row["transaction"] == "Sell" else 1), axis=1)
>>> df1
        index  year  month  day symbol transaction  nr_shares
0  2011-01-10  2011      1   10   AAPL         Buy       1500
1  2011-01-10  2011      1   10   AAPL        Sell       -200
2  2011-01-10  2011      1   10   GOOG        Sell       -500
3  2011-01-10  2011      1   10   GOOG         Buy        600
4  2011-01-13  2011      1   13   GOOG        Sell      -1000
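The row-wise apply above does the job. As an alternative, here is a minimal sketch of the same signing done column-wise with numpy.where, which does not depend on what the row index looks like. The frame name orders and the column name signed_shares are illustrative choices, not part of the original answer; the data values are copied from df1.

```python
import numpy as np
import pandas as pd

# Sample orders mirroring df1 above (values copied from the answer).
orders = pd.DataFrame({
    "index": ["2011-01-10", "2011-01-10", "2011-01-10", "2011-01-10", "2011-01-13"],
    "symbol": ["AAPL", "AAPL", "GOOG", "GOOG", "GOOG"],
    "transaction": ["Buy", "Sell", "Sell", "Buy", "Sell"],
    "nr_shares": [1500, 200, 500, 600, 1000],
})

# -1 for "Sell" rows, +1 for everything else; then multiply element-wise.
sign = np.where(orders["transaction"] == "Sell", -1, 1)
orders["signed_shares"] = orders["nr_shares"] * sign

print(orders["signed_shares"].tolist())  # [1500, -200, -500, 600, -1000]
```

Writing to a new column keeps the unsigned nr_shares around, so running the step twice does not flip the signs back.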

And then you can pivot df1. By default it uses the mean of the aggregated values, but we want the sum:

>>> a = df1.pivot_table(values="nr_shares", index="index", columns="symbol",
                    aggfunc="sum")
>>> a
symbol      AAPL  GOOG
index                 
2011-01-10  1300   100
2011-01-13   NaN -1000

Give b the same index:

>>> b = df2.set_index("index")
>>> b
            AAPL  GOOG
index                 
2011-01-10     0     0
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0     0

And then add them:

>>> (a+b).fillna(0)
symbol      AAPL  GOOG
index                 
2011-01-10  1300   100
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0 -1000
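For reference, the whole pipeline can be sketched end to end for current pandas releases, where pivot_table takes index= and columns= keywords. This is a sketch under assumptions rather than the answer's exact session: the dates are parsed as datetimes, the shares are already signed, and the zero-filled second frame is replaced by a reindex over a date_range (the name all_days is illustrative).

```python
import pandas as pd

# Signed orders as in df1 after the signing step (values copied from above).
df1 = pd.DataFrame({
    "index": pd.to_datetime(["2011-01-10", "2011-01-10", "2011-01-10",
                             "2011-01-10", "2011-01-13"]),
    "symbol": ["AAPL", "AAPL", "GOOG", "GOOG", "GOOG"],
    "nr_shares": [1500, -200, -500, 600, -1000],
})

# Sum signed shares per (date, symbol); fill_value=0 replaces missing combinations.
a = df1.pivot_table(values="nr_shares", index="index", columns="symbol",
                    aggfunc="sum", fill_value=0)

# Expand to every calendar day in the range, filling days without orders with 0.
all_days = pd.date_range("2011-01-10", "2011-01-13", freq="D")
result = a.reindex(all_days, fill_value=0)
print(result)
```

The result holds 1300 (AAPL) and 100 (GOOG) on 2011-01-10, -1000 (GOOG) on 2011-01-13, and zeros elsewhere, matching the table above.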

4 Comments

Great answer, but slight problem with the "signing" of share amounts. If there is more than one order on the same date, the share amount and sign for the last entry on this date is written to all orders on this date.
@babelproofreader: I don't see a problem with signing the shares, but there was an issue pivoting -- it was taking the mean, not the sum. Does it work now for your case?
For me it was definitely a problem with the signing - I was printing out the dfs after each step. I think it was a problem with using a datetime object, which I created from other columns shown, as the row index. Doing the signing step before creating this datetime index, i.e. while the index is still integer values 0,1,2..., and then creating the new datetime index solves this problem. The pivot_table with aggfunc=sum works fine on this resulting df.
@babelproofreader: ah, okay. If you did something that I didn't, it's not surprising I didn't see the same outcome. :^)

First, using apply, you could add a column with the signed shares (positive for Buy, negative for Sell):

In [11]: df['signed_shares'] = df.apply(lambda row: row['nr_shares']
                                                    if row['transaction'] == 'Buy'
                                                    else -row['nr_shares'],
                                        axis=1)

In [12]: df
Out[12]: 
            year  month  day symbol transaction  nr_shares  signed_shares
index                                                                    
2011-01-10  2011      1   10   AAPL         Buy       1500           1500
2011-01-13  2011      1   13   GOOG        Sell       1000          -1000

Use just those columns of interest to you and unstack them:

In [13]: df[['symbol', 'signed_shares']].set_index('symbol', append=True)
Out[13]: 
                   signed_shares
index      symbol               
2011-01-10 AAPL             1500
2011-01-13 GOOG            -1000

In [14]: a = df[['symbol', 'signed_shares']].set_index('symbol', append=True).unstack()

In [15]: a
Out[15]: 
            signed_shares      
symbol               AAPL  GOOG
index                          
2011-01-10           1500   NaN
2011-01-13            NaN -1000

Reindex over whatever date range you like:

In [16]: rng = pd.date_range('2011-01-10', periods=4)

In [17]: a.reindex(rng)
Out[17]: 
            signed_shares      
symbol               AAPL  GOOG
2011-01-10           1500   NaN
2011-01-11            NaN   NaN
2011-01-12            NaN   NaN
2011-01-13            NaN -1000

Finally fill in the NaNs with 0 using fillna:

In [18]: a.reindex(rng).fillna(0)
Out[18]: 
            signed_shares      
symbol               AAPL  GOOG
2011-01-10           1500     0
2011-01-11              0     0
2011-01-12              0     0
2011-01-13              0 -1000

As @DSM points out, you can do [13]-[15] much more neatly using pivot_table:

In [20]: df.reset_index().pivot_table('signed_shares', 'index', 'symbol', aggfunc='sum')
Out[20]: 
symbol      AAPL  GOOG
index                 
2011-01-10  1500   NaN
2011-01-13   NaN -1000
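The set_index/unstack steps above assume each (date, symbol) pair occurs only once; unstack raises an error on duplicate entries. A minimal sketch of one way to handle several orders for the same symbol on the same day is to sum per pair first and then unstack. The data here is hypothetical (a second AAPL order was added for illustration), and signed_shares is assumed to be signed already.

```python
import pandas as pd

# Hypothetical data with two AAPL orders on the same day; shares already signed.
df = pd.DataFrame(
    {
        "symbol": ["AAPL", "AAPL", "GOOG"],
        "signed_shares": [1500, -200, -1000],
    },
    index=pd.to_datetime(["2011-01-10", "2011-01-10", "2011-01-13"]),
)
df.index.name = "index"

# Sum per (date, symbol) so each pair is unique, then move symbols into columns.
totals = df.reset_index().groupby(["index", "symbol"])["signed_shares"].sum()
wide = totals.unstack(fill_value=0)

# Reindex over the full date range, filling days without orders with 0.
rng = pd.date_range("2011-01-10", periods=4)
result = wide.reindex(rng, fill_value=0)
print(result)
```

Here the two AAPL orders on 2011-01-10 net out to 1300, and the remaining cells are either -1000 (GOOG on 2011-01-13) or 0.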

6 Comments

Heh. We solved different problems in different ways, but wound up in the same place. :^)
:) I was just about to comment that pivot_table is much nicer before you deleted your answer!!
I wouldn't have had to if you hadn't shown me I forgot to sign the values.. can you see deleted answers yet or is that at 10k?
@DSM I upvoted and then it wouldn't let me comment... on refresh it was gone, nearly got the 10k power...
@AndyHayden Great answer, but slight problem with the "signing" of share amounts. If there is more than one order on the same date, the share amount and sign for the last entry on this date is written to all orders on this date.