Fill a Pandas dataframe using information from another Pandas dataframe

Question

I have one Pandas dataframe that contains information thus:

index       year  month day symbol transaction  nr_shares
2011-01-10  2011  1     10  AAPL       Buy       1500
2011-01-13  2011  1     13  GOOG       Sell      1000

and I would like to fill a second, zero-filled Pandas dataframe

index        AAPL  GOOG
2011-01-10     0     0
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0     0

using the information from the first dataframe so I get

index        AAPL  GOOG
2011-01-10   1500    0
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0  -1000

where it can be seen that on the relevant dates the buy and sell transactions for a specified number of shares have been entered in the appropriate column, with a positive number for a buy and a negative number for a sell order.

How can I accomplish this? Will I have to loop over the first dataframe index and check the symbol and transaction columns using nested "if" statements and then write to the second dataframe, or is there a more elegant dataframe method that I could use?

DSM · Accepted Answer · 2013-03-29 23:20:57Z

4

You could use pivot_table. Starting from (edited to be slightly more complicated):

>>> df1
        index  year  month  day symbol transaction  nr_shares
0  2011-01-10  2011      1   10   AAPL         Buy       1500
1  2011-01-10  2011      1   10   AAPL        Sell        200
2  2011-01-10  2011      1   10   GOOG        Sell        500
3  2011-01-10  2011      1   10   GOOG         Buy        600
4  2011-01-13  2011      1   13   GOOG        Sell       1000
>>> df2
        index  AAPL  GOOG
0  2011-01-10     0     0
1  2011-01-11     0     0
2  2011-01-12     0     0
3  2011-01-13     0     0

We can sign the shares:

>>> df1["nr_shares"] = df1.apply(lambda row: row["nr_shares"] * (-1 if row["transaction"] == "Sell" else 1), axis=1)
>>> df1
        index  year  month  day symbol transaction  nr_shares
0  2011-01-10  2011      1   10   AAPL         Buy       1500
1  2011-01-10  2011      1   10   AAPL        Sell       -200
2  2011-01-10  2011      1   10   GOOG        Sell       -500
3  2011-01-10  2011      1   10   GOOG         Buy        600
4  2011-01-13  2011      1   13   GOOG        Sell      -1000
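The same signing can be done without a row-wise apply. The following is a minimal sketch, assuming NumPy is available and that every transaction other than "Sell" counts as a buy (the same rule as the lambda above). The name `signed` is only illustrative, and the frame carries just the columns needed here.

```python
import numpy as np
import pandas as pd

# Sample data mirroring df1 above (only the columns needed for signing).
df1 = pd.DataFrame({
    "index": ["2011-01-10", "2011-01-10", "2011-01-10", "2011-01-10", "2011-01-13"],
    "symbol": ["AAPL", "AAPL", "GOOG", "GOOG", "GOOG"],
    "transaction": ["Buy", "Sell", "Sell", "Buy", "Sell"],
    "nr_shares": [1500, 200, 500, 600, 1000],
})

# -1 for "Sell" rows, +1 for everything else, computed for the whole column at once.
sign = np.where(df1["transaction"] == "Sell", -1, 1)
signed = df1["nr_shares"] * sign
```

Because this works on whole columns, it does not depend on the row labels, which may matter if the frame has repeated dates in its index.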

And then you can pivot df1. By default pivot_table aggregates with the mean, but we want the sum:

>>> a = df1.pivot_table(values="nr_shares", index="index", columns="symbol",
                    aggfunc="sum")
>>> a
symbol      AAPL  GOOG
index                 
2011-01-10  1300   100
2011-01-13   NaN -1000

Give df2 the same index, calling the result b:

>>> b = df2.set_index("index")
>>> b
            AAPL  GOOG
index                 
2011-01-10     0     0
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0     0

And then add them:

>>> (a+b).fillna(0)
symbol      AAPL  GOOG
index                 
2011-01-10  1300   100
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0 -1000
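For reference, here is a self-contained sketch of the whole pipeline, assuming a recent pandas in which pivot_table takes `index=` and `columns=` keywords. It swaps the `a + b` step for a `reindex` against df2's labels, which is a variation on the approach above rather than the exact code shown. The name `result` is illustrative.

```python
import pandas as pd

# df1 with the shares already signed, as in the step above.
df1 = pd.DataFrame({
    "index": ["2011-01-10", "2011-01-10", "2011-01-10", "2011-01-10", "2011-01-13"],
    "symbol": ["AAPL", "AAPL", "GOOG", "GOOG", "GOOG"],
    "nr_shares": [1500, -200, -500, 600, -1000],
})

# The zero-filled target frame.
df2 = pd.DataFrame({
    "index": ["2011-01-10", "2011-01-11", "2011-01-12", "2011-01-13"],
    "AAPL": 0,
    "GOOG": 0,
})

# Net the signed shares per (date, symbol).
a = df1.pivot_table(values="nr_shares", index="index", columns="symbol",
                    aggfunc="sum")
b = df2.set_index("index")

# Align to df2's rows and columns, then turn the gaps into zeros.
result = a.reindex(index=b.index, columns=b.columns).fillna(0).astype(int)
```

Reindexing against `b` means the result has exactly df2's dates and symbols, even if df1 mentions a symbol that df2 does not track.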

edited Mar 29, 2013 at 23:20

answered Mar 29, 2013 at 18:35

DSM



4 Comments

babelproofreader Over a year ago

Great answer, but slight problem with the "signing" of share amounts. If there is more than one order on the same date, the share amount and sign for the last entry on this date is written to all orders on this date.

DSM Over a year ago

@babelproofreader: I don't see a problem with signing the shares, but there was an issue pivoting -- it was taking the mean, not the sum. Does it work now for your case?

babelproofreader Over a year ago

For me it was definitely a problem with the signing - I was printing out the dfs after each step. I think it was a problem with using a datetime object, which I created from other columns shown, as the row index. Doing the signing step before creating this datetime index, i.e. while the index is still integer values 0,1,2..., and then creating the new datetime index solves this problem. The pivot_table with aggfunc=sum works fine on this resulting df.

DSM Over a year ago

@babelproofreader: ah, okay. If you did something that I didn't, it's not surprising I didn't see the same outcome. :^)

Andy Hayden · Accepted Answer · 2013-03-29 18:47:57Z

3

First, using apply, you could add a column with the signed shares (positive for Buy, negative for Sell):

In [11]: df['signed_shares'] = df.apply(lambda row: row['nr_shares']
                                                    if row['transaction'] == 'Buy'
                                                    else -row['nr_shares'],
                                        axis=1)

In [12]: df
Out[12]: 
            year  month  day symbol transaction  nr_shares  signed_shares
index                                                                    
2011-01-10  2011      1   10   AAPL         Buy       1500           1500
2011-01-13  2011      1   13   GOOG        Sell       1000          -1000

Use just those columns of interest to you and unstack them:

In [13]: df[['symbol', 'signed_shares']].set_index('symbol', append=True)
Out[13]: 
                   signed_shares
index      symbol               
2011-01-10 AAPL             1500
2011-01-13 GOOG            -1000

In [14]: a = df[['symbol', 'signed_shares']].set_index('symbol', append=True).unstack()

In [15]: a
Out[15]: 
            signed_shares      
symbol               AAPL  GOOG
index                          
2011-01-10           1500   NaN
2011-01-13            NaN -1000

Reindex over whatever date range you like:

In [16]: rng = pd.date_range('2011-01-10', periods=4)

In [17]: a.reindex(rng)
Out[17]: 
            signed_shares      
symbol               AAPL  GOOG
2011-01-10           1500   NaN
2011-01-11            NaN   NaN
2011-01-12            NaN   NaN
2011-01-13            NaN -1000

Finally fill in the NaNs with 0 using fillna:

In [18]: a.reindex(rng).fillna(0)
Out[18]: 
            signed_shares      
symbol               AAPL  GOOG
2011-01-10           1500     0
2011-01-11              0     0
2011-01-12              0     0
2011-01-13              0 -1000
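The steps above leave an extra `signed_shares` level on the columns because a one-column DataFrame is unstacked. A small sketch of a variation, assuming the same two-row frame: selecting the column as a Series before unstacking gives flat AAPL/GOOG columns. The names `wide` and `filled` are illustrative.

```python
import pandas as pd

# The two-row example frame, indexed by date, with the signed column added.
df = pd.DataFrame(
    {
        "symbol": ["AAPL", "GOOG"],
        "signed_shares": [1500, -1000],
    },
    index=pd.to_datetime(["2011-01-10", "2011-01-13"]),
)
df.index.name = "index"

# Single brackets pick a Series, so unstack() yields plain symbol columns.
wide = df.set_index("symbol", append=True)["signed_shares"].unstack()

# Reindex over the wanted dates, then replace the gaps with zeros.
rng = pd.date_range("2011-01-10", periods=4)
filled = wide.reindex(rng).fillna(0).astype(int)
```

The final `astype(int)` is optional; it only restores integer columns after NaN forced them to float.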

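As @DSM points out, you can do [13]-[15] much more neatly using pivot_table (note that it aggregates with the mean by default, so pass aggfunc="sum" if one date can hold several orders for the same symbol):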

In [20]: df.reset_index().pivot_table('signed_shares', 'index', 'symbol')
Out[20]: 
symbol      AAPL  GOOG
index                 
2011-01-10  1500   NaN
2011-01-13   NaN -1000
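The difference between the two routes shows up once a date holds more than one order for the same symbol. The sketch below assumes a hypothetical three-row frame with two AAPL orders on 2011-01-10. With that data, unstack raises a ValueError because the (date, symbol) pairs are no longer unique, while pivot_table with a sum nets the orders. The names `unstack_failed` and `netted` are illustrative.

```python
import pandas as pd

# Hypothetical data: two AAPL orders on the same date.
df = pd.DataFrame(
    {
        "symbol": ["AAPL", "AAPL", "GOOG"],
        "signed_shares": [1500, -200, -1000],
    },
    index=pd.to_datetime(["2011-01-10", "2011-01-10", "2011-01-13"]),
)
df.index.name = "index"

# unstack cannot reshape when a (date, symbol) pair appears twice.
try:
    df.set_index("symbol", append=True)["signed_shares"].unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table aggregates the duplicates; aggfunc="sum" nets them
# (the default would average them instead).
netted = (
    df.reset_index()
      .pivot_table(values="signed_shares", index="index",
                   columns="symbol", aggfunc="sum")
      .reindex(pd.date_range("2011-01-10", periods=4))
      .fillna(0)
      .astype(int)
)
```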

edited Mar 29, 2013 at 18:47

answered Mar 29, 2013 at 18:36

Andy Hayden


6 Comments

DSM Over a year ago

Heh. We solved different problems in different ways, but wound up in the same place. :^)

Andy Hayden Over a year ago

:) I was just about to comment that pivot_table is much nicer before you deleted your answer!!

DSM Over a year ago

I wouldn't have had to if you hadn't shown me I forgot to sign the values.. can you see deleted answers yet or is that at 10k?

Andy Hayden Over a year ago

@DSM I upvoted and then it wouldn't let me comment... on refresh it was gone, nearly got the 10k power...

babelproofreader Over a year ago

@AndyHayden Great answer, but slight problem with the "signing" of share amounts. If there is more than one order on the same date, the share amount and sign for the last entry on this date is written to all orders on this date.
