How to create a Pandas DataFrame object using data from another DataFrame object?

Question

My goal is to use this dataset

         mngr  shares  value ticker
0  JP Morgan      50     12   AAPL
1        AQR     120     12   AAPL
2  JP Morgan       5     30  GOOGL
3  JP Morgan       6     25     FB
4        AQR      10     30  GOOGL
5        AQR      12     25     FB
6        AQR      30     14     PG

to create another dataset, where the values are taken from the column "shares":

           AAPL  GOOGL  FB   PG
JP Morgan    50      5   6  NaN
AQR         120     10  12   30

So far I have almost complete code:

import pandas as pd
import networkx as nx
import numpy as np
df = pd.DataFrame({'mngr': ['JP Morgan', 'AQR', 'JP Morgan', 'JP Morgan', 'AQR', 'AQR', 'AQR'], 'shares': [50, 120, 5, 6, 10, 12, 30],
'value': [12, 12, 30, 25, 30, 25, 14], 'ticker': ['AAPL', 'AAPL', 'GOOGL', 'FB', 'GOOGL', 'FB', 'PG']})
mngrlist = []
tickerlist = []
shareslist = []
for item in df.mngr.unique():
    mngrlist.append(item)
for item in df.ticker.unique():
    tickerlist.append(item)
for item in df.shares.unique():
    shareslist.append(item)
print(df)
r = np.zeros((len(mngrlist), len(tickerlist)))*np.nan
df1 = pd.DataFrame(columns=tickerlist, data=r)
df1.index = mngrlist
for s in tickerlist:
    for t in mngrlist:
        tick = df['ticker'] == s
        mn = df["mngr"] == t
        df1[s][t] = df.loc[tick & mn, "shares"].values
print(df1)

The only remaining problem is the last step, with this line:

df1[s][t] = df.loc[tick & mn, "shares"].values

As I understand it, the two sides of this assignment have different dimensions (types), although if you print each

df.loc[tick & mn, "shares"].values

it has only one element, and I don't know how to convert it into a simple float value. I also tried to use groupby, but did not succeed.

Another question is whether it is possible to write more efficient code for this procedure. I will need to run it on a large dataset, so efficiency matters.
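The failing assignment can be sketched as follows, assuming a recent pandas version. `.values` returns a NumPy array, not a scalar, and that array is empty for the JP Morgan / PG pair, which has no row in the data. In addition, `df1[s][t] = ...` is chained indexing, which pandas does not guarantee will write back into `df1`. The names `mngrs`, `tickers`, and `match` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'mngr': ['JP Morgan', 'AQR', 'JP Morgan', 'JP Morgan', 'AQR', 'AQR', 'AQR'],
    'shares': [50, 120, 5, 6, 10, 12, 30],
    'value': [12, 12, 30, 25, 30, 25, 14],
    'ticker': ['AAPL', 'AAPL', 'GOOGL', 'FB', 'GOOGL', 'FB', 'PG'],
})

mngrs = df['mngr'].unique()      # row labels, in order of first appearance
tickers = df['ticker'].unique()  # column labels, in order of first appearance

# Float frame pre-filled with NaN: one row per manager, one column per ticker.
df1 = pd.DataFrame(np.nan, index=mngrs, columns=tickers)

for s in tickers:
    for t in mngrs:
        # Series holding the matching 'shares' rows (zero or one row here).
        match = df.loc[(df['ticker'] == s) & (df['mngr'] == t), 'shares']
        if not match.empty:
            # .iloc[0] extracts the scalar; .loc[row, col] writes it in one step.
            df1.loc[t, s] = match.iloc[0]

print(df1)
```

This loop scans the whole frame once per (manager, ticker) pair, so it is likely to be slow on a large dataset; a vectorized reshape such as pivot avoids the Python-level loop.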

MaxU - stand with Ukraine · Accepted Answer · 2017-11-06 13:35:36Z

5

If I understand correctly, you want to "pivot" the original DataFrame:

In [305]: df.pivot(index='mngr', columns='ticker', values='shares')
Out[305]:
ticker      AAPL    FB  GOOGL    PG
mngr
AQR        120.0  12.0   10.0  30.0
JP Morgan   50.0   6.0    5.0   NaN

Optionally, we can remove the axis names:

In [307]: df.pivot(index='mngr', columns='ticker', values='shares') \
            .rename_axis(None) \
            .rename_axis(None, axis=1)
Out[307]:
            AAPL    FB  GOOGL    PG
AQR        120.0  12.0   10.0  30.0
JP Morgan   50.0   6.0    5.0   NaN
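The pivot call above can be tried end to end with the sketch below, which rebuilds the question's DataFrame. Two additions are assumptions of this sketch rather than part of the answer: `reindex` is used to restore the row and column order shown in the question, and `pivot_table` with `aggfunc='sum'` is shown as one possible fallback for data that repeats a (mngr, ticker) pair, a case in which `pivot` raises a `ValueError`. The names `wide`, `wide_ordered`, `dup`, and `wide_sum` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'mngr': ['JP Morgan', 'AQR', 'JP Morgan', 'JP Morgan', 'AQR', 'AQR', 'AQR'],
    'shares': [50, 120, 5, 6, 10, 12, 30],
    'value': [12, 12, 30, 25, 30, 25, 14],
    'ticker': ['AAPL', 'AAPL', 'GOOGL', 'FB', 'GOOGL', 'FB', 'PG'],
})

# Long-to-wide reshape: one row per manager, one column per ticker.
wide = df.pivot(index='mngr', columns='ticker', values='shares')

# pivot sorts the labels; reindex restores the order of first appearance.
wide_ordered = wide.reindex(index=df['mngr'].unique(),
                            columns=df['ticker'].unique())
print(wide_ordered)

# Hypothetical input with a repeated (mngr, ticker) pair: pivot would raise
# ValueError here, so pivot_table aggregates the duplicates instead.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
wide_sum = dup.pivot_table(index='mngr', columns='ticker',
                           values='shares', aggfunc='sum')
print(wide_sum)
```

Whether summing is the right aggregation for repeated pairs depends on the data; `'first'` or `'mean'` may suit other cases.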


7 Comments

Bharath M Shetty Over a year ago

Why am I so late to this question?

MaxU - stand with Ukraine Over a year ago

@Bharath, there is still one or two options left ... ;-)

Bharath M Shetty Over a year ago

Yeah we can use crosstab but pivot is the correct way of doing it.

Anna Ignashkina Over a year ago

Thanks a lot, guys! All the comments are super helpful.

MaxU - stand with Ukraine Over a year ago

@AnnaIgnashkina, glad it helps :)
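The crosstab option mentioned in the comments could look roughly like the sketch below. The exact call is an assumption, since the comment does not spell it out; `pd.crosstab` needs an `aggfunc` whenever `values` is passed, and `'sum'` is chosen here only because each (mngr, ticker) pair occurs once in the sample data. The name `result` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'mngr': ['JP Morgan', 'AQR', 'JP Morgan', 'JP Morgan', 'AQR', 'AQR', 'AQR'],
    'shares': [50, 120, 5, 6, 10, 12, 30],
    'value': [12, 12, 30, 25, 30, 25, 14],
    'ticker': ['AAPL', 'AAPL', 'GOOGL', 'FB', 'GOOGL', 'FB', 'PG'],
})

# Rows come from 'mngr', columns from 'ticker', cells from aggregated 'shares'.
result = pd.crosstab(df['mngr'], df['ticker'],
                     values=df['shares'], aggfunc='sum')
print(result)
```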


Scott Boston · Accepted Answer · 2017-11-06 13:42:34Z

4

Besides @MaxU's nearly perfect solution, another way is to use set_index and unstack:

df.set_index(['mngr','ticker']).unstack(1)['shares']

Output:

ticker      AAPL    FB  GOOGL    PG
mngr                               
AQR        120.0  12.0   10.0  30.0
JP Morgan   50.0   6.0    5.0   NaN
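A self-contained sketch of the set_index / unstack approach follows. As a small variation on the one-liner above, it selects 'shares' before unstacking so that the unused 'value' column is not reshaped as well; that ordering is a choice made for this sketch. The names `wide_unstack` and `wide_pivot` are illustrative, and the pivot call is included only to compare the two results.

```python
import pandas as pd

df = pd.DataFrame({
    'mngr': ['JP Morgan', 'AQR', 'JP Morgan', 'JP Morgan', 'AQR', 'AQR', 'AQR'],
    'shares': [50, 120, 5, 6, 10, 12, 30],
    'value': [12, 12, 30, 25, 30, 25, 14],
    'ticker': ['AAPL', 'AAPL', 'GOOGL', 'FB', 'GOOGL', 'FB', 'PG'],
})

# Build a (mngr, ticker) MultiIndex, keep only 'shares', then move the
# 'ticker' level from the rows to the columns.
wide_unstack = df.set_index(['mngr', 'ticker'])['shares'].unstack('ticker')
print(wide_unstack)

# Same reshape expressed with pivot, for comparison.
wide_pivot = df.pivot(index='mngr', columns='ticker', values='shares')
print(wide_unstack.equals(wide_pivot))
```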

