My goal is to use this dataset

         mngr  shares  value ticker
0  JP Morgan      50     12   AAPL
1        AQR     120     12   AAPL
2  JP Morgan       5     30  GOOGL
3  JP Morgan       6     25     FB
4        AQR      10     30  GOOGL
5        AQR      12     25     FB
6        AQR      30     14     PG

to create another dataset, where the values are taken from the column "shares":

           AAPL  GOOGL   FB    PG
JP Morgan    50      5    6   NaN
AQR         120     10   12    30

So far I have almost complete code:

import pandas as pd
import networkx as nx
import numpy as np
df = pd.DataFrame({'mngr': ['JP Morgan', 'AQR', 'JP Morgan', 'JP Morgan', 'AQR', 'AQR', 'AQR'], 'shares': [50, 120, 5, 6, 10, 12, 30],
'value': [12, 12, 30, 25, 30, 25, 14], 'ticker': ['AAPL', 'AAPL', 'GOOGL', 'FB', 'GOOGL', 'FB', 'PG']})
mngrlist = []
tickerlist = []
shareslist = []
for item in df.mngr.unique():
    mngrlist.append(item)
for item in df.ticker.unique():
    tickerlist.append(item)
for item in df.shares.unique():
    shareslist.append(item)
print(df)
r = np.zeros((len(mngrlist), len(tickerlist)))*np.nan
df1 = pd.DataFrame(columns=tickerlist, data=r)
df1.index = mngrlist
for s in tickerlist:
    for t in mngrlist:
        tick = df['ticker'] == s
        mn = df["mngr"] == t
        df1[s][t] = df.loc[tick & mn, "shares"].values
print(df1)

The only problem is the last step, this line:

df1[s][t] = df.loc[tick & mn, "shares"].values

As I understand it, the two sides of the assignment have different dimensions (types), although if you print out each

df.loc[tick & mn, "shares"].values

it has only one element each, and I don't know how to convert it into a simple float value. I also tried to use groupby, but didn't succeed.

My other question is whether it is possible to write more efficient code for this procedure. I will need to run it on a large dataset, so efficiency matters.

2 Answers

If I understand correctly, you want to "pivot" the original DF:

In [305]: df.pivot(index='mngr', columns='ticker', values='shares')
Out[305]:
ticker      AAPL    FB  GOOGL    PG
mngr
AQR        120.0  12.0   10.0  30.0
JP Morgan   50.0   6.0    5.0   NaN
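One caveat worth sketching: `pivot` expects each (mngr, ticker) pair to appear at most once. The example below adds a hypothetical duplicate row (not part of the question's data) to show that `pivot` raises in that case, and that `pivot_table` can aggregate instead. Summing the duplicates is an assumption here; another aggregation may suit the real data better.

```python
import pandas as pd

df = pd.DataFrame({
    'mngr': ['JP Morgan', 'AQR', 'JP Morgan', 'JP Morgan', 'AQR', 'AQR', 'AQR'],
    'shares': [50, 120, 5, 6, 10, 12, 30],
    'value': [12, 12, 30, 25, 30, 25, 14],
    'ticker': ['AAPL', 'AAPL', 'GOOGL', 'FB', 'GOOGL', 'FB', 'PG'],
})

# Hypothetical extra row: a second AQR/AAPL position, added only for illustration.
extra = pd.DataFrame({'mngr': ['AQR'], 'shares': [30], 'value': [12], 'ticker': ['AAPL']})
df_dup = pd.concat([df, extra], ignore_index=True)

# pivot() cannot reshape when an (index, column) pair occurs more than once.
try:
    df_dup.pivot(index='mngr', columns='ticker', values='shares')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates the duplicates; 'sum' is an assumed choice.
wide = df_dup.pivot_table(index='mngr', columns='ticker',
                          values='shares', aggfunc='sum')
print(pivot_failed)
print(wide)
```

If the pairs are known to be unique, plain `pivot` is the simpler choice.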

Optionally, we can remove the axis names:

In [307]: df.pivot(index='mngr', columns='ticker', values='shares') \
            .rename_axis(index=None, columns=None)
Out[307]:
            AAPL    FB  GOOGL    PG
AQR        120.0  12.0   10.0  30.0
JP Morgan   50.0   6.0    5.0   NaN
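As for the original loop, here is a minimal sketch of how it could be made to run, in case the explicit version is still wanted. The selection `df.loc[tick & mn, "shares"]` is not always a single value: there is no (JP Morgan, PG) row, so for that pair it is empty. The sketch takes the scalar with `.iloc[0]` only when exactly one row matches and writes it with a single `.loc[row, column]` assignment instead of `df1[s][t]`. The names `managers`, `tickers` and `match` are illustrative, not from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'mngr': ['JP Morgan', 'AQR', 'JP Morgan', 'JP Morgan', 'AQR', 'AQR', 'AQR'],
    'shares': [50, 120, 5, 6, 10, 12, 30],
    'value': [12, 12, 30, 25, 30, 25, 14],
    'ticker': ['AAPL', 'AAPL', 'GOOGL', 'FB', 'GOOGL', 'FB', 'PG'],
})

managers = df['mngr'].unique()
tickers = df['ticker'].unique()

# Start from an all-NaN frame so missing pairs stay NaN.
df1 = pd.DataFrame(np.nan, index=managers, columns=tickers)

for s in tickers:
    for t in managers:
        match = df.loc[(df['ticker'] == s) & (df['mngr'] == t), 'shares']
        # Only assign when exactly one row matches; an empty match is left as NaN.
        if len(match) == 1:
            df1.loc[t, s] = match.iloc[0]

print(df1)
```

This still scans the frame once per (manager, ticker) pair, so for a large dataset the `pivot` call is likely to be much faster.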

7 Comments

Why am I so late to this question?
@Bharath, there is still one or two options left ... ;-)
Yeah we can use crosstab but pivot is the correct way of doing it.
Thanks a lot, guys! All the comments are superhelpful
@AnnaIgnashkina, glad it helps :)
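For reference, the `crosstab` option mentioned in these comments might look like the sketch below. `pd.crosstab` needs an aggregation function whenever `values` is passed; `'sum'` is an assumption and makes no difference while each (mngr, ticker) pair occurs only once.

```python
import pandas as pd

df = pd.DataFrame({
    'mngr': ['JP Morgan', 'AQR', 'JP Morgan', 'JP Morgan', 'AQR', 'AQR', 'AQR'],
    'shares': [50, 120, 5, 6, 10, 12, 30],
    'value': [12, 12, 30, 25, 30, 25, 14],
    'ticker': ['AAPL', 'AAPL', 'GOOGL', 'FB', 'GOOGL', 'FB', 'PG'],
})

# Rows come from 'mngr', columns from 'ticker', cell values from 'shares'.
wide = pd.crosstab(index=df['mngr'], columns=df['ticker'],
                   values=df['shares'], aggfunc='sum')
print(wide)
```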

Besides @MaxU's nearly perfect solution, another way is to use set_index and unstack:

df.set_index(['mngr','ticker']).unstack(1)['shares']

Output:

ticker      AAPL    FB  GOOGL    PG
mngr                               
AQR        120.0  12.0   10.0  30.0
JP Morgan   50.0   6.0    5.0   NaN
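A small variation, sketched here as a suggestion rather than as part of the answer above: selecting the `shares` column before calling `unstack` avoids reshaping the `value` column only to discard it afterwards. The last lines compare the result with the `pivot` version.

```python
import pandas as pd

df = pd.DataFrame({
    'mngr': ['JP Morgan', 'AQR', 'JP Morgan', 'JP Morgan', 'AQR', 'AQR', 'AQR'],
    'shares': [50, 120, 5, 6, 10, 12, 30],
    'value': [12, 12, 30, 25, 30, 25, 14],
    'ticker': ['AAPL', 'AAPL', 'GOOGL', 'FB', 'GOOGL', 'FB', 'PG'],
})

# Pick the 'shares' Series first, then move the 'ticker' level into the columns.
wide_unstack = df.set_index(['mngr', 'ticker'])['shares'].unstack('ticker')

# The pivot-based result, for comparison.
wide_pivot = df.pivot(index='mngr', columns='ticker', values='shares')

same = wide_unstack.equals(wide_pivot)
print(wide_unstack)
print(same)
```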
