My goal is to use this dataset:
        mngr  shares  value ticker
0  JP Morgan      50     12   AAPL
1        AQR     120     12   AAPL
2  JP Morgan       5     30  GOOGL
3  JP Morgan       6     25     FB
4        AQR      10     30  GOOGL
5        AQR      12     25     FB
6        AQR      30     14     PG
to create another dataset, where the values are taken from the column "shares":
           AAPL  GOOGL  FB   PG
JP Morgan    50      5   6  NaN
AQR         120     10  12   30
So far I have almost-complete code:
import pandas as pd
import networkx as nx
import numpy as np

df = pd.DataFrame({'mngr': ['JP Morgan', 'AQR', 'JP Morgan', 'JP Morgan', 'AQR', 'AQR', 'AQR'],
                   'shares': [50, 120, 5, 6, 10, 12, 30],
                   'value': [12, 12, 30, 25, 30, 25, 14],
                   'ticker': ['AAPL', 'AAPL', 'GOOGL', 'FB', 'GOOGL', 'FB', 'PG']})

# Collect the unique managers, tickers and share counts
mngrlist = []
tickerlist = []
shareslist = []
for item in df.mngr.unique():
    mngrlist.append(item)
for item in df.ticker.unique():
    tickerlist.append(item)
for item in df.shares.unique():
    shareslist.append(item)
print(df)

# Build an all-NaN frame with managers as rows and tickers as columns
r = np.zeros((len(mngrlist), len(tickerlist)))*np.nan
df1 = pd.DataFrame(columns=tickerlist, data=r)
df1.index = mngrlist

# Fill each cell with the matching value from "shares"
for s in tickerlist:
    for t in mngrlist:
        tick = df['ticker'] == s
        mn = df["mngr"] == t
        df1[s][t] = df.loc[tick & mn, "shares"].values
print(df1)
The only problem is the last step, with this line:
df1[s][t] = df.loc[tick & mn, "shares"].values
As I understand it, the two sides of this assignment have different dimensions (types), although if you print each
df.loc[tick & mn, "shares"].values
it contains only one element, and I don't know how to convert that into a plain float value. I also tried to use groupby, but without success.
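A minimal sketch of one way to store a plain scalar instead of an array, assuming each (mngr, ticker) pair appears at most once in df. It takes element 0 of the selection when there is exactly one match and leaves NaN otherwise (the JP Morgan / PG selection, for instance, is empty rather than one element long). The names managers, tickers, wide and match are illustrative and not part of the code above; writing through .loc[t, s] also avoids the chained assignment df1[s][t].

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'mngr': ['JP Morgan', 'AQR', 'JP Morgan', 'JP Morgan', 'AQR', 'AQR', 'AQR'],
    'shares': [50, 120, 5, 6, 10, 12, 30],
    'value': [12, 12, 30, 25, 30, 25, 14],
    'ticker': ['AAPL', 'AAPL', 'GOOGL', 'FB', 'GOOGL', 'FB', 'PG'],
})

managers = df['mngr'].unique()   # row labels, in order of first appearance
tickers = df['ticker'].unique()  # column labels, in order of first appearance

# Start from an all-NaN frame (hypothetical name: wide)
wide = pd.DataFrame(np.nan, index=managers, columns=tickers)

for t in managers:
    for s in tickers:
        match = df.loc[(df['ticker'] == s) & (df['mngr'] == t), 'shares'].values
        # Assumption: at most one row per (mngr, ticker) pair.
        # An empty match (no such holding) keeps the NaN already in the cell.
        if match.size == 1:
            wide.loc[t, s] = match[0]  # scalar, not a one-element array

print(wide)
```

This keeps the original loop structure, so it fixes the assignment but not the speed.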
My other question is whether this procedure can be written more efficiently. I will need to run it on a large dataset, so efficiency matters.
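On the efficiency point, this kind of long-to-wide reshaping looks like what pandas' DataFrame.pivot is meant for, so the explicit loops may not be needed. A sketch, again assuming no duplicate (mngr, ticker) pairs; the reindex calls only restore the order of first appearance, and wide and wide_gb are illustrative names. A groupby/unstack variant is included since groupby was mentioned.

```python
import pandas as pd

df = pd.DataFrame({
    'mngr': ['JP Morgan', 'AQR', 'JP Morgan', 'JP Morgan', 'AQR', 'AQR', 'AQR'],
    'shares': [50, 120, 5, 6, 10, 12, 30],
    'value': [12, 12, 30, 25, 30, 25, 14],
    'ticker': ['AAPL', 'AAPL', 'GOOGL', 'FB', 'GOOGL', 'FB', 'PG'],
})

row_order = df['mngr'].unique()    # order of first appearance
col_order = df['ticker'].unique()

# One row per manager, one column per ticker, cells taken from "shares".
# Missing combinations (e.g. JP Morgan / PG) become NaN.
wide = df.pivot(index='mngr', columns='ticker', values='shares')
wide = wide.reindex(index=row_order, columns=col_order)

# groupby alternative: sums shares per pair, so duplicates are aggregated
wide_gb = (
    df.groupby(['mngr', 'ticker'])['shares']
      .sum()
      .unstack('ticker')
      .reindex(index=row_order, columns=col_order)
)

print(wide)
print(wide_gb)
```

If a (mngr, ticker) pair can occur more than once, pivot raises a ValueError; in that case pivot_table with aggfunc='sum', or the groupby variant, aggregates the duplicates instead.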