Using Python pandas, how can we reshape the following data frame from long to wide format?
Input:
id1 AAA 12
id1 BBB 2
id2 DDD 3
id2 AAA 23
id3 FFF 34
id3 AAA 5
id3 BBB 65

Output:
     id1  id2  id3
AAA   12   23    0
BBB    2    0   65
DDD    0    3    0
FFF    0    0   34
I think the pivot_table function is what you are looking for.
import pandas as pd

row = [["id1", "AAA", 12], ["id2", "BBB", 2], ["id3", "CCC", 1], ["id1", "BBB", 4], ["id2", "AAA", 1], ["id3", "AAA", 3]]
df = pd.DataFrame(row, columns=["id", "letters", "numbers"])
# One row per letter, one column per id; missing combinations become NaN
df.pivot_table(values="numbers", index="letters", columns="id").reset_index()
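It works like a pivot table in Excel, with one difference: when an index/column pair is duplicated, pivot_table averages the values by default rather than summing them. Pass aggfunc="sum" to sum instead, and fill_value=0 to replace missing combinations with 0.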
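Applied to the data from the question, a minimal sketch might look like the following. The column names `id`, `symbol`, and `value` are assumptions, since the question does not name its columns; `aggfunc="sum"` and `fill_value=0` are chosen here to mimic Excel's summing and to produce the zeros in the expected output.

```python
import pandas as pd

# Sample rows from the question; the column names are assumed for illustration
rows = [
    ["id1", "AAA", 12], ["id1", "BBB", 2],
    ["id2", "DDD", 3], ["id2", "AAA", 23],
    ["id3", "FFF", 34], ["id3", "AAA", 5], ["id3", "BBB", 65],
]
df = pd.DataFrame(rows, columns=["id", "symbol", "value"])

# One row per symbol, one column per id.
# aggfunc="sum" adds up duplicates; fill_value=0 replaces missing combinations.
wide = df.pivot_table(
    values="value", index="symbol", columns="id", aggfunc="sum", fill_value=0
)
print(wide)
```

With the input as posted, the AAA/id3 cell comes out as 5, since that row is present in the data.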
You can use unstack() and fillna() to get your expected output.
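import pandas as pd
from io import StringIO  # pandas.compat.StringIO was removed in newer pandas versions

# Sample data from the question, with a header row added
new_data = StringIO("""id Symbol Value
id1 AAA 12
id1 BBB 2
id2 DDD 3
id2 AAA 23
id3 FFF 34
id3 AAA 5
id3 BBB 65""")
# Use the first two columns (id, Symbol) as a MultiIndex
df = pd.read_csv(new_data, sep=r"\s+", index_col=[0, 1], skipinitialspace=True)
# Move the id level into the columns and fill missing combinations with 0
df_soln = (df.unstack(level=0)).fillna(0)
print(df_soln)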
giving you
       Value
id       id1   id2   id3
Symbol
AAA     12.0  23.0   5.0
BBB      2.0   0.0  65.0
DDD      0.0   3.0   0.0
FFF      0.0   0.0  34.0
If you don't want the Value top-level showing, just do the following.
df_soln.columns = [c[-1] for c in df_soln.columns]
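As an alternative sketch, the extra column level can be avoided or dropped without a list comprehension. The variable names `wide` and `wide2` are illustrative; the data is the same sample as above. Selecting the `Value` column before unstacking yields single-level columns directly, and passing `fill_value=0` to `unstack()` fills the gaps in the same step.

```python
from io import StringIO

import pandas as pd

data = StringIO("""id Symbol Value
id1 AAA 12
id1 BBB 2
id2 DDD 3
id2 AAA 23
id3 FFF 34
id3 AAA 5
id3 BBB 65""")
df = pd.read_csv(data, sep=r"\s+", index_col=[0, 1])

# Option 1: unstack the Value Series, so no "Value" level is ever created
wide = df["Value"].unstack(level=0, fill_value=0)

# Option 2: unstack the whole frame, then drop the top column level
wide2 = df.unstack(level=0).fillna(0)
wide2.columns = wide2.columns.droplevel(0)

print(wide)
print(wide2)
```

Both options give the same table; the first avoids the detour through NaN that `fillna(0)` implies, which is why the second prints floats.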