Assume I have the following pandas DataFrame, whose n columns are named u0 to u(n-1) (in this case n = 3).
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=["u0","u1","u2"])
print(df)
u0 u1 u2
0 -0.254454 -0.227589 -0.208454
1 -0.071567 -2.878662 -0.094863
2 -0.100024 -2.295788 -0.103415
3 0.091116 -0.143777 0.874170
4 -1.398530 -1.248449 -0.707336
Now I want to calculate n new columns named p0 to p(n-1), where each cell holds the original value divided by the sum of its row. For example, for cell (0,0): p(0,0) = u(0,0) / (u(0,0) + u(0,1) + u(0,2))
At the moment I'm doing this by applying a function p to each row. The result is a new DataFrame, whose columns I rename before concatenating it with the original DataFrame.
def p(row):
    u = row.loc["u0":"u2"]
    return u / u.sum()
df2 = df.apply(p, axis=1)
df2.columns = ["p0","p1","p2"]
df = pd.concat([df, df2], axis=1)
print(df)
u0 u1 u2 p0 p1 p2
0 -0.254454 -0.227589 -0.208454 0.36850848 0.329601722 0.301889798
1 -0.071567 -2.878662 -0.094863 0.02350241 0.945344837 0.031152753
...
I'm not sure whether this is the pythonic way or whether it's fast enough. Later I will have many thousands of rows and about 100 columns (the number of columns is not fixed at 3 as in this example code).
Thank you very much for any ideas, comments or suggestions.
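One possible vectorized alternative to the row-wise apply is sketched below. It rests on some assumptions that are not in the original code: the columns to normalize are exactly u0 to u(n-1), the names `u_cols`, `row_sums` and `p_df` are made up for illustration, and the random data is seeded only so the example is reproducible. It computes the row sums once with `DataFrame.sum(axis=1)` and divides the whole block with `DataFrame.div(..., axis=0)`, which avoids calling a Python function per row.

```python
import numpy as np
import pandas as pd

# Example data; the seed and n are arbitrary choices for this sketch.
rng = np.random.default_rng(0)
n = 3
u_cols = [f"u{i}" for i in range(n)]  # assumed column naming: u0 .. u(n-1)
df = pd.DataFrame(rng.standard_normal((5, n)), columns=u_cols)

# Sum of the u columns for every row (a Series indexed like df).
row_sums = df[u_cols].sum(axis=1)

# Divide every u column by its row sum; axis=0 aligns the Series on the row index.
p_df = df[u_cols].div(row_sums, axis=0)
p_df.columns = [f"p{i}" for i in range(n)]

# Append the new p columns to the original DataFrame.
df = pd.concat([df, p_df], axis=1)
print(df)
```

Because the column names are built from n, the same lines should work unchanged for about 100 columns. Rows whose sum is zero or close to zero would produce inf or very large values, so those may need separate handling.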