Iterate over rows and apply function based on conditions in existing dataframe columns

Question

I have the following script:

import pandas as pd

df = pd.DataFrame()
df["Stake"]=[0.25,0.15,0.26,0.30,0.10,0.40,0.32,0.11,0.20,0.25]
df["Odds"]=[2.5,4.0,1.75,2.2,1.85,3.2,1.5,1.2,2.15,1.65]
df["Ftr"]=["H","D","A","H","H","A","D","H","H","A"]
df["Ind"]=[1,2,2,1,3,3,3,1,2,2]

which results in:

    Stake   Odds    Ftr Ind
0   0.25    2.50    H   1
1   0.15    4.00    D   2
2   0.26    1.75    A   2
3   0.30    2.20    H   1
4   0.10    1.85    H   3
5   0.40    3.20    A   3
6   0.32    1.50    D   3
7   0.11    1.20    H   1
8   0.20    2.15    H   2
9   0.25    1.65    A   2

I want to create two additional columns, "Start Balance" and "End Balance". "Start Balance" at index 0 is equal to 1000. "End Balance" is always equal to either:

"Start Balance" - "Stake" * "Start Balance" + "Stake" * "Start Balance" * "Odds" if column "Ftr" = "H",

or

"Start Balance" - "Stake" * "Start Balance" if column "Ftr" is different from "H".

Then the "Start Balance" of the next index becomes the "End Balance" of the preceding index. For example, "End Balance" at index 0 becomes "Start Balance" at index 1.
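As a minimal sketch, the two cases and the carry-over between rows can be written as a small helper. The function name `end_balance` and its parameter names are illustrative and not part of the original script.

```python
def end_balance(start_balance, stake, odds, ftr):
    """Balance after one bet (hypothetical helper, named for illustration)."""
    staked = stake * start_balance  # amount put at risk
    if ftr == "H":
        # Ftr == "H": the stake is deducted, then paid back multiplied by the odds
        return start_balance - staked + staked * odds
    # Any other Ftr value: the stake is lost
    return start_balance - staked


first = end_balance(1000, 0.25, 2.5, "H")    # row 0
second = end_balance(first, 0.15, 4.0, "D")  # row 1 starts from row 0's End Balance
print(first, round(second, 2))  # 1375.0 1168.75
```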

To make things a bit more complicated, "Start Balance" should respect one more condition. If the "Ind" column is different from 1, for example 2, then "Start Balance" for both rows (indices 1 and 2) is equal to the "End Balance" at index 0. Likewise, where "Ind" is 3, all of those indices (4, 5, 6) should have "Start Balance" equal to the "End Balance" at index 3. The expected result is:

    Stake   Odds    Ftr Ind Start Balance   End Balance
0   0.25    2.5      H   1     1000.0          1375.0
1   0.15     4       D   2     1375.0          1168.8
2   0.26    1.75     A   2     1375.0          1017.5
3   0.3     2.2      H   1     1017.5          1383.8
4   0.1     1.85     H   3     1383.8          1501.4
5   0.4     3.2      A   3     1383.8           830.3
6   0.32    1.5      D   3     1383.8           941.0
7   0.11    1.2      H   1      941.0           961.7
8   0.2     2.15     H   2      961.7          1182.9
9   0.25    1.65     A   2      961.7           721.3
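One way to read the "Ind" condition is to ask, for each row, which earlier row supplies its "Start Balance". The sketch below assumes that rows sharing an "Ind" value other than 1 are always adjacent, as they are in the sample data. The name `source_row` is illustrative, and -1 stands for "no earlier row", that is, the initial balance of 1000.

```python
ind = [1, 2, 2, 1, 3, 3, 3, 1, 2, 2]

source_row = []
for pos, value in enumerate(ind):
    if value == 1 or pos == 0 or ind[pos - 1] != value:
        # Ind == 1, or the first row of a run: start from the row directly above
        source_row.append(pos - 1)
    else:
        # Later row of the same run: share the source row of the row above
        source_row.append(source_row[-1])

print(source_row)  # [-1, 0, 0, 2, 3, 3, 3, 6, 7, 7]
```

Rows 1 and 2 both point at row 0, and rows 4, 5 and 6 all point at row 3, which matches the expected table.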

I have not tried anything since I truly don't know how to approach so many conditions :). Cheers

Code Different · Accepted Answer · 2020-02-17 20:52:34Z

I can't think of a vectorized way to do what you want, so a for loop is the only solution I can offer:

# A temp dataframe to keep track of the End Balance by Ind
# It's empty to start
tmp = pd.DataFrame(columns=['index', 'End Balance']).rename_axis('ind')

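for index, row in df.iterrows():
    stake, odds, ind = row['Stake'], row['Odds'], row['Ind']

    if index == 0:
        # The first row starts from the initial balance
        start_balance = 1000
    elif ind == 1:
        # Ind == 1: carry over the End Balance of the previous row
        start_balance = df.loc[index - 1, 'End Balance']
    else:
        # Otherwise: End Balance of the most recent row with a different Ind
        start_balance = tmp.query('ind != @ind').sort_values('index')['End Balance'].iloc[-1]

    # Ftr == 'H' pays back stake * odds; any other result loses the stake
    if row['Ftr'] == 'H':
        end_balance = start_balance * (1 - stake + stake * odds)
    else:
        end_balance = start_balance * (1 - stake)

    # Record the row and End Balance where the current Ind last occurred
    tmp.loc[ind, ['index', 'End Balance']] = [index, end_balance]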

    df.loc[index, 'Start Balance'] = start_balance
    df.loc[index, 'End Balance'] = end_balance

Result:

   Stake  Odds Ftr  Ind  Start Balance  End Balance
0   0.25  2.50   H    1    1000.000000  1375.000000
1   0.15  4.00   D    2    1375.000000  1168.750000
2   0.26  1.75   A    2    1375.000000  1017.500000
3   0.30  2.20   H    1    1017.500000  1383.800000
4   0.10  1.85   H    3    1383.800000  1501.423000
5   0.40  3.20   A    3    1383.800000   830.280000
6   0.32  1.50   D    3    1383.800000   940.984000
7   0.11  1.20   H    1     940.984000   961.685648
8   0.20  2.15   H    2     961.685648  1182.873347
9   0.25  1.65   A    2     961.685648   721.264236
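For comparison, here is a sketch of the same loop without the temporary dataframe, working on plain lists and a dictionary. It assumes the four columns have been pulled out as lists (for example with `df["Stake"].tolist()`). The names `last_end`, `starts` and `ends` are illustrative and are not part of the answer above.

```python
# Columns from the question, as plain lists
stake_col = [0.25, 0.15, 0.26, 0.30, 0.10, 0.40, 0.32, 0.11, 0.20, 0.25]
odds_col = [2.5, 4.0, 1.75, 2.2, 1.85, 3.2, 1.5, 1.2, 2.15, 1.65]
ftr_col = ["H", "D", "A", "H", "H", "A", "D", "H", "H", "A"]
ind_col = [1, 2, 2, 1, 3, 3, 3, 1, 2, 2]

last_end = {}  # Ind value -> (row position, End Balance) of its latest row
starts, ends = [], []

rows = zip(stake_col, odds_col, ftr_col, ind_col)
for pos, (stake, odds, ftr, ind) in enumerate(rows):
    if pos == 0:
        start = 1000.0    # initial balance
    elif ind == 1:
        start = ends[-1]  # End Balance of the previous row
    else:
        # End Balance of the most recent row whose Ind differs from this one.
        # Tuples compare by row position first, so max() picks the latest row.
        start = max(v for k, v in last_end.items() if k != ind)[1]

    factor = 1 - stake + stake * odds if ftr == "H" else 1 - stake
    end = start * factor

    last_end[ind] = (pos, end)
    starts.append(start)
    ends.append(end)

for start, end in zip(starts, ends):
    print(round(start, 1), round(end, 1))
```

The two lists can then be assigned back with `df["Start Balance"] = starts` and `df["End Balance"] = ends`. Like the dataframe version, this fails if a row with "Ind" other than 1 has no earlier row with a different "Ind".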

edited Feb 17, 2020 at 20:52

answered Feb 17, 2020 at 20:18

Code Different


6 Comments

Martin Yordanov Georgiev Over a year ago

In fact, this is not a mistake but one of the conditions that makes me struggle with this. If the "Ind" column is different from 1, for example 2, then the "Start Balance" for both rows where 2 appears should be equal to the "End Balance" in the row before the 2s. In the example, indices 1 and 2 should have as "Start Balance" the "End Balance" from index 0. Likewise, where the "Ind" column is 3, all rows with 3s should have "Start Balance" equal to the "End Balance" in the row preceding the 3s (from the table, indices 4, 5, 6 should have "Start Balance" equal to "End Balance" at index 3).

Martin Yordanov Georgiev Over a year ago

The solution you provided is very close but not quite there yet :). Thanks anyway.

Code Different Over a year ago

Let me see if I understand your question correctly: if Ind == 1, Start Balance = End Balance of previous row. If Ind == 2, Start Balance = End Balance of the last row with Ind != 2. If Ind == 3, Start Balance = End Balance of the last row with Ind != 3? Is that what you meant?

Martin Yordanov Georgiev Over a year ago

Yes, this is exactly what I mean.

Code Different Over a year ago

It only requires some small modifications in the loop. See my edited answer.
