Python Pandas Dataframe - Nothing being returned from my function

Question

I have two dataframes:

energy_calculated (the time_stamp columns were just formatted using 3 decimal values to make sure there weren't any hidden values disrupting the simple math):

    fl_key    min_time_stamp    max_time_stamp  energy
0    10051 1614556800019.000 1614556807979.000   0.352
1    10051 1614556808019.000 1614556815979.000   0.275
2    10051 1614556816019.000 1614556823979.000   0.429
3    10051 1614556824019.000 1614556831979.000   0.406
4    10051 1614556832019.000 1614556839979.000   0.444
5    10051 1614556840019.000 1614556847979.000   0.348
6    10051 1614556848019.000 1614556855979.000   0.381
7    10051 1614556856019.000 1614556863979.000   0.456
8    10051 1614556864019.000 1614556871979.000   0.362
9    10051 1614556872019.000 1614556879979.000   0.465
10   10051 1614556880019.000 1614556887979.000   0.577
11   10051 1614556888019.000 1614556895979.000   0.305
12   10051 1614556896019.000 1614556903979.000   0.347
13   10051 1614556904019.000 1614556911979.000   0.246
14   10051 1614556912019.000 1614556919939.000   0.340

df_test:

      fl_Key  time_stamp        energy       install_prediction
1007   10051  1614556840299      -1                  -1
491    10051  1614556819659      -1                  -1
1944   10051  1614556877779      -1                  -1
2227   10051  1614556889099      -1                  -1
677    10051  1614556827099      -1                  -1
2944   10051  1614556917779      -1                  -1
799    10051  1614556831979      -1                  -1
2378   10051  1614556895139      -1                  -1
1877   10051  1614556875099      -1                  -1
487    10051  1614556819499      -1                  -1

For each row of df_test, I am trying to use fl_Key and time_stamp to look up the "energy" value in the energy_calculated dataframe. fl_Key must match fl_key exactly, and time_stamp must fall between min_time_stamp and max_time_stamp (inclusive).

The fl_Key and fl_key names are different so I can track which column is coming from where.

I have a simple function (the raise statements are only there to confirm that exactly one match is always found):

def integrateEnergyCalculationData(row, energy_calculations):
  energy_calculations = energy_calculations[(energy_calculations['fl_key'] == row.fl_Key) & (energy_calculations['min_time_stamp'] <= row.time_stamp) & (energy_calculations['max_time_stamp'] >= row.time_stamp)]

  if (len(energy_calculations) == 0):
    raise Exception("No energy data for: " + str(row.fl_Key) + ", " + str(row.time_stamp))
  elif (len(energy_calculations) >= 2):
    raise Exception("Too much energy data for: " + str(row.fl_Key) + ", " + str(row.time_stamp))

  return energy_calculations['energy']

I tie it all together using apply():

df_test['energy'] = df_test[['time_stamp','fl_Key']].apply(integrateEnergyCalculationData, 1, args=(energy_calculated, ))

The lookup fills in the energy for some of the rows, but not all of them.

My resulting df_test dataframe is shown below. My real df_test is much bigger; I randomly selected 10 rows from it to demonstrate the issue, which is why the index numbers are out of order:

       fl_Key    time_stamp            energy     install_prediction
1007    10051    1614556840299                          -1
491     10051    1614556819659    0.4291915384067029    -1
1944    10051    1614556877779                          -1
2227    10051    1614556889099                          -1
677     10051    1614556827099                          -1
2944    10051    1614556917779                          -1
799     10051    1614556831979                          -1
2378    10051    1614556895139                          -1
1877    10051    1614556875099                          -1
487     10051    1614556819499    0.4291915384067029    -1

What am I missing? Thanks.

K.Cl · Accepted Answer · 2021-03-31 00:14:59Z

It's weird. I copied your two dataframes into my own session, ran your code, and reproduced your gaps. Then I put pdb right before your return statement, and it was returning a pandas object (a one-element Series), not a float! In fact, that was the case for every row. I changed the return line to:

return float(energy_calculations['energy'])

With that change I got your full dataframe:

    index   fl_Key  time_stamp  energy  install_prediction
0   1007    10051   1614556840299   0.348   -1
1   491     10051   1614556819659   0.429   -1
2   1944    10051   1614556877779   0.465   -1
3   2227    10051   1614556889099   0.305   -1
4   677     10051   1614556827099   0.406   -1
5   2944    10051   1614556917779   0.340   -1
6   799     10051   1614556831979   0.406   -1
7   2378    10051   1614556895139   0.305   -1
8   1877    10051   1614556875099   0.465   -1
9   487     10051   1614556819499   0.429   -1

Since I don't have access to your dataframes, maybe some type weirdness is going on that you should fix.

Scratch that. You can achieve the same result with energy_calculations['energy'].values[0], which pulls out the single matched value without having to convert to float.
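
A minimal sketch of why the original return value leaves gaps, assuming the usual behaviour of DataFrame.apply with axis=1: when the function returns a Series, apply assembles the results into a DataFrame (one column per matched energy_calculated index label) rather than a Series of scalars. The small frames below are stand-ins built from a few rows of the question's data, and lookup_series and lookup_scalar are hypothetical names used only for this illustration.

```python
import pandas as pd

# Stand-in data: three rows copied from the question's energy_calculated.
energy_calculated = pd.DataFrame({
    "fl_key": [10051, 10051, 10051],
    "min_time_stamp": [1614556816019, 1614556824019, 1614556840019],
    "max_time_stamp": [1614556823979, 1614556831979, 1614556847979],
    "energy": [0.429, 0.406, 0.348],
})

# Stand-in data: three rows copied from the question's df_test.
df_test = pd.DataFrame(
    {
        "fl_Key": [10051, 10051, 10051],
        "time_stamp": [1614556840299, 1614556819659, 1614556827099],
    },
    index=[1007, 491, 677],
)


def find_match(row, energy_calculations):
    """Return the energy_calculations rows whose key and time window fit `row`."""
    return energy_calculations[
        (energy_calculations["fl_key"] == row.fl_Key)
        & (energy_calculations["min_time_stamp"] <= row.time_stamp)
        & (energy_calculations["max_time_stamp"] >= row.time_stamp)
    ]


def lookup_series(row, energy_calculations):
    # Same shape of return value as in the question: a one-element Series.
    return find_match(row, energy_calculations)["energy"]


def lookup_scalar(row, energy_calculations):
    # Extract the single matched value so apply() receives a plain scalar.
    return find_match(row, energy_calculations)["energy"].iloc[0]


as_series = df_test.apply(lookup_series, axis=1, args=(energy_calculated,))
as_scalar = df_test.apply(lookup_scalar, axis=1, args=(energy_calculated,))

print(type(as_series).__name__)  # a DataFrame, mostly NaN
print(type(as_scalar).__name__)  # a Series with one value per df_test row
print(as_scalar)
```

Here .iloc[0] plays the same role as .values[0]. In the question's output only the rows that matched energy row 2 were filled, which would be consistent with a single column of that DataFrame being kept on assignment; that is an inference from the posted output, and other pandas versions may behave differently or raise an error instead.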

ashkangh · Accepted Answer · 2021-03-31 00:04:09Z

Try using merge:

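# energy_calculated is the lookup dataframe from the question
df_new = (
    energy_calculated.rename(columns={'fl_key': 'fl_Key'})
    .merge(df_test[['fl_Key', 'time_stamp']], on='fl_Key', how='left')
)

# Keep only the pairs whose time_stamp falls inside the [min, max] window
in_window = df_new['time_stamp'].between(df_new['min_time_stamp'], df_new['max_time_stamp'])
print(df_new.loc[in_window, 'energy'])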
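
The merge above prints the matching energies but does not write them back to df_test. One way to do that, sketched here under the assumption that each time_stamp falls in at most one window, is to carry df_test's original index through the merge and assign the filtered result by index. The frames are small stand-ins built from the question's data (using its energy_calculated name), and candidates, in_window and matched are hypothetical names.

```python
import pandas as pd

# Stand-in data: three rows copied from the question's energy_calculated.
energy_calculated = pd.DataFrame({
    "fl_key": [10051, 10051, 10051],
    "min_time_stamp": [1614556816019, 1614556824019, 1614556840019],
    "max_time_stamp": [1614556823979, 1614556831979, 1614556847979],
    "energy": [0.429, 0.406, 0.348],
})

# Stand-in data: four rows copied from the question's df_test.
df_test = pd.DataFrame(
    {
        "fl_Key": [10051, 10051, 10051, 10051],
        "time_stamp": [1614556840299, 1614556819659, 1614556827099, 1614556831979],
        "energy": [-1, -1, -1, -1],
    },
    index=[1007, 491, 677, 799],
)

# Pair every df_test row with every energy row that shares its key.
# reset_index() moves the original (unnamed) index into a column called "index".
candidates = (
    df_test[["fl_Key", "time_stamp"]]
    .reset_index()
    .merge(
        energy_calculated.rename(columns={"fl_key": "fl_Key"}),
        on="fl_Key",
        how="left",
    )
)

# Keep the pairs whose time_stamp lies inside the window (both ends inclusive).
in_window = candidates["time_stamp"].between(
    candidates["min_time_stamp"], candidates["max_time_stamp"]
)
matched = candidates.loc[in_window].set_index("index")["energy"]

# Assignment aligns on index labels; rows without a match would become NaN.
df_test["energy"] = matched
print(df_test)
```

The intermediate frame holds one row per (df_test row, energy row) pair for each key, so it can get large on big inputs. If the windows never overlap, pd.merge_asof on sorted timestamps may be a lighter alternative, though that is a different technique from the one in this answer.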
