
I have two dataframes:

energy_calculated (the time_stamp columns are formatted to 3 decimal places to make sure there were no hidden fractional values disrupting the simple comparison):

    fl_key    min_time_stamp    max_time_stamp  energy
0    10051 1614556800019.000 1614556807979.000   0.352
1    10051 1614556808019.000 1614556815979.000   0.275
2    10051 1614556816019.000 1614556823979.000   0.429
3    10051 1614556824019.000 1614556831979.000   0.406
4    10051 1614556832019.000 1614556839979.000   0.444
5    10051 1614556840019.000 1614556847979.000   0.348
6    10051 1614556848019.000 1614556855979.000   0.381
7    10051 1614556856019.000 1614556863979.000   0.456
8    10051 1614556864019.000 1614556871979.000   0.362
9    10051 1614556872019.000 1614556879979.000   0.465
10   10051 1614556880019.000 1614556887979.000   0.577
11   10051 1614556888019.000 1614556895979.000   0.305
12   10051 1614556896019.000 1614556903979.000   0.347
13   10051 1614556904019.000 1614556911979.000   0.246
14   10051 1614556912019.000 1614556919939.000   0.340

df_test:

      fl_Key  time_stamp        energy       install_prediction
1007   10051  1614556840299      -1                  -1
491    10051  1614556819659      -1                  -1
1944   10051  1614556877779      -1                  -1
2227   10051  1614556889099      -1                  -1
677    10051  1614556827099      -1                  -1
2944   10051  1614556917779      -1                  -1
799    10051  1614556831979      -1                  -1
2378   10051  1614556895139      -1                  -1
1877   10051  1614556875099      -1                  -1
487    10051  1614556819499      -1                  -1

For each row of df_test, I am trying to use its fl_Key and time_stamp to look up the "energy" value in the energy_calculated dataframe. The fl_Key column must match the fl_key column exactly, and time_stamp must fall between min_time_stamp and max_time_stamp (inclusive).

The fl_Key and fl_key names are different so I can track which column is coming from where.

I have a simple method (I put in the raise exceptions just to make sure it was always finding a match):

def integrateEnergyCalculationData(row, energy_calculations):
    # Keep only the rows for this fl_key whose time window contains the time stamp.
    energy_calculations = energy_calculations[
        (energy_calculations['fl_key'] == row.fl_Key)
        & (energy_calculations['min_time_stamp'] <= row.time_stamp)
        & (energy_calculations['max_time_stamp'] >= row.time_stamp)
    ]

    # Exactly one window is expected to match each row.
    if len(energy_calculations) == 0:
        raise Exception("No energy data for: " + str(row.fl_Key) + ", " + str(row.time_stamp))
    elif len(energy_calculations) >= 2:
        raise Exception("Too much energy data for: " + str(row.fl_Key) + ", " + str(row.time_stamp))

    return energy_calculations['energy']

I tie it all together using apply():

df_test['energy'] = df_test[['time_stamp','fl_Key']].apply(integrateEnergyCalculationData, 1, args=(energy_calculated, ))

What ends up happening is that the energy value is filled in for some of the rows, but not all of them.

My resulting df_test dataframe is shown below. I have a much bigger version of df_test and randomly selected 10 rows from it to demonstrate the issue, which is why the index numbers are not sequential:

       fl_Key    time_stamp            energy     install_prediction
1007    10051    1614556840299                          -1
491     10051    1614556819659    0.4291915384067029    -1
1944    10051    1614556877779                          -1
2227    10051    1614556889099                          -1
677     10051    1614556827099                          -1
2944    10051    1614556917779                          -1
799     10051    1614556831979                          -1
2378    10051    1614556895139                          -1
1877    10051    1614556875099                          -1
487     10051    1614556819499    0.4291915384067029    -1

What am I missing? Thanks.

2 Answers

1

It's weird. I loaded your two dataframes into my own session, ran your code, and reproduced your gaps. Then I put pdb right before your return statement and saw that it was returning an object (a one-element pandas Series), not a float! In fact, that was the case for every row. I changed the return to this line:

return float(energy_calculations['energy'])

And got your full dataframe.

   index  fl_Key     time_stamp  energy  install_prediction
0   1007   10051  1614556840299   0.348                  -1
1    491   10051  1614556819659   0.429                  -1
2   1944   10051  1614556877779   0.465                  -1
3   2227   10051  1614556889099   0.305                  -1
4    677   10051  1614556827099   0.406                  -1
5   2944   10051  1614556917779   0.340                  -1
6    799   10051  1614556831979   0.406                  -1
7   2378   10051  1614556895139   0.305                  -1
8   1877   10051  1614556875099   0.465                  -1
9    487   10051  1614556819499   0.429                  -1

Since I don't have access to your actual dataframes, there may also be some type weirdness in your data that you should check.

Scratch that. You can achieve the same result using energy_calculations['energy'].values[0], which pulls the single matched value out of the Series directly, without having to convert to float.
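To make the difference concrete, here is a minimal sketch on made-up data. The frames `windows` and `samples` and the three function names are hypothetical stand-ins for energy_calculated, df_test and the lookup function, not the asker's real data. When the function passed to apply() returns a one-element Series, pandas expands the results into a wide DataFrame with one column per matched row label, so there is no single column of values to assign. That is consistent with the question's output, where only the two rows that matched the same window were filled in. Returning the scalar with .iloc[0] (equivalent here to .values[0]) yields one value per row.

```python
import pandas as pd

# Hypothetical stand-in for energy_calculated: two time windows for one key.
windows = pd.DataFrame({
    "fl_key": [10051, 10051],
    "min_time_stamp": [100, 200],
    "max_time_stamp": [199, 299],
    "energy": [0.352, 0.275],
})

# Hypothetical stand-in for df_test, with a non-default index as in the question.
samples = pd.DataFrame(
    {"fl_Key": [10051, 10051, 10051], "time_stamp": [250, 120, 199]},
    index=[1007, 491, 1944],
)


def match_window(row, windows):
    """Return the rows of `windows` whose key and time range contain `row`."""
    return windows[
        (windows["fl_key"] == row.fl_Key)
        & (windows["min_time_stamp"] <= row.time_stamp)
        & (windows["max_time_stamp"] >= row.time_stamp)
    ]


def lookup_as_series(row, windows):
    # Returns a one-element Series, like the function in the question.
    return match_window(row, windows)["energy"]


def lookup_as_scalar(row, windows):
    # Returns the single matched value itself.
    return match_window(row, windows)["energy"].iloc[0]


wide = samples.apply(lookup_as_series, axis=1, args=(windows,))
flat = samples.apply(lookup_as_scalar, axis=1, args=(windows,))

print(type(wide).__name__)  # a DataFrame with one column per matched window label
print(type(flat).__name__)  # a Series with one energy value per row of samples

# The per-row Series shares the index of samples, so it can be assigned directly.
samples["energy"] = flat
print(samples)
```

In `wide`, each row holds a value only in the column of the window it matched and NaN elsewhere, which is why the Series-returning version cannot populate a single energy column cleanly.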


0

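Try using merge: join the two dataframes on the key, then keep only the rows whose time_stamp falls inside the window:

df_new = energy_calculated.rename(columns={'fl_key': 'fl_Key'})\
                          .merge(df_test[['fl_Key', 'time_stamp']], on='fl_Key', how='left')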

print(df_new.loc[df_new['time_stamp']\
      .between(df_new['min_time_stamp'], df_new['max_time_stamp']), 'energy'])

Output:

    energy
21  0.429
29  0.429
34  0.406
36  0.406
50  0.348
92  0.465
98  0.465
113 0.305
117 0.305
145 0.34
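The merge above prints the matched energies but does not put them back on df_test. One possible way to do that while keeping df_test's original index is sketched below, using a few sample rows copied from the question. The names `merged`, `in_window`, `matched` and the `_calc` suffix are illustrative choices, and the sketch assumes each df_test row falls in exactly one window.

```python
import pandas as pd

# A few windows copied from the question's energy_calculated.
energy_calculated = pd.DataFrame({
    "fl_key": [10051, 10051, 10051],
    "min_time_stamp": [1614556816019, 1614556824019, 1614556840019],
    "max_time_stamp": [1614556823979, 1614556831979, 1614556847979],
    "energy": [0.429, 0.406, 0.348],
})

# A few rows copied from the question's df_test, keeping its index labels.
df_test = pd.DataFrame(
    {
        "fl_Key": [10051, 10051, 10051, 10051],
        "time_stamp": [1614556840299, 1614556819659, 1614556827099, 1614556831979],
        "energy": [-1, -1, -1, -1],
        "install_prediction": [-1, -1, -1, -1],
    },
    index=[1007, 491, 677, 799],
)

# reset_index() keeps the original labels in an "index" column across the merge.
merged = df_test.reset_index().merge(
    energy_calculated.rename(columns={"fl_key": "fl_Key"}),
    on="fl_Key",
    how="left",
    suffixes=("", "_calc"),  # df_test's energy stays "energy", the lookup is "energy_calc"
)

# between() is inclusive on both ends by default, matching the <= / >= test.
in_window = merged["time_stamp"].between(
    merged["min_time_stamp"], merged["max_time_stamp"]
)

# One matched energy per original df_test label.
matched = merged.loc[in_window].set_index("index")["energy_calc"]

# Assignment aligns on the index labels, so row order does not matter.
df_test["energy"] = matched
print(df_test)
```

Note that merging on the key alone pairs every df_test row with every window for that key before filtering, so the intermediate frame grows with the product of the two row counts per key.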
