0

I have a dataframe with a datetime index, which looks like this:

                         ModelRun  Tmp_2m_C     DSWRF   TCDC  Obs_kW  n  beta  \
2016-01-01 06:30:00  2.016010e+09  7.962387   0.00000  100.0     0.0  1   0.0   
2016-01-01 07:30:00  2.016010e+09  8.077713   9.00000  100.0     0.0  1   0.0   
2016-01-01 08:30:00  2.016010e+09  8.467117  46.32202  100.0    12.0  1   0.0   
                         delta                   dtm_utc  \
2016-01-01 06:30:00 -23.058629 2016-01-01 06:30:00+00:00   
2016-01-01 07:30:00 -23.058629 2016-01-01 07:30:00+00:00   
2016-01-01 08:30:00 -23.058629 2016-01-01 08:30:00+00:00   
                                    dtm_local         ...           \
2016-01-01 06:30:00 2016-01-01 07:30:00+01:00         ...            
2016-01-01 07:30:00 2016-01-01 08:30:00+01:00         ...            
2016-01-01 08:30:00 2016-01-01 09:30:00+01:00         ...            
                                   corr1_dtm                          dtm_sun  \
2016-01-01 06:30:00 -1 days +23:45:13.666667 2016-01-01 07:12:19.401323+01:00   
2016-01-01 07:30:00 -1 days +23:45:13.666667 2016-01-01 08:12:19.401323+01:00   
2016-01-01 08:30:00 -1 days +23:45:13.666667 2016-01-01 09:12:19.401323+01:00   
                     sun_hour sun_hour_angle delta_rad  sun_hour_angle_rad  \
2016-01-01 06:30:00       7.2          -72.0 -0.402449           -1.256637   
2016-01-01 07:30:00       8.2          -57.0 -0.402449           -0.994838   
2016-01-01 08:30:00       9.2          -42.0 -0.402449           -0.733038   
                     earth_sunset_deg  earth_sunrise_deg  surface_sunset_deg  \
2016-01-01 06:30:00         68.645391         -68.645391           70.481456   
2016-01-01 07:30:00         68.645391         -68.645391           70.481456   
2016-01-01 08:30:00         68.645391         -68.645391           70.481456   
                     surface_sunrise_deg  
2016-01-01 06:30:00           -79.585047  
2016-01-01 07:30:00           -79.585047  
2016-01-01 08:30:00           -79.585047 

Note that I have included all the dataframe columns so that you can trace the error, but for what I am trying to do I am only interested in the last four columns, i.e. this part of the dataframe:

                     earth_sunset_deg  earth_sunrise_deg  surface_sunset_deg  \
2016-01-01 06:30:00         68.645391         -68.645391           70.481456   
2016-01-01 07:30:00         68.645391         -68.645391           70.481456   
2016-01-01 08:30:00         68.645391         -68.645391           70.481456   
                     surface_sunrise_deg  
2016-01-01 06:30:00           -79.585047  
2016-01-01 07:30:00           -79.585047  
2016-01-01 08:30:00           -79.585047 

This is only part of the dataframe, as it contains 2 years of data. What I am trying to do is the following:

if surface_sunset_deg > earth_sunset_deg:
    sunset_deg = earth_sunset_deg
else:
    sunset_deg = surface_sunset_deg

So essentially, I am trying to iterate through all rows of the dataframe (which correspond to different timestamps), evaluate which of the 2 angles is greater (surface_sunset_deg or earth_sunset_deg) and store the one that satisfies my criterion in a new column df["sunset_deg"].

As far as I know, the most efficient way of looping over a dataframe is using the apply function, therefore what I have written is this:

df["sunset_deg"] = df.apply(lambda row: row["earth_sunset_deg"] if row["earth_sunset_deg"] < row["surface_sunset_deg"] else row["surface_sunset_earth"], axis=1)

And the error I get is this:

Traceback (most recent call last):
  File "C:\Users\Admin\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2483, in get_value
    return libts.get_value_box(s, key)
  File "pandas/_libs/tslib.pyx", line 923, in pandas._libs.tslib.get_value_box (pandas\_libs\tslib.c:18843)
  File "pandas/_libs/tslib.pyx", line 932, in pandas._libs.tslib.get_value_box (pandas\_libs\tslib.c:18477)
TypeError: 'str' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\Admin\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-11-69be989aa737>", line 1, in <module>
    df.apply(lambda row: row["earth_sunset_deg"] if row["earth_sunset_deg"] < row["surface_sunset_deg"] else row["surface_sunset_earth"], axis=1)
  File "C:\Users\Admin\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4262, in apply
    ignore_failures=ignore_failures)
  File "C:\Users\Admin\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4358, in _apply_standard
    results[i] = func(v)
  File "<ipython-input-11-69be989aa737>", line 1, in <lambda>
    df.apply(lambda row: row["earth_sunset_deg"] if row["earth_sunset_deg"] < row["surface_sunset_deg"] else row["surface_sunset_earth"], axis=1)
  File "C:\Users\Admin\Anaconda3\lib\site-packages\pandas\core\series.py", line 601, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Users\Admin\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2491, in get_value
    raise e1
  File "C:\Users\Admin\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2477, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('surface_sunset_earth', 'occurred at index 2016-02-02 00:30:00')

When I run the same line of code on only the first 30 rows of the dataframe:

df["sunset_deg"] = df[:30].apply(lambda row: row["earth_sunset_deg"] if row["earth_sunset_deg"] < row["surface_sunset_deg"] else row["surface_sunset_earth"], axis=1)

it runs smoothly and produces the result I want. Can you please help me trace the error? I am relatively new to Python and have already done my best here with no success. Thank you in advance.

  • So it seems you have no column surface_sunset_earth? Commented Apr 5, 2018 at 15:02
  • No need to use apply here. You can use boolean masks: df.loc[df["earth_sunset_deg"] < df["surface_sunset_deg"], "sunset_deg"] = df["earth_sunset_deg"] and so on. Commented Apr 5, 2018 at 15:04
  • Wow. Thank you so much for your help. I really overlooked it even though it was so obvious. Thanks a lot! Commented Apr 5, 2018 at 15:08

2 Answers

2

Using apply() for this is not efficient at all. You should almost never use apply() except as a last resort. You can solve your problem much more simply:

df["sunset_deg"] = df[["earth_sunset_deg", "surface_sunset_deg"]].min(axis=1)

Here's an alternative which might be more easily extended to different conditions:

df["sunset_deg"] = df["earth_sunset_deg"].where(df["surface_sunset_deg"] > df["earth_sunset_deg"], df["surface_sunset_deg"])

Either of these is far more efficient than anything using apply() with axis=1, which is essentially a Python-level for loop over the rows and therefore slow.
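
As a minimal sketch of the two approaches above, the following builds a small frame with made-up values (they are illustrative, not taken from the question's data) and checks that both produce the same column. The np.minimum line is an extra equivalent added here for comparison; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Toy data: the values are made up for illustration only
df = pd.DataFrame(
    {
        "earth_sunset_deg": [68.6, 75.0, 80.0],
        "surface_sunset_deg": [70.5, 70.5, 80.0],
    },
    index=pd.to_datetime(
        ["2016-01-01 06:30", "2016-01-01 07:30", "2016-01-01 08:30"]
    ),
)

# Row-wise minimum of the two columns
by_min = df[["earth_sunset_deg", "surface_sunset_deg"]].min(axis=1)

# Keep earth_sunset_deg where the condition holds, otherwise take surface_sunset_deg
by_where = df["earth_sunset_deg"].where(
    df["surface_sunset_deg"] > df["earth_sunset_deg"],
    df["surface_sunset_deg"],
)

# Element-wise minimum via NumPy (assumed alternative, not from the answer)
by_numpy = np.minimum(df["earth_sunset_deg"], df["surface_sunset_deg"])

df["sunset_deg"] = by_min
```

One difference to keep in mind: min(axis=1) skips NaN values by default, while the where() version falls back to surface_sunset_deg whenever the comparison is False, which includes rows containing NaN.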

0

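The problem is that the column 'surface_sunset_earth' does not exist in the dataframe, so it cannot be looked up in any row. To be exact, the problem is here:

else row["surface_sunset_earth"]

You cannot get the key "surface_sunset_earth" from a row that does not contain it. Judging by the column names in the question, the intended key is probably "surface_sunset_deg".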
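
This also accounts for the first 30 rows running without complaint: a conditional expression only evaluates the branch it actually takes, so the misspelled key is looked up only for rows where the condition is false. A minimal sketch with made-up values (not the question's data; `pick` is just an illustrative name) shows the effect:

```python
import pandas as pd

# Toy data: only the third row has earth_sunset_deg >= surface_sunset_deg
df = pd.DataFrame(
    {
        "earth_sunset_deg": [68.6, 68.6, 75.0],
        "surface_sunset_deg": [70.5, 70.5, 70.5],
    }
)

def pick(row):
    # Same logic as the question's lambda, including the misspelled column name
    return (
        row["earth_sunset_deg"]
        if row["earth_sunset_deg"] < row["surface_sunset_deg"]
        else row["surface_sunset_earth"]
    )

# The first two rows take the "if" branch, so the bad key is never looked up
first_two = df[:2].apply(pick, axis=1)

# The third row takes the "else" branch and triggers the KeyError
try:
    df.apply(pick, axis=1)
    raised = False
except KeyError:
    raised = True
```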

You may not want a lambda here. A lambda suits small pieces of logic; when the logic grows, a named function is the better choice.

One possible solution:

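def my_func(row):
    try:
        if row["earth_sunset_deg"] < row["surface_sunset_deg"]:
            return row["earth_sunset_deg"]
        else:
            # The question used "surface_sunset_earth", which is not a column
            return row["surface_sunset_deg"]
    except KeyError:
        # Decide here what to do if one of the keys does not exist;
        # with a bare pass the function returns None for that row
        pass

# Apply to the whole dataframe, one row at a time
df["sunset_deg"] = df.apply(my_func, axis=1)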
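
A typo like this can also be caught before any row is processed by checking the column names up front. The sketch below assumes a hypothetical helper called `require_columns`; it is not a pandas function, and the frame uses made-up values.

```python
import pandas as pd

def require_columns(df, names):
    """Raise a KeyError listing every name in `names` that is not a column of df."""
    missing = [name for name in names if name not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")

# Toy frame with the two columns the comparison needs
df = pd.DataFrame({"earth_sunset_deg": [68.6], "surface_sunset_deg": [70.5]})

# Passes silently because both columns exist
require_columns(df, ["earth_sunset_deg", "surface_sunset_deg"])

# The misspelled name is reported immediately, whatever the row values are
try:
    require_columns(df, ["surface_sunset_earth"])
    error_message = None
except KeyError as exc:
    error_message = str(exc)
```

Failing early this way means the error no longer depends on which rows happen to reach the faulty branch.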
