I have a dataframe with temperature as:

temp.ix[1:10]
                     KCRP
DateTime                 
2011-01-01 01:00:00  61.0
2011-01-01 02:00:00  60.0
2011-01-01 03:00:00  57.0
2011-01-01 04:00:00  56.0
2011-01-01 05:00:00  51.0
2011-01-01 06:00:00  55.0
2011-01-01 07:00:00  65.0
2011-01-01 08:00:00  55.0
2011-01-01 09:00:00  55.0

I have another dataframe df as:

df[['Start Time', 'End Time']].ix[1:10]
                           Start Time              End Time
DateTime                                                   
2011-01-23 05:00:00 2011-01-01 05:00:00 2011-01-01 06:11:00
2011-01-25 04:00:00 2011-01-25 04:51:00 2011-01-26 00:19:00
2011-01-26 04:00:00 2011-01-26 04:29:00 2011-01-26 23:13:00
2011-02-03 07:00:00 2011-02-03 07:56:00 2011-02-03 08:11:00
2011-02-12 19:00:00 2011-02-12 19:52:00 2011-02-13 12:14:00
2011-02-15 14:00:00 2011-02-15 14:09:00 2011-02-15 14:22:00
2011-02-22 05:00:00 2011-02-22 05:47:00 2011-02-22 05:55:00
2011-02-26 06:00:00 2011-02-26 06:47:00 2011-02-26 07:25:00
2011-03-01 00:00:00 2011-03-01 00:44:00 2011-03-02 00:11:00

For each row of df, I want the maximum value in temp over all readings between Start Time and End Time, inclusive.

So, for the first row of df, the answer would be:

df[['Start Time', 'End Time']].ix[1:10]
                           Start Time              End Time   Max Temp
DateTime                                                   
2011-01-23 05:00:00 2011-01-01 05:00:00 2011-01-01 06:11:00   55

I am not sure how to proceed other than looping through each row of df, which is probably not the best way to do it.

I have tried:

[np.max(temp[(temp.index >= x[0]) & (temp.index <= x[1])])['KCRP'] for x in
                      zip(df['Start Time'], df['End Time'])]

1 Answer

A simple way would be to do this using apply:

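def get_max_temp(row):
    # temp is indexed by DateTime, so filter on temp.index rather than a column
    in_window = (temp.index >= row['Start Time']) & (temp.index <= row['End Time'])
    # Series.max() returns NaN when no reading falls inside the window
    return temp.loc[in_window, 'KCRP'].max()

df['Max Temp'] = df.apply(get_max_temp, axis=1)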
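To try this end to end, here is a minimal sketch built from the sample data in the question. It uses label-based slicing on the DatetimeIndex instead of a boolean mask, which assumes temp is sorted by time. The second row of df is a hypothetical window that contains no hourly reading, added only to show what happens in that case; the name max_temp_in_window is also just illustrative.

```python
import pandas as pd

# Hourly readings copied from the question, indexed by timestamp.
temp = pd.DataFrame(
    {"KCRP": [61.0, 60.0, 57.0, 56.0, 51.0, 55.0, 65.0, 55.0, 55.0]},
    index=pd.date_range("2011-01-01 01:00", "2011-01-01 09:00", periods=9),
)
temp.index.name = "DateTime"

# Row 0 is the first window from the question.
# Row 1 is hypothetical: it falls between two readings, so it is empty.
df = pd.DataFrame(
    {
        "Start Time": pd.to_datetime(["2011-01-01 05:00", "2011-01-01 02:10"]),
        "End Time": pd.to_datetime(["2011-01-01 06:11", "2011-01-01 02:50"]),
    }
)


def max_temp_in_window(row):
    # Slicing a sorted DatetimeIndex by label includes both endpoints.
    window = temp.loc[row["Start Time"]:row["End Time"], "KCRP"]
    # Series.max() gives NaN for an empty window instead of raising.
    return window.max()


df["Max Temp"] = df.apply(max_temp_in_window, axis=1)
print(df)
```

The first row picks up the 05:00 and 06:00 readings (51.0 and 55.0), which matches the expected 55 in the question. The empty window comes back as NaN rather than raising, which the built-in max() would do on an empty selection.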

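Explicitly iterating over the rows of a dataframe should almost always be the last option. You can also wrap the function with np.vectorize so it can be called on whole columns, but note that the NumPy documentation describes np.vectorize as a convenience wrapper that is essentially a for loop, so it still calls the function once per row and should not be expected to be much faster than apply.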

UPDATE:

Vector version:

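import numpy as np

def get_max_temp(start, end):
    # temp is indexed by DateTime, so filter on temp.index rather than a column
    in_window = (temp.index >= start) & (temp.index <= end)
    return temp.loc[in_window, 'KCRP'].max()

# np.vectorize calls get_max_temp once per (start, end) pair
get_max_temp = np.vectorize(get_max_temp, otypes=[float])
df['Max Temp'] = get_max_temp(df['Start Time'], df['End Time'])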
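
If the per-row Python call becomes the bottleneck, one alternative is to compare every window against every reading at once with NumPy broadcasting. This is a different technique from the np.vectorize wrapper above, not a drop-in equivalent. The sketch below is only an illustration under a few assumptions: the helper name max_in_windows is made up, the KCRP column has no NaN values, and the number of windows times the number of readings is small enough to hold a boolean matrix in memory. The second window is hypothetical and deliberately empty.

```python
import numpy as np
import pandas as pd


def max_in_windows(series, starts, ends):
    """Max of `series` inside each inclusive [start, end] window (NaN if empty)."""
    times = series.index.to_numpy()[None, :]       # shape (1, n_readings)
    lo = pd.Series(starts).to_numpy()[:, None]     # shape (n_windows, 1)
    hi = pd.Series(ends).to_numpy()[:, None]
    inside = (times >= lo) & (times <= hi)         # shape (n_windows, n_readings)
    # Readings outside a window become -inf so they can never be the maximum.
    masked = np.where(inside, series.to_numpy(dtype=float), -np.inf)
    result = masked.max(axis=1)
    # Windows that matched no reading are reported as NaN instead of -inf.
    result[~inside.any(axis=1)] = np.nan
    return result


# Hourly readings copied from the question.
temp = pd.DataFrame(
    {"KCRP": [61.0, 60.0, 57.0, 56.0, 51.0, 55.0, 65.0, 55.0, 55.0]},
    index=pd.date_range("2011-01-01 01:00", "2011-01-01 09:00", periods=9),
)

# Row 0 is the first window from the question; row 1 is a hypothetical empty one.
df = pd.DataFrame(
    {
        "Start Time": pd.to_datetime(["2011-01-01 05:00", "2011-01-01 02:10"]),
        "End Time": pd.to_datetime(["2011-01-01 06:11", "2011-01-01 02:50"]),
    }
)

df["Max Temp"] = max_in_windows(temp["KCRP"], df["Start Time"], df["End Time"])
print(df)
```

The trade-off is memory: the intermediate mask has one entry per (window, reading) pair, so for a long temperature history and many windows the apply version may still be the more practical choice.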