I have two data frames, for example:

Shorter time frame ( 4 hourly )

Time                  Data_4h
1/1/01 00:00          1.1
1/1/01 06:00          1.2
1/1/01 12:00          1.3
1/1/01 18:00          1.1
2/1/01 00:00          1.1
2/1/01 06:00          1.2
2/1/01 12:00          1.3
2/1/01 18:00          1.1
3/1/01 00:00          1.1
3/1/01 06:00          1.2
3/1/01 12:00          1.3
3/1/01 18:00          1.1

Longer time frame ( 1 day )

Time                  Data_1d
1/1/01 00:00          1.1
2/1/01 00:00          1.6
3/1/01 00:00          1.0

I want to label each row of the shorter time frame with the value from the longer time frame for the previous day (day n-1), leaving NaN where no row for day n-1 exists.

For example,

Final merged data combining 4h and 1d

Time                  Data_4h     Data_1d
1/1/01 00:00          1.1         NaN
1/1/01 06:00          1.2         NaN
1/1/01 12:00          1.3         NaN
1/1/01 18:00          1.1         NaN
2/1/01 00:00          1.1         1.1
2/1/01 06:00          1.2         1.1
2/1/01 12:00          1.3         1.1
2/1/01 18:00          1.1         1.1 
3/1/01 00:00          1.1         1.6
3/1/01 06:00          1.2         1.6
3/1/01 12:00          1.3         1.6
3/1/01 18:00          1.1         1.6

So for 1/1, it tried to find 31/12 but couldn't, so those entries were labelled NaN. For 2/1, it looked up 1/1 and labelled those entries with 1.1, the value for 1/1. For 3/1, it looked up 2/1 and labelled those entries with 1.6, the value for 2/1.

It is important to note that the data in either time frame may have large gaps, so I can't access the rows in the longer time frame directly.

What is the best way to do this?

Currently I am iterating through all the rows of the smaller timeframe and then searching for the larger time frame date using a filter like:

large_tf_data[(large_tf_data.index <= target_timestamp)][0]

Where target_timestamp is calculated on each row in the smaller time frame data frame.

This is extremely slow! Any suggestions on how to speed it up?

  • Are those dates dayfirst or monthfirst? Commented May 16, 2018 at 17:32

1 Answer

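First, take care of the dates. They are day-first, so parse them that way:

import pandas as pd

# df is the 4-hourly frame, df2 is the daily frame
dayfirstme = lambda d: pd.to_datetime(d.Time, dayfirst=True)
df = df.assign(Time=dayfirstme)
df2 = df2.assign(Time=dayfirstme)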

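Then convert df2 to something useful: a Series of Data_1d whose index is shifted forward by one day, so that looking up day n returns the value for day n-1

d2 = df2.assign(Time=lambda d: d.Time + pd.Timedelta(1, 'D')).set_index('Time').Data_1d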

Apply magic: map the date of each row in df through d2 and join the result back on

df.join(df.Time.dt.date.map(d2).rename(d2.name))

                  Time  Data_4h  Data_1d
0  2001-01-01 00:00:00      1.1      NaN
1  2001-01-01 06:00:00      1.2      NaN
2  2001-01-01 12:00:00      1.3      NaN
3  2001-01-01 18:00:00      1.1      NaN
4  2001-01-02 00:00:00      1.1      1.1
5  2001-01-02 06:00:00      1.2      1.1
6  2001-01-02 12:00:00      1.3      1.1
7  2001-01-02 18:00:00      1.1      1.1
8  2001-01-03 00:00:00      1.1      1.6
9  2001-01-03 06:00:00      1.2      1.6
10 2001-01-03 12:00:00      1.3      1.6
11 2001-01-03 18:00:00      1.1      1.6
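
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample frames are rebuilt from the question; the explicit format string, the name prev_day, and the use of dt.normalize() in place of dt.date are my own choices for illustration (the lookup key is then a midnight timestamp, the same type as the index of the shifted Series), not part of the original snippet.

```python
import pandas as pd

# Sample frames rebuilt from the question; dates are day-first strings.
df = pd.DataFrame({
    "Time": ["1/1/01 00:00", "1/1/01 06:00", "1/1/01 12:00", "1/1/01 18:00",
             "2/1/01 00:00", "2/1/01 06:00", "2/1/01 12:00", "2/1/01 18:00",
             "3/1/01 00:00", "3/1/01 06:00", "3/1/01 12:00", "3/1/01 18:00"],
    "Data_4h": [1.1, 1.2, 1.3, 1.1] * 3,
})
df2 = pd.DataFrame({
    "Time": ["1/1/01 00:00", "2/1/01 00:00", "3/1/01 00:00"],
    "Data_1d": [1.1, 1.6, 1.0],
})

# Assumption: an explicit day-first format, so parsing is unambiguous.
fmt = "%d/%m/%y %H:%M"
df["Time"] = pd.to_datetime(df["Time"], format=fmt)
df2["Time"] = pd.to_datetime(df2["Time"], format=fmt)

# Shift each daily timestamp forward one day, so day n-1 is keyed by day n.
prev_day = df2.set_index(df2["Time"] + pd.Timedelta(days=1))["Data_1d"]

# Look up each 4h row by its calendar day (timestamp normalized to midnight).
# Days with no shifted daily row come back as NaN.
merged = df.assign(Data_1d=df["Time"].dt.normalize().map(prev_day))
print(merged)
```

Because the lookup is by exact day, gaps in either frame simply produce NaN for the affected rows rather than picking up a value from some earlier day.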

I'm sure there are other ways but I didn't want to think about this anymore.
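
One of those other ways, sketched here rather than tested on real data, is a plain left merge on a day key. The frames short and daily, the helper column day, and the sample values are hypothetical; the daily frame deliberately has no row for 2001-01-02 to show what happens with a gap.

```python
import pandas as pd

# Hypothetical sample with a gap: no daily row exists for 2001-01-02.
short = pd.DataFrame({
    "Time": pd.to_datetime(["2001-01-01 06:00", "2001-01-02 06:00",
                            "2001-01-03 06:00", "2001-01-04 06:00"]),
    "Data_4h": [1.2, 1.2, 1.2, 1.2],
})
daily = pd.DataFrame({
    "Time": pd.to_datetime(["2001-01-01", "2001-01-03"]),
    "Data_1d": [1.1, 1.0],
})

# Key each short-frame row by its own day, and each daily row by the next day.
left = short.assign(day=short["Time"].dt.normalize())
right = daily.assign(day=daily["Time"].dt.normalize() + pd.Timedelta(days=1))

# A left merge keeps every short-frame row; days without a match get NaN.
merged = (
    left.merge(right[["day", "Data_1d"]], on="day", how="left")
        .drop(columns="day")
)
print(merged)
```

Here the rows for 2001-01-01 and 2001-01-03 get NaN (there is no daily row for the day before each), while 2001-01-02 picks up 1.1 and 2001-01-04 picks up 1.0.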

1 Comment

This is how I was going to do it myself, but I wanted to confirm if the dates were dayfirst. This confirms it though.
