0

I have one dataframe (df1) like the following:

     ATime           ETime     Difference
0   1444911017815         1588510  1444909429305
1   1444911144979         1715672  1444909429307
2   1444911285683         1856374  1444909429309
3   1444911432742         2003430  1444909429312
4   1444911677101         2247786  1444909429315
5   1444912444821         3015493  1444909429328
6   1444913394542         3965199  1444909429343
7   1444913844134         4414784  1444909429350
8   1444914948835         5519467  1444909429368
9   1444915840638         6411255  1444909429383
10  1444916566634         7137240  1444909429394
11  1444917379593         7950186  1444909429407

I have another very big dataframe (df2) which has a column named Absolute_Time. Absolute_Time has the format as ATime of df1. So what I want to do is, for example, for all Absolute_Time's that lay in the range of row 0 to row 1 of ETime of df1, I want to subtract row 0 of Difference of df1 and so on.

2
  • I do not understand the explanation of what you are trying to achieve ... Commented Dec 17, 2015 at 21:49
  • Show us what you have tried so far and we will probably be able to help you. Commented Dec 17, 2015 at 21:50

1 Answer 1

1

Here's an attempt to accomplish what you might be looking for, starting with:

print(df1)

            ATime    ETime     Difference
0   1444911017815  1588510  1444909429305
1   1444911144979  1715672  1444909429307
2   1444911285683  1856374  1444909429309
3   1444911432742  2003430  1444909429312
4   1444911677101  2247786  1444909429315
5   1444912444821  3015493  1444909429328
6   1444913394542  3965199  1444909429343
7   1444913844134  4414784  1444909429350
8   1444914948835  5519467  1444909429368
9   1444915840638  6411255  1444909429383
10  1444916566634  7137240  1444909429394
11  1444917379593  7950186  1444909429407

next creating a new DataFrame with random times within the range of df1:

df2 = pd.DataFrame({'Absolute Time':[randrange(start=df1.ATime.iloc[0], stop=df1.ATime.iloc[-1]) for i in range(100)]})
df2 = df2.sort_values('Absolute Time').reset_index(drop=True)

np.searchsorted provides you with the index positions where df2 should be inserted in df1 (for the columns in question):

df2.index = np.searchsorted(df1.ATime.values, df2.loc[:, 'Absolute Time'].values)

Assigning the new index and merging produces a new DataFrame. Filling the missing Difference values forward allows to subtract in the next step:

df = pd.merge(df1, df2, left_index=True, right_index=True, how='left').fillna(method='ffill').dropna().astype(int)
df['Absolute Time Adjusted'] = df['Absolute Time'].sub(df.Difference)
print(df.head())

           ATime    ETime     Difference  Absolute Time  \
1  1444911144979  1715672  1444909429307  1444911018916   
1  1444911144979  1715672  1444909429307  1444911138087   
2  1444911285683  1856374  1444909429309  1444911138087   
3  1444911432742  2003430  1444909429312  1444911303233   
3  1444911432742  2003430  1444909429312  1444911359690   

   Absolute Time Adjusted  
1                 1589609  
1                 1708780  
2                 1708778  
3                 1873921  
3                 1930378 
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.