Python Pandas dataframe

Question

I have one dataframe (df1) like the following:

     ATime           ETime     Difference
0   1444911017815         1588510  1444909429305
1   1444911144979         1715672  1444909429307
2   1444911285683         1856374  1444909429309
3   1444911432742         2003430  1444909429312
4   1444911677101         2247786  1444909429315
5   1444912444821         3015493  1444909429328
6   1444913394542         3965199  1444909429343
7   1444913844134         4414784  1444909429350
8   1444914948835         5519467  1444909429368
9   1444915840638         6411255  1444909429383
10  1444916566634         7137240  1444909429394
11  1444917379593         7950186  1444909429407

I have another very big dataframe (df2) which has a column named Absolute_Time. Absolute_Time has the format as ATime of df1. So what I want to do is, for example, for all Absolute_Time's that lay in the range of row 0 to row 1 of ETime of df1, I want to subtract row 0 of Difference of df1 and so on.

I do not understand the explanation of what you are trying to achieve ... — Colonel Beauvel
– Colonel Beauvel, Commented Dec 17, 2015 at 21:49
Show us what you have tried so far and we will probably be able to help you. — cel
– cel, Commented Dec 17, 2015 at 21:50

Stefan · Accepted Answer · 2015-12-17 22:12:46Z

Here's an attempt to accomplish what you might be looking for, starting with:

print(df1)

            ATime    ETime     Difference
0   1444911017815  1588510  1444909429305
1   1444911144979  1715672  1444909429307
2   1444911285683  1856374  1444909429309
3   1444911432742  2003430  1444909429312
4   1444911677101  2247786  1444909429315
5   1444912444821  3015493  1444909429328
6   1444913394542  3965199  1444909429343
7   1444913844134  4414784  1444909429350
8   1444914948835  5519467  1444909429368
9   1444915840638  6411255  1444909429383
10  1444916566634  7137240  1444909429394
11  1444917379593  7950186  1444909429407

next creating a new DataFrame with random times within the range of df1:

df2 = pd.DataFrame({'Absolute Time':[randrange(start=df1.ATime.iloc[0], stop=df1.ATime.iloc[-1]) for i in range(100)]})
df2 = df2.sort_values('Absolute Time').reset_index(drop=True)

np.searchsorted provides you with the index positions where df2 should be inserted in df1 (for the columns in question):

df2.index = np.searchsorted(df1.ATime.values, df2.loc[:, 'Absolute Time'].values)

Assigning the new index and merging produces a new DataFrame. Filling the missing Difference values forward allows to subtract in the next step:

df = pd.merge(df1, df2, left_index=True, right_index=True, how='left').fillna(method='ffill').dropna().astype(int)
df['Absolute Time Adjusted'] = df['Absolute Time'].sub(df.Difference)
print(df.head())

           ATime    ETime     Difference  Absolute Time  \
1  1444911144979  1715672  1444909429307  1444911018916   
1  1444911144979  1715672  1444909429307  1444911138087   
2  1444911285683  1856374  1444909429309  1444911138087   
3  1444911432742  2003430  1444909429312  1444911303233   
3  1444911432742  2003430  1444909429312  1444911359690   

   Absolute Time Adjusted  
1                 1589609  
1                 1708780  
2                 1708778  
3                 1873921  
3                 1930378

Collectives™ on Stack Overflow

Python Pandas dataframe

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related