I have a dataframe with two columns: seconds passed and a value. The seconds column sometimes skips a second (data is missing). I would like to fill in the missing seconds and interpolate the missing values.

What I have tried so far: I took the first and last measurements of the dataframe, used NumPy's arange to build an array of every second from start to finish, converted that into a dataframe with the same columns as the first, and tried to join or merge the two.

The original df looks like this:

   seconds   value
0     1        5.560000
1     3        5.590000
2     4        5.620000
3     5        5.646667
4     7        5.653333
5     9        5.760000

I then create another dataframe, df2:

   seconds   value
0     1        NaN
1     2        NaN
2     3        NaN
3     4        NaN
4     5        NaN
5     6        NaN
6     7        NaN
7     8        NaN
8     9        NaN

Then I tried merging them together, both ways around, like so:

df = df.merge(df2, how='left')

What I expect the output to be is

   seconds   value
0     1        5.560000
1     2        NaN
2     3        5.590000
3     4        5.620000
4     5        5.646667
5     6        NaN
6     7        5.653333
7     8        NaN
8     9        5.760000

but the actual output is either df or df2, unmerged. Is there a way to achieve the expected result, and am I on the right track or could this be done much more easily?

  • Try df.merge(df2, how='outer'). Outer merge: "use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically". Commented Apr 16, 2019 at 15:31
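
A note on why the merge in the question appears to do nothing, as far as I can tell: with no on= argument, merge joins on every column the two frames share, here both seconds and value. Because value is NaN everywhere in df2, no rows pair up, so a left merge simply returns the left frame. A minimal sketch that joins on seconds only, assuming the helper frame (called full here, a name that is not in the original post) carries just the key column:

```python
import numpy as np
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "seconds": [1, 3, 4, 5, 7, 9],
    "value": [5.56, 5.59, 5.62, 5.646667, 5.653333, 5.76],
})

# Hypothetical helper frame: every second from first to last, key column only.
full = pd.DataFrame(
    {"seconds": np.arange(df["seconds"].min(), df["seconds"].max() + 1)}
)

# Left merge on the key alone: every row of `full` is kept, and
# `value` is NaN wherever df has no matching second.
merged = full.merge(df, on="seconds", how="left")
print(merged)
```

For the same reason, an outer merge without on= should return all 15 rows of both frames rather than the 9 expected, so restricting the join key seems to be the important part.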

2 Answers

3

You don't need the second df; use df.reindex():

df = df.set_index('seconds')
df = df.reindex(range(df.index.min(), df.index.max() + 1)).reset_index()

If using the second df is necessary, you can use:

df = df.set_index('seconds').combine_first(df2.set_index('seconds')).reset_index()

   seconds     value
0        1  5.560000
1        2       NaN
2        3  5.590000
3        4  5.620000
4        5  5.646667
5        6       NaN
6        7  5.653333
7        8       NaN
8        9  5.760000
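
Since the question also asks to interpolate the gaps, here is a minimal sketch that chains interpolate() onto the reindex approach. It assumes linear interpolation is what is wanted; the names s, full_index and filled are mine, not from the post.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "seconds": [1, 3, 4, 5, 7, 9],
    "value": [5.56, 5.59, 5.62, 5.646667, 5.653333, 5.76],
})

s = df.set_index("seconds")

# Every second from the first to the last measurement, inclusive.
full_index = pd.RangeIndex(int(s.index.min()), int(s.index.max()) + 1, name="seconds")

# reindex inserts NaN rows for the missing seconds;
# interpolate then fills them linearly between the neighbouring values.
filled = s.reindex(full_index).interpolate(method="linear").reset_index()
print(filled)
```

With this data, seconds 2, 6 and 8 come out as the midpoints of their neighbours (for example 5.575 at second 2).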

2

You can use update to fix your problem (df1 here is the original df from the question):

df1.set_index('seconds',inplace=True)
df2.set_index('seconds',inplace=True)
df2.update(df1)
df2.reset_index(inplace=True)
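
For reference, a self-contained sketch of the update approach, assuming df1 is the original frame from the question and df2 is the all-NaN frame covering every second. update() works in place, aligns on the index, and only copies over non-NaN values from df1, so the seconds missing from df1 stay NaN.

```python
import numpy as np
import pandas as pd

# Original data from the question (called df there, df1 in this answer).
df1 = pd.DataFrame({
    "seconds": [1, 3, 4, 5, 7, 9],
    "value": [5.56, 5.59, 5.62, 5.646667, 5.653333, 5.76],
})

# Placeholder frame: one row per second, value initially NaN.
df2 = pd.DataFrame({"seconds": np.arange(1, 10), "value": np.nan})

df1 = df1.set_index("seconds")
df2 = df2.set_index("seconds")

# Overwrite df2's values in place wherever df1 has a row for that second.
df2.update(df1)

result = df2.reset_index()
print(result)
```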

1 Comment

Both answers work, accepted the first answer for eliminating the need to have an intermediary data frame.
