Adding missing time in pandas dataframe

Question

I have a pandas.DataFrame with timestamps in a column. The values are in epoch and 0.1 seconds apart. Values like 1488771900.100000, 1488771900.200000 and so on. However, there are missing values. So I have 1488794389.500000 and then 1488794389.900000 with 3 missing values between. I want to insert rows in the dataframe with missing values between the max and min in this column. So if the min is 1488771900.000000 and max is 1488794660.000000, I want to insert rows with all values separated by 0.1 seconds and NA in all other columns.

I saw an answer in this link, but wasn't able to replicate the same.

How can this operation be performed?

that linked post should've worked, please post raw data, code to create the df, your attempts and any errors — EdChum
– EdChum, Commented Mar 8, 2017 at 13:52

Stephen Rauch · Accepted Answer · 2017-03-09 21:21:51Z

3

You can fill in your missing times using pandas.DataFrame.resample. The caveat is that the dataframe needs to have a pandas.DateTimeIndex. In your case the time is likely stored as a float in seconds since epoch, and this needs to be converted prior to re-sampling. Here is a function which will perform that operation.

Code:

import datetime as dt
import pandas as pd

def resample(dataframe, time_column, sample_period):
    # make a copy of the dataframe
    dataframe = dataframe.copy()

    # convert epoch times to datetime
    dataframe.time = dataframe.time.apply(
        lambda ts: dt.datetime.fromtimestamp(ts))

    # make the datetimes into an index
    dataframe.set_index(time_column, inplace=True)

    # resample to desired period
    dataframe = dataframe.resample(sample_period).asfreq().reset_index()

    # convert datetimes back to epoch
    epoch = dt.datetime.fromtimestamp(0)
    dataframe.time = dataframe.time.apply(
        lambda ts: (ts - epoch).total_seconds())
    return dataframe

Test Code:

values = [
    (1488771900.10, 'a'),
    (1488771900.20, 'b'),
    (1488771900.30, 'c'),
    (1488771900.60, 'f'),
]
columns = ['time', 'value']
df = pd.DataFrame(values, columns=columns)
print(df)

new_df = resample(df, 'time', '100ms')
print(new_df)

Results:

           time value
0  1.488772e+09     a
1  1.488772e+09     b
2  1.488772e+09     c
3  1.488772e+09     f

           time value
0  1.488772e+09     a
1  1.488772e+09     b
2  1.488772e+09     c
3  1.488772e+09   NaN
4  1.488772e+09   NaN
5  1.488772e+09     f

answered Mar 9, 2017 at 21:21

Stephen Rauch♦

50.1k32 gold badges118 silver badges143 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

philshem Over a year ago

This is a good answer, but the .apply()'s may be slow on larger data frames. Instead, maybe check out df['time'] = pd.to_datetime(df['time'],unit='s') (more here)

Collectives™ on Stack Overflow

Adding missing time in pandas dataframe

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related