2

I have a pandas.DataFrame with timestamps in a column. The values are in epoch and 0.1 seconds apart. Values like 1488771900.100000, 1488771900.200000 and so on. However, there are missing values. So I have 1488794389.500000 and then 1488794389.900000 with 3 missing values between. I want to insert rows in the dataframe with missing values between the max and min in this column. So if the min is 1488771900.000000 and max is 1488794660.000000, I want to insert rows with all values separated by 0.1 seconds and NA in all other columns.

I saw an answer in this link, but wasn't able to replicate the same.

How can this operation be performed?

1
  • that linked post should've worked, please post raw data, code to create the df, your attempts and any errors Commented Mar 8, 2017 at 13:52

1 Answer 1

3

You can fill in your missing times using pandas.DataFrame.resample. The caveat is that the dataframe needs to have a pandas.DateTimeIndex. In your case the time is likely stored as a float in seconds since epoch, and this needs to be converted prior to re-sampling. Here is a function which will perform that operation.

Code:

import datetime as dt
import pandas as pd

def resample(dataframe, time_column, sample_period):
    # make a copy of the dataframe
    dataframe = dataframe.copy()

    # convert epoch times to datetime
    dataframe.time = dataframe.time.apply(
        lambda ts: dt.datetime.fromtimestamp(ts))

    # make the datetimes into an index
    dataframe.set_index(time_column, inplace=True)

    # resample to desired period
    dataframe = dataframe.resample(sample_period).asfreq().reset_index()

    # convert datetimes back to epoch
    epoch = dt.datetime.fromtimestamp(0)
    dataframe.time = dataframe.time.apply(
        lambda ts: (ts - epoch).total_seconds())
    return dataframe

Test Code:

values = [
    (1488771900.10, 'a'),
    (1488771900.20, 'b'),
    (1488771900.30, 'c'),
    (1488771900.60, 'f'),
]
columns = ['time', 'value']
df = pd.DataFrame(values, columns=columns)
print(df)

new_df = resample(df, 'time', '100ms')
print(new_df)

Results:

           time value
0  1.488772e+09     a
1  1.488772e+09     b
2  1.488772e+09     c
3  1.488772e+09     f

           time value
0  1.488772e+09     a
1  1.488772e+09     b
2  1.488772e+09     c
3  1.488772e+09   NaN
4  1.488772e+09   NaN
5  1.488772e+09     f
Sign up to request clarification or add additional context in comments.

1 Comment

This is a good answer, but the .apply()'s may be slow on larger data frames. Instead, maybe check out df['time'] = pd.to_datetime(df['time'],unit='s') (more here)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.