I'm using pandas to process a huge time series dataset. I would like to add a row between two rows of the dataframe whenever the difference between their consecutive index values is greater than 5.

Actual:

            a  result
Date                 
1497544649  1     1.0
1497544652  9     1.0
1497544661  9     NaN

Expected:

            a  result
Date                 
1497544649  1     1.0
1497544652  9     1.0
1497544657  9     0
1497544661  9     NaN

I used diff() on the index to get the difference between consecutive index values, but I'm not sure how to insert a record when that difference is greater than 5.

import pandas as pd

df = pd.DataFrame([{"Date": 1497544649,"a":1, "result": 1}, 
                   {"Date": 1497544652,"a": 9, "result": 1},
                   {"Date": 1497544661,"a": 9, "result": 1}])
df.set_index("Date", inplace=True)

df.index.to_series().diff().fillna(0).to_frame("diff")

Any pointers on how to achieve this would be appreciated.

Thank you

1 Answer

You have a head start. Add a diff column to make filtering easier.

Then collect the index labels of the rows that match your rule and insert a new row before each one.

df['diff'] = df.index.to_series().diff().fillna(0)  # gap to the previous index value

matches = df[df['diff'] > 5].index.tolist()  # labels that follow a gap wider than 5


for i in matches:
    diff = df.loc[i, 'diff']
    interval = int(round(diff / 2))  # pick an index roughly in the middle of the gap
    # insert a row before the matched index; values are in column order (a, result, diff), adjust as needed
    df.loc[i - interval] = [0, 0, diff - interval]
    df.loc[i, 'diff'] = interval  # optional: keep the diff column consistent

df.sort_index(inplace=True)  # .loc adds new rows at the end, so sort by index afterwards

del df['diff']  # drop the helper column (del df.diff would target the DataFrame.diff method)
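
Since you mention the dataset is huge: enlarging a frame one row at a time with .loc can get slow. Below is a minimal sketch of the same idea that builds all filler rows at once and concatenates them. The names `fill_gaps`, `max_gap` and `fill_value` are made up for this example, and it assumes the index is sorted and holds integers. It uses the same midpoint rule as the loop above and fills every column of the new row with 0, so adjust that if you want to carry over values such as `a`.

```python
import pandas as pd


def fill_gaps(df, max_gap=5, fill_value=0):
    """Return a copy of df with one row inserted near the middle of every
    index gap wider than max_gap (hypothetical helper, sorted integer index assumed)."""
    gaps = df.index.to_series().diff()      # distance to the previous index label
    wide = gaps[gaps > max_gap]             # labels that follow a too-wide gap
    if wide.empty:
        return df.copy()
    # Step back round(gap / 2) from the later label, as in the loop version.
    offsets = (wide / 2).round().astype("int64").to_numpy()
    new_labels = wide.index.to_numpy() - offsets
    filler = pd.DataFrame(
        fill_value,
        index=pd.Index(new_labels, name=df.index.name),
        columns=df.columns,
    )
    # Append all filler rows in one go, then restore index order.
    return pd.concat([df, filler]).sort_index()


df = pd.DataFrame(
    {"a": [1, 9, 9], "result": [1, 1, 1]},
    index=pd.Index([1497544649, 1497544652, 1497544661], name="Date"),
)
out = fill_gaps(df)
print(out)
```

This returns a new frame and leaves the original untouched, which also avoids the need for a temporary diff column.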