0

I have a pandas DataFrame with 3 columns. The first column contains string values in ascending order, at a fixed step (e.g. '20173070000', '20173070020', '20173070040', etc.). The second and third columns contain the corresponding numeric values. I would like to upsample the first column to a step of one ('20173070000', '20173070001', '20173070002', etc.), filling the second and third columns with NaN for the new rows, and then interpolate those NaN values.

I've looked into resampling, but this appears to work only for datetime values. I have also looked into DataFrame.interpolate, but this appears to interpolate only between values that are already missing. As stated above, my dataset does not contain missing data. I am simply looking to increase the frequency of my entries, i.e. to fill in between existing values.

To give some reference, my current DataFrame looks like this:

         0             1             2
0      20173070000    14.0          13.9
1      20173070020    14.1          14.1
2      20173070040    13.8          13.6
3      20173070060    13.7          13.7
4      20173070080    13.8          13.5
5      20173070100    13.9          14.0

I would like to generate a DataFrame that looks like:

         0             1             2
0      20173070000    14.0          13.9
1      20173070001    NaN            NaN
2      20173070002    NaN            NaN
3      20173070003    NaN            NaN
4      20173070004    NaN            NaN
5      20173070005    NaN            NaN
...
20     20173070020    14.1           14.1
21     20173070021    NaN            NaN
...

I have no problem sorting out the interpolation afterwards, but I have not worked out how to upsample yet.

2 Answers

10

You can just use the reindex function. By default, it places NaN in rows whose label is in the new index but not in the original one.

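import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [20173070000, 20173070020, 20173070040, 20173070060, 20173070080, 20173070100],
                   'B': [14, 14.1, 13.8, 13.7, 13.8, 13.9],
                   'C': [13.9, 14.1, 13.6, 13.7, 13.5, 14.0]})

# Index by 'A', then reindex over every integer from min to max; the new rows get NaN.
df_upsampled = df.set_index('A').reindex(np.arange(df.A.min(), df.A.max() + 1)).reset_index()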
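The question says the key column holds strings, while the example above uses integers. The following is a minimal sketch of one way to handle that, assuming every key is a plain digit string that fits in a 64-bit integer. The column names 'A', 'B', 'C' follow the example above, and the names keys, full_range and upsampled are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data with string keys, as described in the question.
df = pd.DataFrame({
    "A": ["20173070000", "20173070020", "20173070040"],
    "B": [14.0, 14.1, 13.8],
    "C": [13.9, 14.1, 13.6],
})

# Convert the string keys to integers so they match the labels of the new index.
keys = df["A"].astype("int64")

# Every integer key from the first to the last, inclusive.
full_range = np.arange(keys.min(), keys.max() + 1, dtype="int64")

# Reindex on the integer keys; rows absent from the original become NaN.
upsampled = (
    df.assign(A=keys)
      .set_index("A")
      .reindex(full_range)
      .rename_axis("A")   # keep the index name explicit before resetting
      .reset_index()
)

# Optionally restore the string representation of the keys.
upsampled["A"] = upsampled["A"].astype(str)
```

Without the integer conversion, the string labels would not match the integer labels produced by np.arange, and every row would come back as NaN.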

0

I believe interpolate() is the way to go for you. After upsampling as you described, and assuming the column containing the values you want to interpolate is called 'val1', you can do:

df.loc[:, 'val1'] = df.loc[:, 'val1'].interpolate()
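To fill several value columns at once, interpolate() can also be applied to a sub-DataFrame. This is a small sketch on made-up numbers: the frame below and the names upsampled, value_cols and filled are illustrative, not taken from the question's data. It relies on the default linear method, which treats rows as equally spaced, so it assumes the upsampled keys are evenly spaced.

```python
import numpy as np
import pandas as pd

# A tiny, already-upsampled frame: known values on the first and last rows only.
upsampled = pd.DataFrame({
    "A": [0, 1, 2, 3, 4],
    "B": [14.0, np.nan, np.nan, np.nan, 14.4],
    "C": [13.9, np.nan, np.nan, np.nan, 14.3],
})

# Interpolate only the value columns, leaving the key column untouched.
value_cols = ["B", "C"]
filled = upsampled.copy()
filled[value_cols] = filled[value_cols].interpolate(method="linear")

print(filled)
```

Each NaN is replaced by a value on the straight line between its nearest known neighbours in the same column.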
