Python pandas: how to fill values between existing ones in dataframe column?

Question

I have a pandas DataFrame with 3 columns. The first column contains string values in ascending order, at a fixed step (e.g. '20173070000', '20173070020', '20173070040', etc.). The second and third columns contain the corresponding numeric values. I would like to resample the first column to a step of one ('20173070000', '20173070001', '20173070002', and so on), filling the second and third columns with NaN in the new rows, and then interpolate those NaN values.

I've looked into resampling, but this appears to work only for datetime values. I have also looked into DataFrame.interpolate, but this appears to be meant for interpolating between missing values. As stated above, my dataset does not contain missing data. I am simply looking to increase the frequency of my entries, i.e. to fill in between existing values.

To give some reference, my current DataFrame looks like this:

         0             1             2
0      20173070000    14.0          13.9
1      20173070020    14.1          14.1
2      20173070040    13.8          13.6
3      20173070060    13.7          13.7
4      20173070080    13.8          13.5
5      20173070100    13.9          14.0

I would like to generate a DataFrame that looks like:

         0             1             2
0      20173070000    14.0          13.9
1      20173070001    NaN           NaN
2      20173070002    NaN           NaN
3      20173070003    NaN           NaN
4      20173070004    NaN           NaN
5      20173070005    NaN           NaN
...
20     20173070020    14.1          14.1
21     20173070021    NaN           NaN
...

I have no problem sorting out the interpolation afterwards, but I have not worked out how to upsample yet.

Mara · Accepted Answer · 2019-06-14 11:39:38Z

10

You can use the reindex function. By default, it places NaN in every row whose label in the new index has no counterpart in the original index.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [20173070000, 20173070020, 20173070040, 20173070060, 20173070080, 20173070100],
                   'B': [14, 14.1, 13.8, 13.7, 13.8, 13.9],
                   'C': [13.9, 14.1, 13.6, 13.7, 13.5, 14.0]})

# Reindex on every integer from min(A) to max(A); keys absent from the original get NaN.
df = df.set_index('A').reindex(np.arange(df.A.min(), df.A.max() + 1)).reset_index()
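
The question says the first column holds strings, whereas the example above uses integers. A minimal sketch of the same reindex approach for string keys, assuming every string is purely numeric and fits in a 64-bit integer (the shortened sample data and the names `keys`, `full_index` and `upsampled` are illustrative, not from the original post):

```python
import numpy as np
import pandas as pd

# Illustrative sample mirroring the question, with string keys in column 'A'.
df = pd.DataFrame({
    'A': ['20173070000', '20173070020', '20173070040'],
    'B': [14.0, 14.1, 13.8],
    'C': [13.9, 14.1, 13.6],
})

# Cast the keys to 64-bit integers so a dense integer range can be built.
keys = df['A'].astype('int64')
full_index = pd.Index(
    np.arange(keys.min(), keys.max() + 1, dtype='int64'), name='A'
)

upsampled = (
    df.assign(A=keys)
      .set_index('A')
      .reindex(full_index)  # labels missing from the original become NaN rows
      .reset_index()
)

# Optionally convert the keys back to strings, as in the original data.
upsampled['A'] = upsampled['A'].astype(str)
```

If the real keys carry leading zeros or non-digit characters, the integer round trip would not preserve them, so this only applies under the stated assumption.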


bexi · Accepted Answer · 2019-06-14 11:46:27Z

0

I believe interpolate() is the way to go here. Once you have upsampled as you described, and assuming the column containing the values you want to interpolate is called 'val1', you can do:

df.loc[:, 'val1'] = df.loc[:, 'val1'].interpolate()
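
For reference, the upsampling and the interpolation can be chained. The following is a sketch only, assuming integer keys in a column named 'A' and value columns 'B' and 'C' (placeholder names and shortened sample data, standing in for 'val1' above):

```python
import numpy as np
import pandas as pd

# Illustrative data: keys 20 apart, two value columns.
df = pd.DataFrame({
    'A': [20173070000, 20173070020, 20173070040],
    'B': [14.0, 14.1, 13.8],
    'C': [13.9, 14.1, 13.6],
})

# Upsample: one row per integer key; the new rows hold NaN in 'B' and 'C'.
full_index = pd.Index(
    np.arange(df['A'].min(), df['A'].max() + 1, dtype='int64'), name='A'
)
dense = df.set_index('A').reindex(full_index)

# Linear interpolation fills each NaN from its nearest known neighbours.
# method='linear' treats rows as equally spaced, which holds here because
# the keys advance by exactly one per row.
dense = dense.interpolate(method='linear').reset_index()
```

Calling interpolate() on the whole frame fills every numeric column at once; to restrict it to one column, apply it to that column alone as in the line above.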
