I have a df with a column like 'B' below, and I want to change the frequency of column B from 3 business days to 1 business day, so that the output df has 7 entries instead of 3. How can I use pd.offsets or pd.DateOffset to do this?

import numpy as np
import pandas as pd

df = pd.DataFrame(
{
    'A': np.random.randint(0, 20, size = 3), 
    'B': pd.date_range('1/1/2010', periods = 3, freq = '3B')
}
)

df['A'] = df['A'].astype('Int64')
2 Answers

Option 1

df.set_index + df.asfreq + df.reset_index, then df.loc to restore the column order. (This gives you 7 business days because asfreq fills the range between the min and max of column 'B', here 2010-01-01 through 2010-01-11.)

out = df.set_index('B').asfreq('1B').reset_index().loc[:, [*'AB']]

Output:

# with `np.random.seed(0)` for reproducibility

      A          B
0    12 2010-01-01
1  <NA> 2010-01-04
2  <NA> 2010-01-05
3    15 2010-01-06
4  <NA> 2010-01-07
5  <NA> 2010-01-08
6     0 2010-01-11
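Since the question asks about offsets specifically: as far as I know, a frequency string such as '1B' is just an alias for an offset object, so pd.offsets.BDay(1) can be passed to asfreq in its place. The snippet below is a minimal sketch of that equivalence. It rebuilds the question's df, and the names by_alias, by_offset and next_day are only illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.randint(0, 20, size=3),
    'B': pd.date_range('1/1/2010', periods=3, freq='3B'),
})
df['A'] = df['A'].astype('Int64')

# The alias '1B' and the offset object BDay(1) describe the same frequency
by_alias = df.set_index('B').asfreq('1B')
by_offset = df.set_index('B').asfreq(pd.offsets.BDay(1))

same_result = by_alias.equals(by_offset)
n_rows = len(by_offset)

# Offsets also do date arithmetic: Friday 2010-01-01 plus one
# business day skips the weekend and lands on Monday 2010-01-04
next_day = pd.Timestamp('2010-01-01') + pd.offsets.BDay(1)

print(same_result, n_rows, next_day.date())
```

Note that pd.DateOffset(days=1) steps by calendar days, so for business days BDay is the offset to reach for.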

Option 2

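pd.date_range + df.reindex + df.rename_axis + df.reset_index + df.loc. (Here you choose the number of periods yourself instead of relying on the min and max of column 'B'.)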

reindex = pd.date_range(df['B'].min(), periods=7, freq='1B')

out2 = (
    df.set_index('B')
    .reindex(reindex)
    .rename_axis('B')
    .reset_index().loc[:, [*'AB']]
    )

Option 3

df.merge

out3 = df.merge(pd.DataFrame({'B': reindex}), on='B', how='right')

Option 4

df.resample + resample.asfreq (More useful if you want to pass a fill_value, or use resample.interpolate instead.)

out4 = df.set_index('B').resample('1B').asfreq().reset_index().loc[:, [*'AB']]
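To illustrate the two variations mentioned for Option 4, here is a small sketch using fill_value and interpolate on the resampler. It assumes fixed values for 'A' stored as plain int64 rather than the nullable Int64 from the question, because I am not certain interpolation is supported on the nullable dtype in every pandas version. The names filled and interpolated are only illustrative.

```python
import pandas as pd

# Fixed values (instead of random ones) so the result is easy to follow
df = pd.DataFrame({
    'A': [12, 15, 0],
    'B': pd.date_range('1/1/2010', periods=3, freq='3B'),
})

# Fill the newly created business days with a constant
filled = (
    df.set_index('B')
    .resample('1B')
    .asfreq(fill_value=0)
    .reset_index()
)

# Or interpolate linearly between the known points
interpolated = (
    df.set_index('B')
    .resample('1B')
    .interpolate()
    .reset_index()
)

print(filled)
print(interpolated)
```

With these inputs the interpolated column should climb from 12 to 15 over the first three steps and then fall to 0 over the last three.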

Equality check:

dfs = [out, out2, out3, out4]
# use a loop name other than `df` to avoid confusion with the original frame
all(d.equals(dfs[0]) for d in dfs)

# True
2 Comments

Thank you :-) One final question, please (sorry, I'm still studying datetime in pandas): can we say that to change the datetime freq of a pandas df column, it's easier to turn the column into the index, change the frequency, and then reset the index? When I set column 'B' as the index, I could change it very easily using: df = df.asfreq('B')
Yes, that's correct. In general, when working with time series data, it's convenient to use a datetime index. This makes filtering, slicing, etc. much easier. See Time series / date functionality and How to handle time series data with ease, specifically Datetime as index.
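As a rough illustration of why a datetime index makes filtering and slicing easier, the sketch below sets 'B' as the index and then selects rows by date strings and by weekday. The fixed values for 'A' and the names ts, week_of_jan_4, january and mondays are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [12, 15, 0],
    'B': pd.date_range('1/1/2010', periods=3, freq='3B'),
})

# Datetime index at a 1-business-day frequency
ts = df.set_index('B').asfreq('B')

# Label-based slicing with date strings (both endpoints are included)
week_of_jan_4 = ts.loc['2010-01-04':'2010-01-08']

# Partial-string indexing: every row in January 2010
january = ts.loc['2010-01']

# Boolean filtering on an index attribute: Mondays only (dayofweek == 0)
mondays = ts[ts.index.dayofweek == 0]

print(len(week_of_jan_4), len(january), len(mondays))
```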
1

One method that avoids setting the index and linearly interpolates the values for the added days:

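print(df)

# Target frame: one row per business day over the same span
df2 = pd.DataFrame({
    'A': None, 
    'B': pd.date_range('1/1/2010', periods = 7, freq = '1B') })

# Map each original date to its value of A
dico = {b:a for (a, b) in zip(df["A"], df["B"])}

# Look up A for each date (NaN where there is none), then fill the gaps linearly
df2["A"] = df2["B"].map(dico).interpolate()
print(df2)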
