
I have time-series data collected on a weekly basis, and I want to see the correlation between two of its columns. I was able to compute the correlation between the two columns, and now I want to see how the rolling correlation moves each year. My current approach works, but I need to normalize the two columns before computing the rolling correlation and making a line plot. In my current attempt, I don't know how to show the 3-year and 5-year rolling correlation. Can anyone suggest a way of doing this in matplotlib?

current attempt:

Here is my current attempt:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

dataPath="https://gist.github.com/jerry-shad/503a7f6915b8e66fe4a0afbc52be7bfa#file-sample_data-csv"

def ts_corr_plot(dataPath, roll_window=4):
    df = pd.read_csv(dataPath)
    df['Date'] = pd.to_datetime(df['Date'])
    df['week'] = pd.DatetimeIndex(df['date']).week
    df['year'] = pd.DatetimeIndex(df['date']).year
    df['week'] = df['date'].dt.strftime('%W').astype('uint8')
    
    def find_corr(x):
        df = df.loc[x.index]
        return df[:, 1].corr(df[:, 2])
    
    df['corr'] = df['week'].rolling(roll_window).apply(find_corr)
    fig, ax = plt.subplots(figsize=(7, 4), dpi=144)
    sns.lineplot(x='week', y='corr', hue='year', data=df,alpha=.8)
    plt.show()
    plt.close

update:

I want to see the rolling correlation for different time windows, such as:

plt_1 = ts_corr_plot(dataPath, roll_window=4)
plt_2 = ts_corr_plot(dataPath, roll_window=12)
plt_3 = ts_corr_plot(dataPath, roll_window=24)

I need to add 3-year and 5-year rolling correlations to the plots, but I couldn't find a good way of doing this. Can anyone point out how to make a rolling correlation line plot for time-series data? How can I improve the current attempt?

desired plot

This is the plot I want to obtain:

[image: desired plot]

1 Answer

Customizing the legend in seaborn is painstaking, so I wrote the code in matplotlib.

  1. I corrected the method for calculating the correlation coefficient. Your code gave me an error, so please correct me if I'm wrong.
  2. The line colors in the desired graph look like the Tableau palette, so I used the 10 Tableau colors defined in matplotlib (the "tab10" colormap).
  3. To calculate the correlation coefficient over 3 years, I use a window of 156 rows, which is 3 years of weekly data (52 weeks x 3). Please correct this logic if it is wrong.
  4. I draw the 4-week and 3-year graphs in separate loops.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataPath="https://gist.githubusercontent.com/jerry-shad/503a7f6915b8e66fe4a0afbc52be7bfa/raw/414a2fc2988fcf0b8e6911d77cccfbeb4b9e9664/sample_data.csv"

df = pd.read_csv(dataPath)
df['Date'] = pd.to_datetime(df['Date'])
df['year'] = df['Date'].dt.year
# Week of the year as a number, with Monday as the first day of the week
df['week'] = df['Date'].dt.strftime('%W').astype('uint8')

def find_corr(x):
    # x is one rolling window; use its index to pull the matching rows of df
    dfc = df.loc[x.index]
    # Correlation between the two data columns (positions 1 and 2)
    return dfc.iloc[:, [1, 2]].corr().iloc[0, 1]

roll_window = 4
df['corr'] = df['week'].rolling(roll_window).apply(find_corr)
df3 = df.copy()  # three year
df3['corr3'] = df3['year'].rolling(156).apply(find_corr)  # 3 years = 52 weeks x 3 = 156 rows

fig, ax = plt.subplots(figsize=(12, 4), dpi=144)
cmap = plt.get_cmap("tab10")

for i,y in enumerate(df['year'].unique()):
    tmp = df[df['year'] == y]
    ax.plot(tmp['week'], tmp['corr'], color=cmap(i), label=y)

for i,y in enumerate(df['year'].unique()):
    tmp = df3[df3['year'] == y]
    if tmp['corr3'].notnull().all():
        ax.plot(tmp['week'], tmp['corr3'], color=cmap(i), lw=3, linestyle='--', label=str(y)+' 3 year avg')

ax.grid(axis='both')
ax.legend(loc='upper left', bbox_to_anchor=(1.0, 1.0), borderaxespad=1)
plt.show()
# plt.close
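
As a side note on the normalization mentioned in the question, here is a minimal sketch on synthetic data (the column names "a" and "b" and the window of 12 are placeholders, not taken from the real CSV). It assumes pandas' built-in rolling correlation, Series.rolling(window).corr(other), is an acceptable substitute for the apply-based find_corr, and it checks that z-score normalization should not change a Pearson correlation, since it is only a positive linear rescaling of each column.

```python
import numpy as np
import pandas as pd

# Synthetic weekly data; the column names "a" and "b" are placeholders.
rng = np.random.default_rng(42)
n = 4 * 52
demo = pd.DataFrame({
    "Date": pd.date_range("2018-01-07", periods=n, freq="W"),
    "a": rng.normal(size=n),
    "b": rng.normal(size=n),
})

window = 12

# Built-in rolling Pearson correlation between the two columns.
builtin = demo["a"].rolling(window).corr(demo["b"])

# The same quantity via apply(), in the style of find_corr above.
def find_corr_demo(x):
    block = demo.loc[x.index]
    return block["a"].corr(block["b"])

via_apply = demo["a"].rolling(window).apply(find_corr_demo, raw=False)

# Z-score normalization rescales each column linearly, so the
# Pearson correlation is unchanged up to floating-point error.
cols = demo[["a", "b"]]
z = (cols - cols.mean()) / cols.std()
normalized = z["a"].rolling(window).corr(z["b"])

print(np.allclose(builtin, via_apply, equal_nan=True))
print(np.allclose(builtin, normalized, equal_nan=True))
```

If both checks print True on your data as well, normalizing beforehand is optional for the correlation itself, although it can still be useful when plotting the two raw series on one axis.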



6 Comments

I have a couple of questions before accepting the answer. Can we wrap the above attempt in a function, so that we can choose roll_window=4, 12, or 24 and do df3['corr3'] = df3['year'].rolling(#_of_years * #_of_weeks).apply(find_corr)? Also, to make things easy, can we get either a 3-year or a 5-year average of the rolling correlation in the plot (as an argument to the function)? Do you think the above attempt can be made better and simpler? Thanks again for your help!
Also, why is find_corr used twice to get the n-year rolling correlation average? Would you mind updating the above attempt? Thanks!
I noticed it later and couldn't fix it in time. As you pointed out, you can just add a column for each correlation coefficient to a single data frame. The mean can also be calculated with df['corr_5year'].mean().
My earlier explanation was wrong. I use it twice because one call calculates the correlation coefficient over a window of weeks and the other over a window of years. My earlier comment meant that there was no need to split the data into two data frames.
df3['corr5'] = df3['year'].rolling(260).apply(find_corr)  # 5 years = 52 weeks x 5 = 260; df3['corr5_mean'] = df3['corr5'].mean(). We can add this and use the same technique to draw it.
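
One possible way to wrap this in a function, as asked in the comments, is sketched below. It is not the answer's exact code: it swaps find_corr for pandas' built-in rolling(...).corr(...), keeps everything in one data frame, and assumes one row per week with no gaps (52 rows per year). The function names add_rolling_corr and plot_rolling_corr, the column names "a" and "b", and the synthetic data are all made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

WEEKS_PER_YEAR = 52  # assumption: one row per week, no missing weeks


def add_rolling_corr(df, col_x, col_y, roll_window=4, n_years=3):
    """Return a copy of df with a short-window and an n-year rolling correlation."""
    out = df.copy()
    out["corr"] = out[col_x].rolling(roll_window).corr(out[col_y])
    long_window = n_years * WEEKS_PER_YEAR
    out[f"corr_{n_years}y"] = out[col_x].rolling(long_window).corr(out[col_y])
    return out


def plot_rolling_corr(df, col_x, col_y, roll_window=4, n_years=3):
    """Plot one line per year against the week number; returns (fig, ax)."""
    out = add_rolling_corr(df, col_x, col_y, roll_window, n_years)
    out["year"] = out["Date"].dt.year
    out["week"] = out["Date"].dt.strftime("%W").astype("uint8")
    long_col = f"corr_{n_years}y"
    cmap = plt.get_cmap("tab10")

    fig, ax = plt.subplots(figsize=(12, 4))
    for i, (year, grp) in enumerate(out.groupby("year")):
        color = cmap(i % 10)
        ax.plot(grp["week"], grp["corr"], color=color, label=str(year))
        # Draw the long-window line only for years it covers completely.
        if grp[long_col].notna().all():
            ax.plot(grp["week"], grp[long_col], color=color, lw=3,
                    linestyle="--", label=f"{year} ({n_years}-year window)")
    ax.grid(axis="both")
    ax.legend(loc="upper left", bbox_to_anchor=(1.0, 1.0), borderaxespad=1)
    return fig, ax


# Synthetic weekly data standing in for the CSV (column names are made up).
rng = np.random.default_rng(0)
n = 6 * WEEKS_PER_YEAR
demo = pd.DataFrame({
    "Date": pd.date_range("2015-01-04", periods=n, freq="W"),
    "a": rng.normal(size=n).cumsum(),
    "b": rng.normal(size=n).cumsum(),
})
fig, ax = plot_rolling_corr(demo, "a", "b", roll_window=12, n_years=3)
```

Calling plt.show() afterwards displays the figure. Passing n_years=5 would give the 5-year window (260 rows), provided the data spans more than five years.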