how to get correct correlation plot on time series data with matplotlib/seaborn?

Question

I have time-series data that collected weekly basis, where I want to see the correlation of its two columns. to do so, I could able to find a correlation between two columns and want to see how rolling correlation moves each year. my current approach works fine but I need to normalize the two columns before doing rolling correlation and making a line plot. In my current attempt, I don't know how to show 3-year, 5 year rolling correlation. Can anyone suggest a possible idea of doing this in matplotlib?

current attempt:

Here is my current attempt:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

dataPath="https://gist.github.com/jerry-shad/503a7f6915b8e66fe4a0afbc52be7bfa#file-sample_data-csv"

def ts_corr_plot(dataPath, roll_window=4):
    df = pd.read_csv(dataPath)
    df['Date'] = pd.to_datetime(df['Date'])
    df['week'] = pd.DatetimeIndex(df['date']).week
    df['year'] = pd.DatetimeIndex(df['date']).year
    df['week'] = df['date'].dt.strftime('%W').astype('uint8')
    
    def find_corr(x):
        df = df.loc[x.index]
        return df[:, 1].corr(df[:, 2])
    
    df['corr'] = df['week'].rolling(roll_window).apply(find_corr)
    fig, ax = plt.subplots(figsize=(7, 4), dpi=144)
    sns.lineplot(x='week', y='corr', hue='year', data=df,alpha=.8)
    plt.show()
    plt.close

update:

I want to see rolling correlation in different time window such as:

plt_1 = ts_corr_plot(dataPath, roll_window=4)
plt_2 = ts_corr_plot(dataPath, roll_window=12)
plt_3 = ts_corr_plot(dataPath, roll_window=24)

I need to add 3-years, 5-years rolling correlation to the plots but I couldn't find a better way of doing this. Can anyone point me out how to make a rolling correlation line plot for time series data? How can I improve the current attempt? any idea?

desired plot

this is my expected plot that I want to obtain:

r-beginners · Accepted Answer · 2021-03-11 14:17:58Z

1

Customizing the legend in esaborn is painstaking, so I created the code in matplotlib.

Corrected the method for calculating the correlation coefficient. Your code gave me an error, so please correct me if I'm wrong.
The color of the line graph seems to be the color of the tableau from the desired graph color, so I used the 10 colors of the tableau defined in matplotlib.
To calculate the correlation coefficient for 3 years, I am using 156 line units, which is 3 years of weekly data. Please correct this logic if it is wrong.
I am creating 4-week and 3-year graphs in a loop process respectively.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataPath="https://gist.githubusercontent.com/jerry-shad/503a7f6915b8e66fe4a0afbc52be7bfa/raw/414a2fc2988fcf0b8e6911d77cccfbeb4b9e9664/sample_data.csv"

df = pd.read_csv(dataPath)
df['Date'] = pd.to_datetime(df['Date'])
df['week'] = df['Date'].dt.isocalendar().week
df['year'] = df['Date'].dt.year
df['week'] = df['Date'].dt.strftime('%W').astype('uint8')

def find_corr(x):
    dfc = df.loc[x.index]
    tmp = dfc.iloc[:, [1,2]].corr()
    tmp = tmp.iloc[0,1]
    return tmp

roll_window=4
df['corr'] = df['week'].rolling(roll_window).apply(find_corr)
df3 = df.copy() # three year
df3['corr3'] = df3['year'].rolling(156).apply(find_corr) # 3 year = 52 week x 3 year = 156 

fig, ax = plt.subplots(figsize=(12, 4), dpi=144)
cmap = plt.get_cmap("tab10")

for i,y in enumerate(df['year'].unique()):
    tmp = df[df['year'] == y]
    ax.plot(tmp['week'], tmp['corr'], color=cmap(i), label=y)

for i,y in enumerate(df['year'].unique()):
    tmp = df3[df3['year'] == y]
    if tmp['corr3'].notnull().all():
        ax.plot(tmp['week'], tmp['corr3'], color=cmap(i), lw=3, linestyle='--', label=str(y)+' 3 year avg')

ax.grid(axis='both')
ax.legend(loc='upper left', bbox_to_anchor=(1.0, 1.0), borderaxespad=1)
plt.show()
# plt.close

edited Mar 11, 2021 at 14:17

answered Mar 11, 2021 at 7:34

r-beginners

35.6k3 gold badges20 silver badges41 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

Hamilton Over a year ago

I have a couple of questions to clarify before accepting the posted answer; can we wrap above attempt in function, so if we choose roll_window=4 or 12, 24, and do df3['corr3'] = df3['year'].rolling(#_of_years * #_of_weeks).apply(find_corr)? Plus, to make things easy, can we either get a 3-year or 5-year avg of rolling correlation in the plot (so we can make argument in the function)? Do you think can we make above attempt better and simpler? Thanks again for your help!

Hamilton Over a year ago

also, why find_corr twice for getting n_year rolling correlation average? Do you mind if above attempt can be updated? Thanks!

r-beginners Over a year ago

I noticed it later and couldn't fix it in time. You can just add a column for each correlation coefficient in one data frame as you pointed out. I can also calculate the mean with df['corr_5year'].mean().

r-beginners Over a year ago

The explanation was wrong. The reason why I use it twice is because one is to calculate the correlation coefficient in weeks and the other is to calculate the correlation coefficient in years. My earlier comment meant that there was no need to split the data frame into two.

r-beginners Over a year ago

df3['corr5'] = df3['year'].rolling(260).apply(find_corr) # 5 year = 52 week x 5 year = 260;df3['corr5_mean']=df3['corr5'].mean()

We can add this and use the same technique to draw it.

|

Collectives™ on Stack Overflow

how to get correct correlation plot on time series data with matplotlib/seaborn?

1 Answer 1

6 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

6 Comments

Your Answer

Sign up or log in

Post as a guest

Related