I am estimating parameters for time series models, and the data set is very large. How can I make my function faster? The function below, param, fits a grid of SARIMAX models to each column and keeps the parameter combination with the lowest AIC per column.
Input:
import itertools
import pandas as pd
import statsmodels.api as sm
data = [['01-01-2018', 150,661,396,286,786],['01-02-2018',231,341,57,768,941], ['01-03-2018',486,526,442,628,621],
['01-04-2018',279,336,140,705,184],['01-05-2018',304,137,800,94,369],['01-06-2018',919,340,372,494,117],
['01-07-2018',947,920,848,716,719],['01-08-2018',423,20,313,368,909],['01-09-2018',422,678,656,604,674],
['01-10-2018',422,678,656,604,674],['01-11-2018',337,501,743,606,991],['01-12-2018',408,536,669,903,463]]
df = pd.DataFrame(data, columns = ['date', 'A','B','C','D','E'])
df.index = df.date
def param(data_param):
    w = []
    x = []
    y = []
    z = []
    p = d = q = range(0, 2)  # p, d and q each take the value 0 or 1
    pdq = list(itertools.product(p, d, q))  # Generate all combinations of (p, d, q) triplets
    seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]  # Generate all combinations of seasonal (P, D, Q, 12) tuples
    for i in data_param:
        for param in pdq:
            for param_seasonal in seasonal_pdq:
                try:
                    results = sm.tsa.statespace.SARIMAX(data_param[i],
                                                        order=param,
                                                        seasonal_order=param_seasonal,
                                                        enforce_stationarity=False,
                                                        enforce_invertibility=False).fit()
                    #print(i,param, param_seasonal,results.aic)
                    w.append(i)
                    x.append(param)
                    y.append(param_seasonal)
                    z.append(results.aic)
                    mod_score_param = pd.DataFrame({
                        'id': w,
                        'param': x,
                        'param_seasonal': y,
                        'results_aic': z,
                    })
                except Exception:
                    # Skip combinations (or columns) for which the fit fails
                    continue
    mod_score_param = mod_score_param.sort_values(by='results_aic')
    mod_score_param = mod_score_param.dropna()
    mod_score_param = mod_score_param[mod_score_param['results_aic']>3.0]
    mod_score_param = mod_score_param.sort_values(['id','results_aic'])
    mod_score_param = mod_score_param.drop_duplicates(['id'],keep='first')
    return(mod_score_param)
Output = param(df)
Output:
+-----+----+-----------+----------------+-------------+
|     | id | param     | param_seasonal | results_aic |
+-----+----+-----------+----------------+-------------+
| 199 | A  | (1, 0, 1) | (1, 0, 0, 12)  | 4.565       |
| 197 | B  | (1, 0, 0) | (1, 0, 0, 12)  | 21.752      |
| 30  | C  | (0, 1, 1) | (1, 0, 0, 12)  | 87.847      |
| 22  | D  | (1, 1, 1) | (0, 1, 0, 12)  | 91.183      |
| 50  | E  | (0, 1, 1) | (0, 1, 0, 12)  | 92.87       |
+-----+----+-----------+----------------+-------------+
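
To give a sense of scale, here is a minimal sketch (standard library only) of how many fits the loop above performs, and of one possible way to keep just the best candidate per column while looping instead of rebuilding a DataFrame on every iteration. The function `fake_aic` is a hypothetical stand-in for `SARIMAX(...).fit().aic` and its scores are made up for illustration; the fits themselves are likely the dominant cost, and nothing here has been timed.

```python
import itertools

# Same grid as in param(): p, d and q each take the value 0 or 1.
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(a, b, c, 12) for a, b, c in pdq]

columns = ["A", "B", "C", "D", "E"]  # the five data columns of df
n_fits = len(columns) * len(pdq) * len(seasonal_pdq)
print(n_fits)  # 5 * 8 * 8 = 320 model fits


def fake_aic(column, order, seasonal_order):
    """Hypothetical stand-in for SARIMAX(...).fit().aic; not a real model."""
    return 10.0 * (columns.index(column) + 1) + sum(order) + sum(seasonal_order[:3])


# Track only the lowest-scoring candidate per column while looping.
best = {}
for col, order, seasonal in itertools.product(columns, pdq, seasonal_pdq):
    score = fake_aic(col, order, seasonal)
    if not score > 3.0:  # mirrors the original filter: keep only AIC > 3.0 (also drops NaN)
        continue
    if col not in best or score < best[col][2]:
        best[col] = (order, seasonal, score)

for col, (order, seasonal, score) in best.items():
    print(col, order, seasonal, score)
```

In the real function, `fake_aic` would be replaced by the actual SARIMAX fit inside the existing try/except, and `best` could be turned into a DataFrame once after the loops finish.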


[...] sm and what output is produced by that call? Also please add the example input not (only) as a pretty but useless table, but such that it is easy for reviewers to run your code without having to type out your data.

Traceback (most recent call last):
  File "20190413a.py", line 6, in <module>
    ['01-08-2018',423,20,313,368,909],['01-09-2018',422,678,656,604,674],['01-10-2018',422,678,656,604,674]['01-11-2018',337,501,743,606,991]['01-12-2018',408,536,669,903,463]]
TypeError: list indices must be integers or slices, not tuple

Regardless, I fixed the code, but your program doesn't output anything like what you posted. Please confirm working code.