I am training an LSTM model with data from yfinance. The process is fairly standard. I get the data with yf.download(ticker=ticker), where ticker='AAPL', and use df.rolling(30, min_periods=1) to smooth the data. Then I adapt the data for training like this:
import numpy as np
import torch

def create_ds_for_forecasting(df, window_range):
    df_values = df.copy()
    X, y = [], []
    # Each target window is the input window shifted forward by one step
    for i in np.arange(0, len(df_values)-window_range-1):
        X.append(df_values[i:i+window_range])
        y.append(df_values[i+1:i+window_range+1])
    return torch.Tensor(np.array(X)).to(device), torch.Tensor(np.array(y)).to(device)
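To make the shapes concrete, here is a minimal pure-Python sketch of the same windowing logic on a toy list. The helper name `make_windows` is only for illustration and is not part of my code; it leaves out the tensor conversion and the device handling.

```python
def make_windows(values, window_range):
    """Mirror the windowing loop on a plain list: y is X shifted by one step."""
    X, y = [], []
    for i in range(0, len(values) - window_range - 1):
        X.append(values[i:i + window_range])
        y.append(values[i + 1:i + window_range + 1])
    return X, y


X, y = make_windows([0, 1, 2, 3, 4, 5], window_range=3)
print(X)  # [[0, 1, 2], [1, 2, 3]]
print(y)  # [[1, 2, 3], [2, 3, 4]]
```

Note that with this range the last possible pair ([2, 3, 4] and [3, 4, 5] in the toy case) is not generated, because of the extra `- 1` in the loop bound.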
Next, I train the following model using nn.SmoothL1Loss as the criterion and Adam as the optimizer.
import torch
from torch import nn

class ModeloLSTM(nn.Module):
    def __init__(self, num_layers, hidden_size, input_size, batch_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.lstm = nn.LSTM(
            input_size=self.input_size,
            num_layers=self.num_layers,
            hidden_size=self.hidden_size,
            batch_first=True
        ).to(device)
        self.fc = nn.Linear(hidden_size, 1).to(device)
        self.tanh = nn.Tanh()

    def forward(self, x):
        # Dynamically initialize hidden state per batch
        if self.batch_size != 0:
            # Batched input: (batch, seq_len, features)
            h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
            c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        else:
            # Unbatched input: (seq_len, features)
            h0 = torch.zeros(self.num_layers, self.hidden_size, device=x.device)
            c0 = torch.zeros(self.num_layers, self.hidden_size, device=x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out)  # applied to every timestep; the last one is selected later with [:, -1, :]
        out = self.tanh(out)
        return out
Training runs normally, and these are the train and test results.
If you are wondering whether I trained with test data as well, I didn't. These are the train and test loops.
## TRAIN LOOP
from torch.utils.data import DataLoader, TensorDataset

loader = DataLoader(TensorDataset(X_train, y_train), shuffle=True, batch_size=64, drop_last=True)
num_epochs = 5
for epoch in range(num_epochs):
    for inputs, label in loader:
        outputs = modelo(inputs)
        loss = criterion(outputs, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
## TEST LOOP
y_pred = []
i = 0
loader = DataLoader(X_test, batch_size=batch_size)
with torch.no_grad():
    for x_batch in loader:
        #for i in range(0, X_train.shape[0], batch_size):
        #x_batch, y_batch = X_train[i:i+batch_size,:,:], y_train[i:i+batch_size,:]
        y_pred_i = modelo(x_batch)[:, -1, :]
        y_pred.append(y_pred_i)
y_pred = torch.cat(y_pred, axis=0)
Now, here comes the issue. I save the model weights and load them into a new, unbatched instance of the original model, where c0 and h0 have shape (num_layers, hidden_size). I do this with model.load_state_dict(modelo.state_dict()), where model has batch_size equal to zero. Then I use this loop to make predictions for the future.
days_to_simulate = 3*3*window_range # 3 months
input_data = df_test[-window_range:]
input_data = torch.Tensor(input_data).to(DEVICE)
model = ModeloLSTM(num_layers=1, hidden_size=50, input_size=1, batch_size=0)
model.load_state_dict(modelo.state_dict())
model.eval()
with torch.no_grad():
    seq_prediction = torch.Tensor(model(input_data))[-1,:].unsqueeze(-1).to(DEVICE)
    for i in range(0, days_to_simulate):
        if i < window_range:
            input_data = torch.cat((input_data[-window_range+i:,:], seq_prediction), dim=0)
        elif i >= window_range:
            input_data = seq_prediction[-window_range:]
        next_pred = torch.Tensor(model(input_data))[-1,:].unsqueeze(-1).to(DEVICE)
        seq_prediction = torch.cat((seq_prediction, next_pred), dim=0)

starting_dates = pd.date_range(start=df.index[-window_range], periods=window_range)
predicted_dates = pd.date_range(start=df.index[-1], periods=days_to_simulate+1)
starting_series = pd.Series(df[-window_range:].values.flatten(), index=starting_dates)
predicted_series = pd.Series(scaler.inverse_transform(seq_prediction.detach().cpu().numpy()).flatten(), index=predicted_dates)

plt.figure(figsize=(12, 6))
plt.plot(starting_series.index, starting_series.values.flatten(), linestyle='-', label='Datos Reales')
plt.plot(predicted_series.index, predicted_series.values.flatten(), linestyle='-', label='Predicción')
plt.title('Predicción de Precios de Acciones')
plt.xlabel('Fecha')
plt.ylabel('Precio')
plt.legend()
plt.show()
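Stripped of the tensor handling, the intent of the loop above is a plain rolling forecast: predict one step, append the prediction to the window, drop the oldest value, and repeat. A minimal sketch of that idea follows. The names `rollout` and `mean_predictor` are hypothetical and not part of my code, and the stand-in predictor simply averages the window in place of the LSTM.

```python
from collections import deque


def rollout(history, predict_next, steps, window_range):
    """Autoregressive forecast: feed each prediction back in as the newest input."""
    window = deque(history[-window_range:], maxlen=window_range)
    predictions = []
    for _ in range(steps):
        next_value = predict_next(list(window))
        predictions.append(next_value)
        window.append(next_value)  # maxlen drops the oldest value automatically
    return predictions


# Hypothetical stand-in for the trained model: predicts the mean of the window.
def mean_predictor(window):
    return sum(window) / len(window)


forecast = rollout([1.0, 2.0, 3.0, 4.0], mean_predictor, steps=5, window_range=3)
print(forecast)
```

After `window_range` steps the window contains only predicted values, so from that point on the forecast depends entirely on the model's own outputs.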
However, for some unknown reason, this new unbatched model with the exact same weights converges to a stationary value, as shown in the resulting plot.
In the final results, the predicted values (in orange) always seem to converge to a fixed value. Why is this? Batching should not affect the performance of a model like this. By the way, I am doing this in order to deploy the model in a custom personal app: I download the weights, instantiate a new model class, and load the original model's state into the new instance.
EDIT
I realized, maybe a bit too late, that I am using x.size(0) to define the batch size in h0 and c0, which means the batch size is inferred, and the network would still work if I fed it data with a different batch size. In other words, I could feed it data with shape (1, window_range, features). However, that doesn't explain why the output stabilizes at a constant value. It seems to me that this might be related to the fact that I am making future predictions from the model's own predictions.
That changes the question. How can I prevent an LSTM from settling onto a single constant value when it is fed its own predictions?
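To illustrate what I suspect is happening, here is a toy sketch. It is not my LSTM: `toy_step`, `w`, and `b` are made-up values chosen only for illustration. A bounded map such as tanh(w*x + b) with |w| < 1 is a contraction, so feeding its output back in as the next input converges to a single fixed point regardless of where it starts. I am assuming, without having verified it, that my trained model behaves in a similar way once its window contains only its own predictions.

```python
import math


def toy_step(x, w=0.8, b=0.1):
    """Toy one-step predictor with a tanh output (a stand-in, not the real model)."""
    return math.tanh(w * x + b)


def iterate(x0, steps=200):
    """Repeatedly feed the prediction back in as the next input."""
    x = x0
    for _ in range(steps):
        x = toy_step(x)
    return x


low_start = iterate(-0.9)
high_start = iterate(0.9)
print(low_start, high_start)  # both runs end at essentially the same value
```

If that is the mechanism, the flat line in the plot would be the fixed point of the learned one-step map rather than an effect of batching.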

