
I am training an LSTM model on data from yfinance. The process is pretty standard: I download the data with yf.download(ticker) where ticker = 'AAPL' and smooth it with df.rolling(30, min_periods=1).mean(), roughly as in the sketch below (the scaler and its (-1, 1) range are my assumptions, chosen to match the tanh output head and the scaler.inverse_transform call further down).
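import numpy as np
import torch
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

ticker = 'AAPL'
df = yf.download(ticker)[['Close']]
df = df.rolling(30, min_periods=1).mean()    # 30-day rolling mean smoothing

# Assumed scaling step (not shown in the original post): the tanh output
# head and the later scaler.inverse_transform call imply values in (-1, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
df_values = scaler.fit_transform(df.values)  # shape (N, 1)

Then I adapt the data for training like this: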

def create_ds_for_forecasting(df, window_range):
    df_values = df.copy()
    X, y = [], []
    for i in np.arange(0, len(df_values)-window_range-1):
        X.append(df_values[i:i+window_range])        # input window
        y.append(df_values[i+1:i+window_range+1])    # same window shifted one step ahead
    return torch.Tensor(np.array(X)).to(device), torch.Tensor(np.array(y)).to(device)
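With window_range = 30, this turns an array of shape (N, 1) into X of shape (N - 31, 30, 1) and y of the same shape, where each target window is the input window shifted one step ahead. A usage sketch, assuming a simple chronological 80/20 split (the split itself is not shown in the post):

window_range = 30
split = int(len(df_values) * 0.8)            # chronological split (assumption)
df_train, df_test = df_values[:split], df_values[split:]
X_train, y_train = create_ds_for_forecasting(df_train, window_range)
X_test, y_test = create_ds_for_forecasting(df_test, window_range)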

Next, I train the following model, using nn.SmoothL1Loss as the criterion and Adam as the optimizer.

from torch import nn

class ModeloLSTM(nn.Module):
    
    def __init__(self, num_layers, hidden_size, input_size, batch_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.lstm = nn.LSTM(
            input_size=self.input_size,
            num_layers=self.num_layers,
            hidden_size=self.hidden_size,
            batch_first=True
        ).to(device)
        self.fc = nn.Linear(hidden_size, 1).to(device)
        self.tanh = nn.Tanh()

        
    def forward(self, x):
        # Initialize a fresh hidden state per batch; the batch size is
        # inferred from x.size(0), so any batch size works at inference time
        if self.batch_size != 0:
            h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
            c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        else:  # unbatched input of shape (seq_len, input_size)
            h0 = torch.zeros(self.num_layers, self.hidden_size, device=x.device)
            c0 = torch.zeros(self.num_layers, self.hidden_size, device=x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out)    # applied at every timestep, not just the last
        out = self.tanh(out)
        return out
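As a quick shape check, instantiating the model with the hyperparameters used later in the post (num_layers=1, hidden_size=50, input_size=1, training batch size 64) and feeding a dummy batch shows that the head produces one output per timestep:

modelo = ModeloLSTM(num_layers=1, hidden_size=50, input_size=1, batch_size=64)
out = modelo(torch.zeros(64, 30, 1, device=device))
print(out.shape)    # torch.Size([64, 30, 1]): a prediction at every timestep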

Everything runs normally, and these are the train + test results.

[Plot: train, test, and predictions on the test data]

If you are wondering whether I trained with test data as well, I didn't. These are the train and test loops.

## TRAIN LOOP

from torch.utils.data import DataLoader, TensorDataset

criterion = nn.SmoothL1Loss()
optimizer = torch.optim.Adam(modelo.parameters())

loader = DataLoader(TensorDataset(X_train, y_train), shuffle=True, batch_size=64, drop_last=True)

num_epochs = 5
for epoch in range(num_epochs):
    for inputs, label in loader:
        outputs = modelo(inputs)
        loss = criterion(outputs, label)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

## TEST LOOP

modelo.eval()
y_pred = []
loader = DataLoader(X_test, batch_size=64)
with torch.no_grad():
    for x_batch in loader:
        y_pred_i = modelo(x_batch)[:, -1, :]   # keep only the last timestep of each window
        y_pred.append(y_pred_i)

y_pred = torch.cat(y_pred, dim=0)
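For reference, the test-set comparison plotted earlier can be reproduced along these lines (assuming the scaler from the preprocessing step):

import matplotlib.pyplot as plt

y_true = scaler.inverse_transform(y_test[:, -1, :].cpu().numpy())
y_hat = scaler.inverse_transform(y_pred.cpu().numpy())

plt.figure(figsize=(12, 6))
plt.plot(y_true, label='Actual (test)')
plt.plot(y_hat, label='Predicted (test)')
plt.legend()
plt.show()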

Now, here comes the issue. I save the model weights and load them into a new unbatched instance of the original model, where h0 and c0 have shape (num_layers, hidden_size), using model.load_state_dict(modelo.state_dict()), where model has batch_size equal to zero. Then I use this loop to make predictions for the future.

days_to_simulate = 3*3*window_range # 3 months
input_data = df_test[-window_range:]
input_data = torch.Tensor(input_data).to(device)

model = ModeloLSTM(num_layers=1, hidden_size=50, input_size=1, batch_size=0)
model.load_state_dict(modelo.state_dict())
model.eval()

with torch.no_grad():
    seq_prediction = model(input_data)[-1, :].unsqueeze(-1)   # last-step prediction, shape (1, 1)

    for i in range(0, days_to_simulate):
        if i < window_range:
            # slide the window: drop the oldest values, append the predictions so far
            input_data = torch.cat((input_data[-window_range+i:, :], seq_prediction), dim=0)
        else:
            # the window now consists entirely of the model's own predictions
            input_data = seq_prediction[-window_range:]
        next_pred = model(input_data)[-1, :].unsqueeze(-1)
        seq_prediction = torch.cat((seq_prediction, next_pred), dim=0)

starting_dates = pd.date_range(start=df.index[-window_range], periods=window_range)
predicted_dates = pd.date_range(start=df.index[-1], periods=days_to_simulate+1)

starting_series = pd.Series(df[-window_range:].values.flatten(), index=starting_dates)
predicted_series = pd.Series(scaler.inverse_transform(seq_prediction.detach().cpu().numpy()).flatten(), index=predicted_dates)


plt.figure(figsize=(12, 6))
plt.plot(starting_series.index, starting_series.values.flatten(), linestyle='-', label='Actual data')
plt.plot(predicted_series.index, predicted_series.values.flatten(), linestyle='-', label='Prediction')
plt.title('Stock Price Prediction')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()


However, for some unknown reason, this new unbatched model with the exact same weights converges to a stationary value, as shown in the resulting plot.

[Plot: final results; the predicted values in orange converge to a fixed value]

Why is this? Batching should not affect the performance of a model like this. By the way, I am doing all of this to deploy the model in a custom personal app: download the weights, instantiate a new model class, and reload the original model state into the newer one.

EDIT

I realized, maybe a bit too late, that I am using x.size(0) to define the batch size in h0 and c0, which means the batch size is inferred: feeding the network data with a different batch size would still work, so I could just feed data of shape (1, window_range, features). However, that doesn't explain why the output stabilizes onto a constant value. It seems to me that this might be related to the fact that I am making future predictions from the model's own predictions.
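That part can be checked directly: with batch_first=True, nn.LSTM accepts both batched and unbatched input, so the two instances sharing the same weights should agree on the same window. A quick sanity check (assuming X_test is still in scope):

x = X_test[0]                                        # shape (window_range, 1)
with torch.no_grad():
    out_batched = modelo(x.unsqueeze(0)).squeeze(0)  # batched model: (1, T, 1) -> (T, 1)
    out_unbatched = model(x)                         # unbatched model: (T, 1)
print(torch.allclose(out_batched, out_unbatched))    # True: the weight transfer is fine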

That changes the question: how can I keep an LSTM from stabilizing onto a single constant value when it is fed its own predictions?

  • Sounds a bit like an XY problem to me. Why do you want an unbatched model? Commented Oct 8 at 6:58
  • Because if, for example, I am trying to predict the next day from the last ten recorded days, I have unbatched data that I want to feed to a batched model, and it will simply throw a shape-mismatch error. I've been told to use dummy data and just keep the part I want, but I am not sure about that suggestion either. Commented Oct 8 at 15:53
  • You can just add a batch axis of size 1 in front. Like [1, T, d] where T is the time steps and d the feature dimension. Commented Oct 8 at 16:33
  • Mostly because it is annoying and shouldn't be necessary. Getting the state_dict and transplanting it into a new model is something I've done in the past, and I think that is what's causing my weird predictions. Commented Oct 9 at 16:36

## 1 Answer


After doing a lot of research, I realized that the issue has to do with the use of an LSTM in the first place.

LSTMs and RNNs are criticized for being bad precisely at predicting future values of a sequence; they are more often used for predicting intermediate values, as in voice recognition or sentiment analysis.

Further research showed me that, for forecasting, it is recommended to use Seq2Seq models such as an LSTM encoder-decoder, or attention-based models that don't rely on autoregression.
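For illustration, here is a minimal sketch of what such an encoder-decoder could look like, written to match the question's conventions (one feature, scaled values). This is an outline, not code from the original post or any library:

import torch
from torch import nn

class Seq2SeqForecaster(nn.Module):
    def __init__(self, input_size=1, hidden_size=50, horizon=30):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.decoder = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, input_size)

    def forward(self, x):
        # x: (batch, window_range, input_size)
        _, (h, c) = self.encoder(x)       # summarize the observed window
        dec_in = x[:, -1:, :]             # seed the decoder with the last observation
        outputs = []
        for _ in range(self.horizon):
            dec_out, (h, c) = self.decoder(dec_in, (h, c))
            step = self.fc(dec_out)       # (batch, 1, input_size)
            outputs.append(step)
            dec_in = step                 # feed the prediction back in
        return torch.cat(outputs, dim=1)  # (batch, horizon, input_size)

The decoder still consumes its own outputs, but it is trained end-to-end to emit a whole horizon at once, so exposure to its own predictions is part of the training objective rather than something that only happens at inference time.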
