-3 votes
0 answers
31 views

LSTM+Attention stock prediction model stops improving after 1–2 epochs despite lower learning rate - why? [closed]

I'm a student building a deep learning model for stock return prediction. I'm currently facing an issue where the validation loss stops improving after only 1–2 epochs, causing the model to barely ...
이선호
-1 votes
0 answers
30 views

Input fusion in contextual reinforcement learning [closed]

I’m currently exploring contextual reinforcement learning for a university project. I understand that in actor–critic methods like PPO and SAC, it might be possible to combine state and contextual ...
Manu Mano
-1 votes
0 answers
32 views

How to Improve Reconstruction Quality with VAE? [closed]

I am training a VAE architecture on microscopy images. Dataset of 1000 training images, 253 testing images. Images are resized to 128x128 input or 256x256 input from original resolution which is ...
MT0820 (reputation 19)
-4 votes
0 answers
38 views

Advice on Identifying Extremely Similar Objects in Real Time [closed]

I’m working on a real-time system that identifies objects, but I’m facing a challenge: new objects can look extremely similar to known ones (sometimes differences are as small as 0.1 mm), and I need ...
nour achahlaou
1 vote
0 answers
65 views

Should I use torch.inference_mode() in a prediction method even when using model.eval()? [duplicate]

I'm following the book "Deep Learning with PyTorch Step By Step" and I have a question about the predict method in the StepByStep class (from this repository: GitHub). The current ...
Matteo (reputation 93)
2 votes
1 answer
107 views

Will tf.keras.Sequential containing multiple custom layers be correctly fully serializable and deserializable in my case?

I am implementing a U-Net variant in TensorFlow/Keras with custom layers. In one of my custom layers, UPDoubleConv, I have a Sequential self.blocks containing a repeated pattern of UpSampling2D ...
Ahmed (reputation 105)
2 votes
2 answers
87 views

Decoder-only AI model making repetitive responses

I am making a decoder-only transformer using PyTorch, and my dataset of choice is the fullEnglish dataset from Kaggle's Plaintext Wikipedia (full English). The problem is that my model output is ...
Kirito (reputation 13)
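Repetitive output from a small decoder-only model is often made worse by greedy decoding. A minimal sketch of one common mitigation, assuming the model returns raw next-token logits as a list of floats: penalize tokens that were already generated (the "divide positive logits, multiply negative ones" heuristic), keep only the top-k candidates, and sample from a temperature-scaled softmax. The function name and default values are illustrative, not taken from the question.

```python
import math
import random


def sample_next_token(logits, generated, temperature=0.8, top_k=5,
                      repetition_penalty=1.2, rng=random):
    """Choose the next token id from raw logits (hypothetical helper)."""
    adjusted = list(logits)
    # Repetition penalty: shrink the logits of tokens that were already generated.
    for tok in set(generated):
        if adjusted[tok] > 0:
            adjusted[tok] /= repetition_penalty
        else:
            adjusted[tok] *= repetition_penalty
    # Keep only the top_k highest-scoring token ids.
    candidates = sorted(range(len(adjusted)), key=lambda i: adjusted[i],
                        reverse=True)[:top_k]
    # Temperature-scaled softmax over the candidates; subtract the max for stability.
    scaled = [adjusted[i] / temperature for i in candidates]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Usage: a seeded generator makes the sampling reproducible.
rng = random.Random(0)
next_id = sample_next_token([0.1, 2.0, 0.5, -1.0], generated=[1], rng=rng)
```

Lowering `top_k` or `temperature` makes generation more conservative; raising `repetition_penalty` pushes harder against loops. With `top_k=1` the function reduces to greedy decoding on the penalized logits.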
2 votes
1 answer
32 views

AttributeError: 'NoneType' object has no attribute 'blocks' when running Cache-DiT example with Wan2.2 model

I’m trying to use Cache-DiT to accelerate inference for the Wan2.2 model. However, when I run the example script, python run_wan_2.2_i2v.py --steps 28 --cache I get the following error. Namespace(...
傅靖茹
-1 votes
1 answer
36 views

Pretrained ESRGAN (.pb) gives reddish or purple image — is this a preprocessing issue or model issue?

I'm trying to use a pretrained ESRGAN model that I downloaded in .pb format. The model runs without errors, but the output image has a noticeable reddish/purple tint instead of the correct colors. ...
Ahmed Almakki
0 votes
0 answers
62 views

Utilizing the GPU with an RNN model that takes its output as input [torch]

I have a machine-translation model. In this model, I calculate a vector for a given sentence and I take this vector, aggregate with each generated output of RNN and put it into RNN again for ...
cuneyttyler (reputation 1,395)
1 vote
0 answers
20 views

Why does the same YOLOv8n-pose model with different weights have significantly different inference speeds?

I’m testing YOLOv8n-pose models that share the exact same architecture, input size, hardware (GPU), framework, batch size, and precision settings. The only difference between them is the trained ...
Hạnh Nhi Đỗ
1 vote
1 answer
123 views

Torch Conv2d results in both dimensions convolved

I have input shape to a convolution (50, 1, 7617, 10). Here, 7617 is word vectors as rows, and 10 is the number of words in columns. I want to convolve column-wise and obtain (2631, 1, 7617, 1), 1 ...
cuneyttyler (reputation 1,395)
0 votes
1 answer
73 views

Avoid overlap of bipartite network nodes in ggraph

I'm plotting a bipartite (two-mode) network using igraph and ggraph. But the nodes are overlapping a lot, even though there is still space in the graphic window. I would like to plot this using ggraph,...
mmmap (reputation 67)
0 votes
0 answers
107 views

Kohya-SS SDXL LoRA Training Resets Steps Despite Successful State Loading

I am running SDXL LoRA training using Kohya's sd-scripts and accelerate. I have enabled --save_state and am trying to resume training, but the training steps always reset to 0, even though the log ...
Akash Chaudhari
0 votes
0 answers
81 views

Trouble configuring R-group substitution in REINVENT 4 (AstraZeneca) — validation errors for RLConfig and ScorerConfig

I’m using AstraZeneca’s REINVENT 4 (v4.6.27) to generate SMILES from a scaffold via R-group substitution, optimizing for 5-HT2A / D2 / 5-HT1A (maximize) and minimizing H1 / M1 / α1A, with DockStream ...
Reuben Udohaya
0 votes
1 answer
109 views

ValueError: Only instances of keras.Layer can be added to a Sequential model when using TensorFlow Hub KerasLayer

I’m trying to build a Keras Sequential model using a feature extractor from TensorFlow Hub, but I’m running into this error: ValueError: Only instances of `keras.Layer` can be added to a Sequential ...
user31600948
0 votes
1 answer
283 views

Getting “Sizes of tensors must match” error when using ComfyUI WanVideoWrapper (wan2.2) to generate video

I am trying to generate a video using Wan 2.2. My goal is to take a motion sequence from an input video and a single reference image, and then generate a new video where the character in the reference ...
hongxigoo
2 votes
1 answer
123 views

Keras Model throwing Error while integrating with frontend

I trained an EfficientNetB0 model on Colab for my final-year project. After training all the layers, I tested it and the results were excellent, but now I want to integrate the model into the frontend web ...
Narendra Patne
0 votes
1 answer
122 views

Preventing GPU memory leak due to a custom neural network layer

I am using the MixStyle methodology for domain adaptation, and it involves using a custom layer that is inserted after every encoder stage. However, it is causing VRAM to grow linearly, which causes ...
Vedant Dalimkar
3 votes
0 answers
77 views

Multimodal model for image captioning with CNN and LSTM on Flickr30k does not learn. How to fuse image features and word embeddings?

I'm working on an image captioning project using a simple CNN + LSTM architecture, as required by the course I'm studying. The full code is available here on GitHub (note: some parts are memory-...
Malihe Mahdavi sefat
-3 votes
1 answer
93 views

Can I visualize a neural network’s loss landscape to see if it’s stuck in a bad minimum? Any code example for this? [closed]

So, I’m trying to understand why sometimes neural networks get stuck during training. I heard people talk about ‘local minima’ and ‘saddle points,’ but I can’t really picture them. I want to actually ...
prithvisyam
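One simple way to look at a loss landscape is a 1-D slice: evaluate the loss at evenly spaced points on the straight line between two parameter vectors (for example, the initial and the trained weights, or two trained solutions). A bump along the curve suggests a barrier between the two points. The sketch below uses a hypothetical two-parameter toy loss with minima at (-1, 0) and (1, 0); for a real network, `theta` would be the flattened weight vector and `loss_fn` would load those weights and evaluate a fixed batch. A 2-D surface plot follows the same idea using two directions instead of one.

```python
def interpolate_loss(loss_fn, theta_a, theta_b, num_points=11):
    """Evaluate loss_fn at evenly spaced points on the line from theta_a to theta_b."""
    results = []
    for k in range(num_points):
        alpha = k / (num_points - 1)
        # theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b
        theta = [(1 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]
        results.append((alpha, loss_fn(theta)))
    return results


def toy_loss(theta):
    """Hypothetical non-convex loss: two minima at x = -1 and x = +1, a bump at x = 0."""
    x, y = theta
    return (x ** 2 - 1) ** 2 + y ** 2


curve = interpolate_loss(toy_loss, theta_a=[-1.0, 0.0], theta_b=[1.0, 0.0])
for alpha, loss in curve:
    print(f"alpha={alpha:.1f}  loss={loss:.3f}")
```

Plotting `loss` against `alpha` (for example with matplotlib) gives the picture: here both ends sit at zero loss and the midpoint reaches 1.0, the barrier separating the two minima.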
0 votes
0 answers
47 views

!pip install mediapipe opencv-python error

I am getting the error below when I try to install mediapipe on Kaggle. The same command works fine on Google Colab but not on Kaggle. WARNING: Retrying (Retry(total=4, connect=None, read=...
Aakif (reputation 1)
0 votes
0 answers
48 views

GraphMAE self-supervised learning: node attribute (sampled points) reconstruction works in minimal script but fails in full pipeline

I am experimenting with a GraphMAE self-supervised architecture using PyTorch + DGL. In my task, each graph node represents a CAD entity, and one node attribute stores sampled points (coordinates + ...
yxtq f (reputation 1)
-1 votes
1 answer
118 views

How to download Open Images V7 images to my device? [closed]

I wanted images of a particular class ('Turban') from Open Images. However, these images are not in the Boxable category, so my following OIDv6 code is failing: oidv6 downloader en --dataset ./...
Jivhesh Choudhari
1 vote
3 answers
78 views

Why isn't my Keras model throwing an error when different sizes are passed to the dense layer?

I am working on a dynamic time series multi-class segmentation problem with keras (tensorflow version 2.12.0), and I wanted to see what would happen when I dropped in a dense layer into the network ...
jjschuh (reputation 403)
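For context: a Keras Dense layer applies its kernel only to the last axis of its input; for inputs of rank greater than 2 it takes the dot product along that axis and leaves the leading axes alone. So if the size that varies is the time dimension (an assumption, since the excerpt is cut off), a (batch, time, features) input produces (batch, time, units) for any time length, and no error is raised. Only the last (feature) axis must match the kernel's input size. The pure-Python sketch below mimics that behavior; `dense_last_axis` is a hypothetical helper, not a Keras API.

```python
def dense_last_axis(x, kernel, bias):
    """Apply y = x @ kernel + bias along the last axis of a nested list.

    kernel has shape (features, units) and bias has shape (units,), like a
    Keras Dense layer's weights; every leading axis is left untouched.
    """
    if isinstance(x[0], (int, float)):  # x is a single feature vector
        return [sum(xi * kernel[i][j] for i, xi in enumerate(x)) + bias[j]
                for j in range(len(bias))]
    return [dense_last_axis(sub, kernel, bias) for sub in x]


kernel = [[1.0, 0.0, 2.0],   # 2 input features -> 3 units
          [0.0, 1.0, -1.0]]
bias = [0.0, 0.0, 0.5]

short_seq = [[[1.0, 2.0]]]                          # shape (1, 1, 2)
long_seq = [[[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]]   # shape (1, 3, 2)
print(dense_last_axis(short_seq, kernel, bias))     # shape (1, 1, 3)
print(dense_last_axis(long_seq, kernel, bias))      # shape (1, 3, 3)
```

The same kernel is reused at every time step, which is why sequences of different lengths pass through without complaint.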
0 votes
0 answers
44 views

PyTorch XPU training loop memory leak despite explicit cleanup (gc.collect, torch.xpu.empty_cache)

I’m training RVC, which uses a HiFi-GAN-style speech model, on Intel XPU (PyTorch 2.3, oneAPI backend). During training, my GPU/XPU memory usage keeps growing with each batch until OOM, even though I ...
i suck at programming
1 vote
1 answer
164 views

Getting different results across different machines while training RL

While training my RL algorithm using SBX, I am getting different results on my HPC cluster and my PC. However, I did find that results are consistently the same within the same machine. They just diverge ...
desert_ranger
0 votes
0 answers
24 views

Gradcam with ResNet-50 Error: The name "input_layer_1" is used 2 times in the model. All operation names should be unique

I am new to implementing Gradcam and am having trouble with it. I have created a model img_shape = (256, 256, 3) base_model = ResNet50(include_top=False, ...
Arcturus
0 votes
0 answers
75 views

KFold cross-validation in Keras: model not resetting between folds (MobileNet backbone)

I am trying to perform KFold cross-validation on a Keras model. The first fold runs exactly as expected, but from the second fold onwards the model doesn’t seem to reset. The training behaves ...
pd_prince
0 votes
0 answers
34 views

Why does MATLAB selfAttentionLayer give different parameter counts for head/key-channel pairs with the same total key dimension?

I’m experimenting with the MathWorks example that inserts a multi-head self-attention layer into a simple CNN for the DigitDataset: Link to example layers = [ imageInputLayer([28 28 1]) ...
Hend mahmoud
0 votes
0 answers
51 views

Layer "dense" expects 1 input(s), but it received 2 input tensors

Environment Raspberry Pi 4: 8 GB RAM Relevant Dependencies: keras 3.11.3 numpy 1.26.4 opencv-python 4.12.0.88 pillow 11.3.0 tensorflow ...
Priyanshu Jha
0 votes
1 answer
47 views

RAM memory leak when scripting a sampling trainer for a BERT encoder and LSTM decoder in TensorFlow on GPU

I wrote the module attached below. However, I notice a constant increase of RAM until I get an out of memory error. The code runs on CPU without a problem (except the slow training time). It can ...
mashtock (reputation 400)
6 votes
1 answer
660 views

Calculating the partial derivative of PyTorch model output with respect to neurons pre-activation

I am working on neuron importance for ANNs (in a classification setting). A simple baseline is the partial derivative of the model output for the correct class with respect to the given neuron pre-...
jonupp (reputation 63)
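A minimal worked example of this baseline, assuming a one-hidden-layer ReLU network and that "model output" means the logit of the correct class (before any softmax). By the chain rule, ∂out_c/∂z_j = W2[c][j] · ReLU'(z_j), where z_j is neuron j's pre-activation. The pure-Python sketch computes this analytically and checks it against a central finite difference; all names and weights are hypothetical. In PyTorch the same quantity can be obtained by keeping a reference to the pre-activation tensor, calling `retain_grad()` on it, and backpropagating from the class logit.

```python
def forward_from_preactivation(z, W2, b2):
    """Hidden ReLU followed by a linear output layer, starting from pre-activations z."""
    h = [max(0.0, zj) for zj in z]
    return [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, b2)]


def forward(x, W1, b1, W2, b2):
    """Return (pre-activations z, output logits) of a 1-hidden-layer ReLU network."""
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    return z, forward_from_preactivation(z, W2, b2)


def grad_wrt_preactivation(z, W2, cls):
    """Analytic d out[cls] / d z_j = W2[cls][j] * relu'(z_j)."""
    return [W2[cls][j] * (1.0 if z[j] > 0 else 0.0) for j in range(len(z))]


def numeric_grad(z, W2, b2, cls, eps=1e-6):
    """Central finite difference of out[cls] with respect to each z_j."""
    grads = []
    for j in range(len(z)):
        up = list(z)
        up[j] += eps
        down = list(z)
        down[j] -= eps
        diff = (forward_from_preactivation(up, W2, b2)[cls]
                - forward_from_preactivation(down, W2, b2)[cls])
        grads.append(diff / (2 * eps))
    return grads


x = [1.0, 2.0]
W1, b1 = [[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]
W2, b2 = [[3.0, 4.0], [5.0, 6.0]], [0.0, 0.0]
z, out = forward(x, W1, b1, W2, b2)             # z = [1.0, -2.0], out = [3.0, 5.0]
print(grad_wrt_preactivation(z, W2, cls=0))     # [3.0, 0.0]
```

The second neuron is inactive (z = -2), so its importance score is 0 for this input even though its outgoing weight is large.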
-1 votes
0 answers
114 views

Huge training loss when fine-tuning DeBERTa-v3-small with HuggingFace Trainer + LoRA

I am trying to fine-tune microsoft/deberta-v3-small with HuggingFace Trainer + PEFT LoRA adapters for a binary classification task (truth vs lie transcripts). My dataset is from the MU3D database. It ...
myts999 (reputation 61)
1 vote
1 answer
52 views

Replacing WideResNet50 with EfficientNetV2-M in GLASS defect detection model causes Module layer2 not found in the model [closed]

I’m using the GLASS defect detection model and want to replace its default wideresnet50 backbone with efficientnetv2_m in shell/run-custom.sh. However, when I run bash run-custom.sh I get the ...
aniaf (reputation 19)
2 votes
1 answer
92 views

How do I format my TensorFlow dataset for a multi-output model?

I have an image dataset where each image has multiple categorical features that I want to predict. I am getting this error when trying to train: ValueError: y_true and y_pred have different structures....
Fish4203
0 votes
1 answer
107 views

RuntimeError in torch.cat during VACE-Wan2.1 inference: mask and video tensor shape mismatch

I'm using the Wan2.1-VACE video generation model, and during inference I encountered a RuntimeError related to mismatched tensor shapes in a torch.cat operation inside the vace_latent() function. From ...
范姜伯軒
1 vote
2 answers
76 views

MNIST multi digit prediction [closed]

I tried to train a model on mnist dataset. It's working fine for single digit prediction and I got 96.92% accuracy on test data too but I have everything but it's not working for multidigit. even ...
Dropper (reputation 19)
2 votes
1 answer
103 views

Saving embeddings from encoder efficiently with fast random access

I have about 160 million embeddings that I created with a BERT-based encoder model. Right now they are in .pt format and take about 500 GB on disk. I want 2 things: To save them in an ...
Noam (reputation 55)
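A back-of-envelope check, assuming the vectors are 768-dimensional float32 (BERT-base's hidden size; the excerpt doesn't say): 160,000,000 × 768 × 4 bytes ≈ 491.5 GB, in line with the ~500 GB reported, so storing float16 would roughly halve it. For fast random access, one common layout is a flat binary file of fixed-size records, so row i starts at byte i × dim × bytes_per_value and can be read with a single seek. The stdlib sketch below shows that layout; `numpy.memmap` over the same kind of file is the usual way to get array views without loading everything into RAM. Function names are hypothetical.

```python
import io
import struct


def write_embeddings(f, vectors, dim):
    """Write float32 vectors to a binary file object as fixed-size, little-endian records."""
    record = struct.Struct(f"<{dim}f")
    for vec in vectors:
        if len(vec) != dim:
            raise ValueError(f"expected {dim} values, got {len(vec)}")
        f.write(record.pack(*vec))


def read_embedding(f, index, dim):
    """Read row `index` by seeking straight to its byte offset."""
    record = struct.Struct(f"<{dim}f")
    f.seek(index * record.size)   # record.size == 4 * dim for float32
    data = f.read(record.size)
    if len(data) != record.size:
        raise IndexError(f"row {index} is out of range")
    return list(record.unpack(data))


# Usage with an in-memory buffer; a real file opened in "wb" (write) / "rb" (read) mode works the same way.
buf = io.BytesIO()
write_embeddings(buf, [[0.5, 1.25, -2.0], [3.0, 0.0, 4.5]], dim=3)
print(read_embedding(buf, 1, dim=3))   # [3.0, 0.0, 4.5]
```

Because every record has the same size, no index file is needed: the row number alone determines the offset.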
1 vote
1 answer
134 views

JAX lax.scan: How to iterate over layers and memory slices simultaneously without dynamic indexing in a multi-layer RNN structure?

I'm trying to implement a framework that manages RNN networks with an arbitrary number of layers (it's all part of a library I'm building based on jax/equinox); the problem is that I can't find an ...
lifera (reputation 11)
1 vote
1 answer
83 views

DQN fails to learn good policy for Atari Pong

I'm trying to implement the findings from this DeepMind DQN paper (2015) from scratch in PyTorch using the Atari Pong environment. I've tested my Deep Q-Network on a simple test environment, where ...
Rohan Patel
0 votes
0 answers
87 views

GradientTape won't calculate gradients after restoring a model from ModelCheckpoint

I'm training a CNN on Tensorflow for binary classification and executing my code in Google Colab. CNN_model = tf.keras.Sequential([ tf.keras.layers.Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3)), tf....
Nicola (reputation 11)
3 votes
2 answers
50 views

How to get the KL divergence between two datasets (say ImageNet1k and FGVC-Aircraft)

I want to find out the KL divergence between 2 datasets, so for that I am extracting the features from ResNet50 for both of them, but if I calculate the same for ImageNet1k (say 20% of val set) and ...
Shashank Priyadarshi
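KL divergence is defined between distributions, not between raw feature samples, so some density model is unavoidable. One simple option (an assumption, not necessarily what the question uses) is to fit a diagonal Gaussian to each dataset's ResNet50 features and use the closed form, summed over dimensions: KL(P‖Q) = ½ Σ_j [log(σ²_Q,j / σ²_P,j) + (σ²_P,j + (μ_P,j − μ_Q,j)²) / σ²_Q,j − 1]. KL is asymmetric, so KL(P‖Q) and KL(Q‖P) generally differ. The sketch is stdlib-only, its names are hypothetical, and `eps` guards against zero variance.

```python
import math


def fit_diag_gaussian(features, eps=1e-6):
    """Per-dimension mean and (population) variance of a list of feature vectors."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    variances = [sum((row[j] - means[j]) ** 2 for row in features) / n + eps
                 for j in range(d)]
    return means, variances


def kl_diag_gaussians(p, q):
    """KL(P || Q) for diagonal Gaussians given as (means, variances) pairs."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


# Toy 1-D "features": both sets have variance 1, with means 0 and 1.
feats_a = [[-1.0], [1.0]]
feats_b = [[0.0], [2.0]]
print(kl_diag_gaussians(fit_diag_gaussian(feats_a), fit_diag_gaussian(feats_b)))  # ≈ 0.5
```

With real features, each row would be one image's 2048-dimensional ResNet50 pooled vector; the estimate depends on the sample used, so comparing subsets of the same dataset gives a useful baseline.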
1 vote
1 answer
99 views

Trained Huggingface EncoderDecoderModel.generate() produces only bos-tokens

I am working on a Huggingface transformers EncoderDecoderModel consisting of a frozen BERT-Encoder (answerdotai-ModernBERT-base) and a trainable GPT2-Decoder. Due to the different architectures for ...
soosmann (reputation 119)
0 votes
1 answer
92 views

How do I customize the data.yaml file to train a yolo11x.pt model with Ultralytics?

class_id x_center y_center width height behavior_id eg.txt file data 6 0.260313 0.739167 0.131875 0.038333 1 6 0.580313 0.821250 0.290625 0.245834 0 6 0.821562 0.775416 0.230625 0.179167 0 6 0.914062 ...
himanshu
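For context: Ultralytics detection labels normally have exactly five columns per line (class_id x_center y_center width height, with coordinates normalized to the range 0 to 1), so the extra behavior_id column has to go somewhere. Two options, both assumptions about what the question needs: drop behavior_id (and handle it separately), or fold it into a combined class id such as class_id × num_behaviors + behavior_id, in which case the `names` list in data.yaml must have one entry per combined class. A minimal sketch of the conversion; `convert_label_line` and `num_behaviors=2` (the excerpt only shows behavior ids 0 and 1) are hypothetical.

```python
def convert_label_line(line, mode="drop", num_behaviors=2):
    """Turn a 6-column label line into a standard 5-column YOLO line.

    mode="drop":    keep class_id, discard behavior_id.
    mode="combine": encode both ids as class_id * num_behaviors + behavior_id.
    """
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 columns, got {len(parts)}: {line!r}")
    class_id, behavior_id = int(parts[0]), int(parts[5])
    box = parts[1:5]  # x_center y_center width height, copied unchanged
    if mode == "drop":
        new_class = class_id
    elif mode == "combine":
        new_class = class_id * num_behaviors + behavior_id
    else:
        raise ValueError(f"unknown mode: {mode}")
    return " ".join([str(new_class)] + box)


line = "6 0.260313 0.739167 0.131875 0.038333 1"
print(convert_label_line(line))                  # 6 0.260313 0.739167 0.131875 0.038333
print(convert_label_line(line, mode="combine"))  # 13 0.260313 0.739167 0.131875 0.038333
```

Running this over every label file (writing the results to a new labels directory) leaves the images untouched; the combined ids can be decoded back with `divmod(new_class, num_behaviors)`.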
0 votes
1 answer
50 views

How to find MSE when using a batch loader?

I'm working on a regression task using deep learning models. While calculating the MSE, I divided by the length of the dataset. However, ChatGPT suggests dividing it by the length of the ...
mansi (reputation 13)
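The two divisions agree only when every batch has the same size. If each batch's loss is a per-batch mean (as with PyTorch's `nn.MSELoss()`, whose default is `reduction='mean'`), multiply it by the batch size before accumulating, then divide the total by the number of samples in the dataset. A stdlib sketch with hypothetical names, using a smaller last batch to show the difference:

```python
def dataset_mse(batches):
    """Exact MSE over all samples: total squared error / total sample count."""
    total_sq_error, total_count = 0.0, 0
    for preds, targets in batches:
        total_sq_error += sum((p - t) ** 2 for p, t in zip(preds, targets))
        total_count += len(targets)
    return total_sq_error / total_count


def mean_of_batch_means(batches):
    """Average of per-batch MSEs; biased whenever batch sizes differ."""
    batch_means = [sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
                   for preds, targets in batches]
    return sum(batch_means) / len(batch_means)


# Two batches of sizes 2 and 1 (e.g. a dataset of 3 samples with batch_size=2).
batches = [([1.0, 1.0], [0.0, 0.0]), ([3.0], [0.0])]
print(dataset_mse(batches))          # 11 / 3 ≈ 3.667
print(mean_of_batch_means(batches))  # (1 + 9) / 2 = 5.0
```

The single sample in the short last batch gets the same weight as the whole first batch in the batch-average, which is why the two numbers differ.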
3 votes
1 answer
117 views

Neural Network built from scratch using numpy isn't learning

I'm building a neural network from scratch using only Python and NumPy. It's meant for classifying the MNIST dataset. I got everything to work, but the network isn't really learning: at epoch 0 it's ...
buzzbuzz20xx
0 votes
1 answer
66 views

Softmax function always outputs garbage numbers that don't add up to one [closed]

I am creating a simple NN from scratch that can classify MNIST digits. It only has 1 hidden layer. Loading the data: import numpy as np import matplotlib.pyplot as plt from keras.datasets import ...
buzzbuzz20xx
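Two frequent causes of softmax outputs that don't sum to one are normalizing over the wrong axis (for example, dividing by the sum of the whole batch matrix instead of each sample's row) and exponentiating large logits directly, which overflows. A stdlib sketch of a numerically stable, row-wise softmax; the names are illustrative, and in NumPy the equivalent subtracts `x.max(axis=1, keepdims=True)` before `np.exp` and divides by `exp.sum(axis=1, keepdims=True)`.

```python
import math


def softmax(logits):
    """Numerically stable softmax of one vector of logits."""
    peak = max(logits)                       # subtracting the max avoids overflow in exp
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_rows(batch):
    """Apply softmax to each row (one sample per row) so every row sums to 1."""
    return [softmax(row) for row in batch]


probs = softmax_rows([[1.0, 2.0, 3.0], [1000.0, 1001.0, 1002.0]])
print([round(sum(row), 6) for row in probs])   # [1.0, 1.0]
```

Softmax is unchanged by adding a constant to every logit, so both rows produce identical probabilities; without the max subtraction, `math.exp(1000.0)` would raise an `OverflowError`.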
0 votes
1 answer
60 views

Hyperparameter tuning using Wandb or Keras Tuner - 10 fold Cross Validation

If I am using stratified 10-folds for classification/regression tasks, where do I need to define the logic for hyperparameter tuning using Scikit or Wandb? Should it be inside the loop or outside? I ...
Ayesha Kiran
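The usual answer is nested cross-validation: tuning happens inside each outer fold, using only that fold's training part (split again into inner folds), and the outer validation fold is touched only once, to score the model refit with the chosen hyperparameters. A tool such as Keras Tuner or a Wandb sweep would take the place of the inner grid search. A stdlib sketch under simplifying assumptions: unstratified contiguous folds (scikit-learn's `StratifiedKFold` would replace `k_fold_indices` for classification), a hypothetical one-hyperparameter "model" that predicts `shrink * mean(y_train)`, and MSE as the score.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous, non-overlapping folds of range(n)."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def fit_and_score(y, train_idx, val_idx, shrink):
    """Hypothetical model: predict shrink * mean(train targets); return validation MSE."""
    pred = shrink * sum(y[i] for i in train_idx) / len(train_idx)
    return sum((y[i] - pred) ** 2 for i in val_idx) / len(val_idx)


def inner_cv_score(y, train_idx, shrink, inner_k):
    """Average validation MSE over inner folds built from train_idx only."""
    scores = []
    for tr, va in k_fold_indices(len(train_idx), inner_k):
        scores.append(fit_and_score(y, [train_idx[i] for i in tr],
                                    [train_idx[i] for i in va], shrink))
    return sum(scores) / len(scores)


def nested_cv(y, grid, outer_k=5, inner_k=3):
    """Return one (best_shrink, outer_score) pair per outer fold."""
    results = []
    for outer_train, outer_val in k_fold_indices(len(y), outer_k):
        # Inner loop: tune on outer_train only; outer_val stays unseen.
        best = min(grid, key=lambda s: inner_cv_score(y, outer_train, s, inner_k))
        # Refit on the whole outer training split and score once on the outer fold.
        results.append((best, fit_and_score(y, outer_train, outer_val, best)))
    return results


y = [float(v % 7) for v in range(40)]
for best, score in nested_cv(y, grid=[0.0, 0.5, 1.0]):
    print(best, round(score, 3))
```

The average of the outer scores estimates how well the whole "tune then train" procedure generalizes; the per-fold best values may differ from fold to fold, and that is expected.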
0 votes
0 answers
101 views

RuntimeError: Trying to backward through the graph

I am using the custom Seq2SeqTrainingArguments and Seq2SeqTrainer from Huggingface and I am facing the below error. I am using WhisperSmall. How can I resolve this error? RuntimeError: Trying to ...
Turing101 (reputation 377)
