27,248 questions
-3
votes
0
answers
31
views
LSTM+Attention stock prediction model stops improving after 1–2 epochs despite lower learning rate - why? [closed]
I'm a student building a deep learning model for stock return prediction.
I'm currently facing an issue where the validation loss stops improving after only 1–2 epochs, causing the model to barely ...
-1
votes
0
answers
30
views
Input fusion in contextual reinforcement learning [closed]
I’m currently exploring contextual reinforcement learning for a university project.
I understand that in actor–critic methods like PPO and SAC, it might be possible to combine state and contextual ...
-1
votes
0
answers
32
views
How to Improve Reconstruction Quality with VAE? [closed]
I am training a VAE architecture on microscopy images. Dataset of 1000 training images, 253 testing images. Images are resized to 128x128 input or 256x256 input from original resolution which is ...
-4
votes
0
answers
38
views
Advice on Identifying Extremely Similar Objects in Real Time [closed]
I’m working on a real-time system that identifies objects, but I’m facing a challenge: new objects can look extremely similar to known ones (sometimes differences are as small as 0.1 mm), and I need ...
1
vote
0
answers
65
views
Should I use torch.inference_mode() in a prediction method even when using model.eval()? [duplicate]
I'm following the book "Deep Learning with PyTorch Step By Step" and I have a question about the predict method in the StepByStep class (from this repository: GitHub).
The current ...
2
votes
1
answer
107
views
Will tf.keras.Sequential containing multiple custom layers be correctly fully serializable and deserializable in my case?
I am implementing a U-Net variant in TensorFlow/Keras with custom layers. In one of my custom layers, UPDoubleConv, I have a Sequential self.blocks containing a repeated pattern of UpSampling2D ...
2
votes
2
answers
87
views
Decoder-only AI model making repetitive responses
I am making a decoder-only transformer using PyTorch, and my dataset of choice is the full English dataset from Kaggle, Plaintext Wikipedia (full English).
The problem is that my model output is ...
2
votes
1
answer
32
views
AttributeError: 'NoneType' object has no attribute 'blocks' when running Cache-DiT example with Wan2.2 model
I’m trying to use Cache-DiT to accelerate inference for the Wan2.2 model.
However, when I run the example script,
python run_wan_2.2_i2v.py --steps 28 --cache
I get the following error.
Namespace(...
-1
votes
1
answer
36
views
Pretrained ESRGAN (.pb) gives reddish or purple image — is this a preprocessing issue or model issue?
I'm trying to use a pretrained ESRGAN model that I downloaded in .pb format.
The model runs without errors, but the output image has a noticeable reddish/purple tint instead of the correct colors.
...
0
votes
0
answers
62
views
Utilizing GPU with RNN models which take their own output as input [torch]
I have a machine-translation model. In this model, I calculate a vector for a given sentence, aggregate this vector with each generated output of the RNN, and put it into the RNN again for ...
1
vote
0
answers
20
views
Why does the same YOLOv8n-pose model with different weights have significantly different inference speeds?
I’m testing YOLOv8n-pose models that share the exact same architecture, input size, hardware (GPU), framework, batch size, and precision settings. The only difference between them is the trained ...
1
vote
1
answer
123
views
Torch Conv2d results in both dimensions convolved
I have input shape to a convolution (50, 1, 7617, 10). Here, 7617 is word vectors as rows, and 10 is the number of words in columns. I want to convolve column-wise and obtain (2631, 1, 7617, 1), 1 ...
0
votes
1
answer
73
views
Avoid overlap of bipartite network nodes in ggraph
I'm plotting a bipartite (two-mode) network using igraph and ggraph.
But the nodes are overlapping a lot, even though there is still space in the graphic window.
I would like to plot this using ggraph,...
0
votes
0
answers
107
views
Kohya-SS SDXL LoRA Training Resets Steps Despite Successful State Loading
I am running SDXL LoRA training using Kohya's sd-scripts and accelerate. I have enabled --save_state and am trying to resume training, but the training steps always reset to 0, even though the log ...
0
votes
0
answers
81
views
Trouble configuring R-group substitution in REINVENT 4 (AstraZeneca) — validation errors for RLConfig and ScorerConfig
I’m using AstraZeneca’s REINVENT 4 (v4.6.27) to generate SMILES from a scaffold via R-group substitution, optimizing for 5-HT2A / D2 / 5-HT1A (maximize) and minimizing H1 / M1 / α1A, with DockStream ...
0
votes
1
answer
109
views
ValueError: Only instances of keras.Layer can be added to a Sequential model when using TensorFlow Hub KerasLayer
I’m trying to build a Keras Sequential model using a feature extractor from TensorFlow Hub, but I’m running into this error:
ValueError: Only instances of `keras.Layer` can be added to a Sequential ...
0
votes
1
answer
283
views
Getting “Sizes of tensors must match” error when using ComfyUI WanVideoWrapper (wan2.2) to generate video
I am trying to generate a video using Wan 2.2. My goal is to take a motion sequence from an input video and a single reference image, and then generate a new video where the character in the reference ...
2
votes
1
answer
123
views
Keras Model throwing Error while integrating with frontend
I trained an EfficientNetB0 model on Colab for my final-year project. After all the layer training, I tested it and the results were excellent, but now I want to integrate the model into the frontend web ...
0
votes
1
answer
122
views
Preventing GPU memory leak due to a custom neural network layer
I am using the MixStyle methodology for domain adaptation, and it involves using a custom layer that is inserted after every encoder stage. However, it is causing VRAM to grow linearly, which causes ...
3
votes
0
answers
77
views
Multimodel for image captioning with CNN and LSTM over flickr30k does not learn. How to fuse image features and word embeddings?
I'm working on an image captioning project using a simple CNN + LSTM architecture, as required by the course I'm studying. The full code is available here on GitHub (note: some parts are memory-...
-3
votes
1
answer
93
views
Can I visualize a neural network’s loss landscape to see if it’s stuck in a bad minimum? Any code example for this? [closed]
So, I’m trying to understand why sometimes neural networks get stuck during training. I heard people talk about ‘local minima’ and ‘saddle points,’ but I can’t really picture them. I want to actually ...
0
votes
0
answers
47
views
!pip install mediapipe opencv-python error
I am getting the error below when I try to install mediapipe on Kaggle.
The same command works fine on Google Colab but not on Kaggle.
WARNING: Retrying (Retry(total=4, connect=None, read=...
0
votes
0
answers
48
views
GraphMAE self-supervised learning: node attribute (sampled points) reconstruction works in minimal script but fails in full pipeline
I am experimenting with a GraphMAE self-supervised architecture using PyTorch + DGL.
In my task, each graph node represents a CAD entity, and one node attribute stores sampled points (coordinates + ...
-1
votes
1
answer
118
views
How to download Open Images V7 images to a device? [closed]
I want images of a particular class ('Turban') from Open Images. However, these images are not in the Boxable category, so the following OIDv6 command fails:
oidv6 downloader en --dataset ./...
1
vote
3
answers
78
views
Why isn't my Keras model throwing an error when different sizes are passed to the dense layer?
I am working on a dynamic time series multi-class segmentation problem with Keras (TensorFlow version 2.12.0), and I wanted to see what would happen when I dropped a dense layer into the network ...
0
votes
0
answers
44
views
PyTorch XPU training loop memory leak despite explicit cleanup (gc.collect, torch.xpu.empty_cache)
I’m training an RVC model, which uses a HiFi-GAN-style speech model, on Intel XPU (PyTorch 2.3, oneAPI backend). During training, my GPU/XPU memory usage keeps growing with each batch until OOM, even though I ...
1
vote
1
answer
164
views
Getting different results across different machines while training RL
While training my RL algorithm using SBX, I am getting different results on my HPC cluster and my PC. However, I did find that results are consistently the same within the same machine. They just diverge ...
0
votes
0
answers
24
views
Gradcam with ResNet-50 Error: The name "input_layer_1" is used 2 times in the model. All operation names should be unique
I am new to implementing Grad-CAM and am having trouble with it.
I have created a model
img_shape = (256, 256, 3)
base_model = ResNet50(include_top=False,
...
0
votes
0
answers
75
views
KFold cross-validation in Keras: model not resetting between folds (MobileNet backbone)
I am trying to perform KFold cross-validation on a Keras model. The first fold runs exactly as expected, but from the second fold onwards the model doesn’t seem to reset. The training behaves ...
0
votes
0
answers
34
views
Why does MATLAB selfAttentionLayer give different parameter counts for head/key-channel pairs with the same total key dimension?
I’m experimenting with the MathWorks example that inserts a multi-head self-attention layer into a simple CNN for the DigitDataset:
Link to example
layers = [
imageInputLayer([28 28 1])
...
0
votes
0
answers
51
views
Layer "dense" expects 1 input(s), but it received 2 input tensors
Environment
Raspberry Pi 4: 8 GB RAM
Relevant Dependencies:
keras 3.11.3
numpy 1.26.4
opencv-python 4.12.0.88
pillow 11.3.0
tensorflow ...
0
votes
1
answer
47
views
RAM leak when scripting a sampling trainer for a BERT encoder and LSTM decoder in TensorFlow on GPU
I wrote the module attached below. However, I notice a constant increase of RAM until I get an out of memory error. The code runs on CPU without a problem (except the slow training time). It can ...
6
votes
1
answer
660
views
Calculating the partial derivative of PyTorch model output with respect to neurons pre-activation
I am working on neuron importance for ANNs (in a classification setting). A simple baseline is the partial derivative of the model output for the correct class with respect to the given neuron pre-...
-1
votes
0
answers
114
views
Huge training loss when fine-tuning DeBERTa-v3-small with HuggingFace Trainer + LoRA
I am trying to fine-tune microsoft/deberta-v3-small with HuggingFace Trainer + PEFT LoRA adapters for a binary classification task (truth vs lie transcripts).
My dataset is from the MU3D database. It ...
1
vote
1
answer
52
views
Replacing WideResNet50 with EfficientNetV2-M in GLASS defect detection model causes Module layer2 not found in the model [closed]
I’m using the GLASS defect detection model and want to replace its default wideresnet50 backbone with efficientnetv2_m in shell/run-custom.sh. However, when I run
bash run-custom.sh
I get the ...
2
votes
1
answer
92
views
How do I format a TensorFlow dataset for a multi-output model?
I have an image dataset where each image has multiple categorical features that I want to predict. I am getting this error when trying to train:
ValueError: y_true and y_pred have different structures....
0
votes
1
answer
107
views
RuntimeError in torch.cat during VACE-Wan2.1 inference: mask and video tensor shape mismatch
I'm using the Wan2.1-VACE video generation model, and during inference I encountered a RuntimeError related to mismatched tensor shapes in a torch.cat operation inside the vace_latent() function.
From ...
1
vote
2
answers
76
views
MNIST multi digit prediction [closed]
I trained a model on the MNIST dataset. It works fine for single-digit prediction, and I got 96.92% accuracy on the test data too, but it's not working for multi-digit prediction.
even ...
2
votes
1
answer
103
views
Saving embeddings from encoder efficiently with fast random access
I have embeddings (about 160 Million) that I created with a BERT-based encoder model.
Right now they are in .pt format and take about 500 GB on disk.
I want 2 things:
To save them in an ...
1
vote
1
answer
134
views
JAX lax.scan: How to iterate over layers and memory slices simultaneously without dynamic indexing in a multi-layer RNN structure?
I'm trying to implement a framework that manages RNNs with an arbitrary number of layers (it's all part of a library I'm building based on jax/equinox). The problem is that I can't find an ...
1
vote
1
answer
83
views
DQN fails to learn good policy for Atari Pong
I'm trying to implement the findings from this DeepMind DQN paper (2015) from scratch in PyTorch using the Atari Pong environment.
I've tested my Deep Q-Network on a simple test environment, where ...
0
votes
0
answers
87
views
GradientTape won't calculate gradients after restoring a model from ModelCheckpoint
I'm training a CNN in TensorFlow for binary classification and executing my code in Google Colab.
CNN_model = tf.keras.Sequential([
tf.keras.layers.Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3)),
tf....
3
votes
2
answers
50
views
How to get the KL divergence between two datasets (say ImageNet1k and FGVC-Aircraft)
I want to find out the KL divergence between 2 datasets, so for that I am extracting the features from ResNet50 for both of them, but if I calculate the same for ImageNet1k (say 20% of val set) and ...
1
vote
1
answer
99
views
Trained Huggingface EncoderDecoderModel.generate() produces only bos-tokens
I am working on a Huggingface transformers EncoderDecoderModel consisting of a frozen BERT-Encoder (answerdotai-ModernBERT-base) and a trainable GPT2-Decoder. Due to the different architectures for ...
0
votes
1
answer
92
views
How do I customize the data.yaml file to train the yolo11x.pt model with Ultralytics?
class_id x_center y_center width height behavior_id
eg.txt file data 6 0.260313 0.739167 0.131875 0.038333 1
6 0.580313 0.821250 0.290625 0.245834 0
6 0.821562 0.775416 0.230625 0.179167 0
6 0.914062 ...
0
votes
1
answer
50
views
How to find MSE when using a batch loader?
I'm working on a regression task using deep learning models. While calculating the MSE, I have divided by the length of the dataset. However, ChatGPT suggests dividing it by the length of the ...
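A minimal sketch of the arithmetic this question turns on, not the asker's code. It assumes plain Python lists stand in for the batch loader, and the helper names `mse_over_batches` and `mean_of_batch_means` are hypothetical. If you accumulate the summed squared error per batch, dividing by the number of samples gives the dataset MSE. Averaging per-batch means instead only matches when every batch has the same size.

```python
def mse_over_batches(batches):
    """Dataset-level MSE: sum squared errors over all batches, divide by sample count.

    `batches` is an iterable of (predictions, targets) pairs of equal-length lists.
    """
    total_sq_err = 0.0
    n_samples = 0
    for preds, targets in batches:
        total_sq_err += sum((p - t) ** 2 for p, t in zip(preds, targets))
        n_samples += len(targets)
    return total_sq_err / n_samples


def mean_of_batch_means(batches):
    """Average of per-batch MSEs: divides by the number of batches instead."""
    batch_means = [
        sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
        for preds, targets in batches
    ]
    return sum(batch_means) / len(batch_means)


# Two batches of unequal size (3 samples, then 1), as happens with a last partial batch.
batches = [([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]), ([1.0], [0.0])]

dataset_mse = mse_over_batches(batches)       # (1 + 4 + 9 + 1) / 4 = 3.75
batch_avg_mse = mean_of_batch_means(batches)  # differs, because the small batch is over-weighted
```

With equal batch sizes the two functions agree, so which divisor is correct depends on whether the running total holds summed errors or per-batch means.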
3
votes
1
answer
117
views
Neural Network built from scratch using numpy isn't learning
I'm building a neural network from scratch using only Python and numpy. It's meant for classifying the MNIST data set. I got everything to work, but the network isn't really learning: at epoch 0 it's ...
0
votes
1
answer
66
views
Softmax function always outputting garbage numbers that don't add up to one [closed]
I am creating a simple NN from scratch that can classify MNIST digits. It only has 1 hidden layer:
Loading the data:
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import ...
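The excerpt above is cut off before the softmax code, so the actual cause is unknown. As a hedged illustration only, one common cause of softmax outputs that don't sum to one is overflow in the exponential. The sketch below uses the standard max-subtraction trick in pure Python; the function name `softmax` and the sample logits are assumptions, not taken from the question.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats.

    Subtracting the maximum logit leaves the result unchanged mathematically
    but keeps every exponent <= 0, so math.exp cannot overflow.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Large logits like these would overflow a naive exp(x) / sum(exp(x)).
probs = softmax([1000.0, 1001.0, 1002.0])
```

In a batched numpy version the same idea applies per sample, so the max and the sum would need to be taken along the class axis rather than over the whole array.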
0
votes
1
answer
60
views
Hyperparameter tuning using Wandb or Keras Tuner - 10 fold Cross Validation
If I am using stratified 10-folds for classification/regression tasks, where do I need to define the logic for hyperparameter tuning using Scikit or Wandb?
Should it be inside the loop or outside?
I ...
0
votes
0
answers
101
views
RuntimeError: Trying to backward through the graph
I am using the custom Seq2SeqTrainingArguments and Seq2SeqTrainer from Huggingface and I am facing the below error. I am using WhisperSmall. How can I resolve this error?
RuntimeError: Trying to ...