Newest 'deep-learning' Questions

-3 votes

0 answers

31 views

LSTM+Attention stock prediction model stops improving after 1–2 epochs despite lower learning rate - why? [closed]

I'm a student building a deep learning model for stock return prediction. I'm currently facing an issue where the validation loss stops improving after only 1–2 epochs, causing the model to barely ...

이선호

1

asked yesterday

-1 votes

0 answers

30 views

Input fusion in contextual reinforcement learning [closed]

I’m currently exploring contextual reinforcement learning for a university project. I understand that in actor–critic methods like PPO and SAC, it might be possible to combine state and contextual ...

Manu Mano

1

asked Nov 13 at 14:48

-1 votes

0 answers

32 views

How to Improve Reconstruction Quality with VAE? [closed]

I am training a VAE architecture on microscopy images. The dataset has 1000 training images and 253 testing images. Images are resized to 128x128 or 256x256 input from the original resolution, which is ...

MT0820

19

asked Nov 12 at 19:42

-4 votes

0 answers

38 views

Advice on Identifying Extremely Similar Objects in Real Time [closed]

I’m working on a real-time system that identifies objects, but I’m facing a challenge: new objects can look extremely similar to known ones (sometimes differences are as small as 0.1 mm), and I need ...

nour achahlaou

1

asked Nov 7 at 22:52

1 vote

0 answers

65 views

Should I use torch.inference_mode() in a prediction method even when using model.eval()? [duplicate]

I'm following the book "Deep Learning with PyTorch Step By Step" and I have a question about the predict method in the StepByStep class (from this repository: GitHub). The current ...

Matteo

93

asked Nov 4 at 12:43

2 votes

1 answer

107 views

Will tf.keras.Sequential containing multiple custom layers be correctly fully serializable and deserializable in my case?

I am implementing a U-Net variant in TensorFlow/Keras with custom layers. In one of my custom layers, UPDoubleConv, I have a Sequential self.blocks containing a repeated pattern of UpSampling2D ...

Ahmed

105

asked Nov 3 at 12:00

2 votes

2 answers

87 views

Decoder-only AI model making repetitive responses

I am making a decoder-only transformer using PyTorch, and my dataset of choice is the fullEnglish dataset from Kaggle, Plaintext Wikipedia (full English). The problem is that my model output is ...

Kirito

13

asked Oct 29 at 14:32

2 votes

1 answer

32 views

AttributeError: 'NoneType' object has no attribute 'blocks' when running Cache-DiT example with Wan2.2 model

I’m trying to use Cache-DiT to accelerate inference for the Wan2.2 model. However, when I run the example script, python run_wan_2.2_i2v.py --steps 28 --cache, I get the following error: Namespace(...

傅靖茹

51

asked Oct 27 at 9:21

-1 votes

1 answer

36 views

Pretrained ESRGAN (.pb) gives reddish or purple image — is this a preprocessing issue or model issue?

I'm trying to use a pretrained ESRGAN model that I downloaded in .pb format. The model runs without errors, but the output image has a noticeable reddish/purple tint instead of the correct colors. ...

Ahmed Almakki

1

asked Oct 20 at 15:54

0 votes

0 answers

62 views

Utilizing the GPU with RNN models which take their output as input [torch]

I have a machine-translation model. In this model, I calculate a vector for a given sentence, aggregate this vector with each generated output of the RNN, and feed it into the RNN again for ...

cuneyttyler

1,395

asked Oct 15 at 14:20

1 vote

0 answers

20 views

Why does the same YOLOv8n-pose model with different weights have significantly different inference speeds?

I’m testing YOLOv8n-pose models that share the exact same architecture, input size, hardware (GPU), framework, batch size, and precision settings. The only difference between them is the trained ...

Hạnh Nhi Đỗ

11

asked Oct 15 at 10:15

1 vote

1 answer

123 views

Torch Conv2d results in both dimensions convolved

I have an input of shape (50, 1, 7617, 10) to a convolution. Here, the 7617 rows hold the word vectors, and the 10 columns are the words. I want to convolve column-wise and obtain (2631, 1, 7617, 1), 1 ...

cuneyttyler

1,395

asked Oct 12 at 5:34

0 votes

1 answer

73 views

Avoid overlap of bipartite network nodes in ggraph

I'm plotting a bipartite (two-mode) network using igraph and ggraph. But the nodes are overlapping a lot, even though there is still space in the graphic window. I would like to plot this using ggraph,...

mmmap

67

asked Oct 7 at 12:51

0 votes

0 answers

107 views

Kohya-SS SDXL LoRA Training Resets Steps Despite Successful State Loading

I am running SDXL LoRA training using Kohya's sd-scripts and accelerate. I have enabled --save_state and am trying to resume training, but the training steps always reset to 0, even though the log ...

Akash Chaudhari

21

asked Oct 5 at 14:01

0 votes

0 answers

81 views

Trouble configuring R-group substitution in REINVENT 4 (AstraZeneca) — validation errors for RLConfig and ScorerConfig

I’m using AstraZeneca’s REINVENT 4 (v4.6.27) to generate SMILES from a scaffold via R-group substitution, optimizing for 5-HT2A / D2 / 5-HT1A (maximize) and minimizing H1 / M1 / α1A, with DockStream ...

Reuben Udohaya

1

asked Sep 30 at 15:39

0 votes

1 answer

109 views

ValueError: Only instances of keras.Layer can be added to a Sequential model when using TensorFlow Hub KerasLayer

I’m trying to build a Keras Sequential model using a feature extractor from TensorFlow Hub, but I’m running into this error: ValueError: Only instances of `keras.Layer` can be added to a Sequential ...

user31600948

1

asked Sep 30 at 9:02

0 votes

1 answer

283 views

Getting “Sizes of tensors must match” error when using ComfyUI WanVideoWrapper (wan2.2) to generate video

I am trying to generate a video using Wan 2.2. My goal is to take a motion sequence from an input video and a single reference image, and then generate a new video where the character in the reference ...

hongxigoo

11

asked Sep 30 at 5:04

2 votes

1 answer

123 views

Keras Model throwing Error while integrating with frontend

I trained an EfficientNetB0 model on Colab for my final-year project. After training all the layers, I tested it and the results were excellent, but now I want to integrate the model into the frontend web ...

Narendra Patne

21

asked Sep 30 at 2:41

0 votes

1 answer

122 views

Preventing GPU memory leak due to a custom neural network layer

I am using the MixStyle methodology for domain adaptation, and it involves using a custom layer that is inserted after every encoder stage. However, it is causing VRAM to grow linearly, which causes ...

Vedant Dalimkar

3

asked Sep 28 at 15:00

3 votes

0 answers

77 views

Multimodal model for image captioning with CNN and LSTM over Flickr30k does not learn. How to fuse image features and word embeddings?

I'm working on an image captioning project using a simple CNN + LSTM architecture, as required by the course I'm studying. The full code is available here on GitHub (note: some parts are memory-...

Malihe Mahdavi sefat

473

asked Sep 27 at 15:34

-3 votes

1 answer

93 views

Can I visualize a neural network’s loss landscape to see if it’s stuck in a bad minimum? Any code example for this? [closed]

So, I’m trying to understand why sometimes neural networks get stuck during training. I heard people talk about ‘local minima’ and ‘saddle points,’ but I can’t really picture them. I want to actually ...

prithvisyam

1

asked Sep 25 at 5:36

0 votes

0 answers

47 views

!pip install mediapipe opencv-python error

I get the error below when I try to install mediapipe on Kaggle. The same command works fine on Google Colab but not on Kaggle. WARNING: Retrying (Retry(total=4, connect=None, read=...

Aakif

1

asked Sep 22 at 13:48

0 votes

0 answers

48 views

GraphMAE self-supervised learning: node attribute (sampled points) reconstruction works in minimal script but fails in full pipeline

I am experimenting with a GraphMAE self-supervised architecture using PyTorch + DGL. In my task, each graph node represents a CAD entity, and one node attribute stores sampled points (coordinates + ...

yxtq f

1

asked Sep 16 at 10:11

-1 votes

1 answer

118 views

How to download Open Images V7 images to a device? [closed]

I wanted images of a particular class ('Turban') from Open Images. However, these images are not in the Boxable category, so my OIDv6 code below is failing: oidv6 downloader en --dataset ./...

Jivhesh Choudhari

1

asked Sep 16 at 9:42

1 vote

3 answers

78 views

Why isn't my Keras model throwing an error when different sizes are passed to the dense layer?

I am working on a dynamic time series multi-class segmentation problem with Keras (TensorFlow version 2.12.0), and I wanted to see what would happen when I dropped a dense layer into the network ...

jjschuh

403

asked Sep 9 at 18:53

0 votes

0 answers

44 views

PyTorch XPU training loop memory leak despite explicit cleanup (gc.collect, torch.xpu.empty_cache)

I’m training RVC, which uses a HiFi-GAN-style speech model, on Intel XPU (PyTorch 2.3, oneAPI backend). During training, my GPU/XPU memory usage keeps growing with each batch until OOM, even though I ...

i suck at programming

11

asked Aug 31 at 17:17

1 vote

1 answer

164 views

Getting different results across different machines while training RL

While training my RL algorithm using SBX, I am getting different results on my HPC cluster and my PC. However, I did find that results are consistently the same within the same machine. They just diverge ...

desert_ranger

1,861

asked Aug 28 at 20:42

0 votes

0 answers

24 views

Grad-CAM with ResNet-50 Error: The name "input_layer_1" is used 2 times in the model. All operation names should be unique

I am new to implementing Grad-CAM and am having trouble with it. I have created a model: img_shape = (256, 256, 3) base_model = ResNet50(include_top=False, ...

Arcturus

1

asked Aug 28 at 6:20

0 votes

0 answers

75 views

KFold cross-validation in Keras: model not resetting between folds (MobileNet backbone)

I am trying to perform KFold cross-validation on a Keras model. The first fold runs exactly as expected, but from the second fold onwards the model doesn’t seem to reset. The training behaves ...

pd_prince

21

asked Aug 26 at 19:27

0 votes

0 answers

34 views

Why does MATLAB selfAttentionLayer give different parameter counts for head/key-channel pairs with the same total key dimension?

I’m experimenting with the MathWorks example that inserts a multi-head self-attention layer into a simple CNN for the DigitDataset: Link to example layers = [ imageInputLayer([28 28 1]) ...

Hend mahmoud

1

asked Aug 26 at 12:51

0 votes

0 answers

51 views

Layer "dense" expects 1 input(s), but it received 2 input tensors

Environment: Raspberry Pi 4, 8 GB RAM. Relevant dependencies: keras 3.11.3, numpy 1.26.4, opencv-python 4.12.0.88, pillow 11.3.0, tensorflow ...

Priyanshu Jha

71

asked Aug 25 at 11:09

0 votes

1 answer

47 views

RAM memory leak when scripting a sampling trainer for a BERT encoder and LSTM decoder in TensorFlow on GPU

I wrote the module attached below. However, I notice a constant increase of RAM until I get an out of memory error. The code runs on CPU without a problem (except the slow training time). It can ...

mashtock

400

asked Aug 23 at 5:40

6 votes

1 answer

660 views

Calculating the partial derivative of PyTorch model output with respect to neurons pre-activation

I am working on neuron importance for ANNs (in a classification setting). A simple baseline is the partial derivative of the model output for the correct class with respect to the given neuron pre-...

jonupp

63

asked Aug 21 at 6:21

-1 votes

0 answers

114 views

Huge training loss when fine-tuning DeBERTa-v3-small with HuggingFace Trainer + LoRA

I am trying to fine-tune microsoft/deberta-v3-small with HuggingFace Trainer + PEFT LoRA adapters for a binary classification task (truth vs lie transcripts). My dataset is from the MU3D database. It ...

myts999

61

asked Aug 17 at 23:49

1 vote

1 answer

52 views

Replacing WideResNet50 with EfficientNetV2-M in GLASS defect detection model causes Module layer2 not found in the model [closed]

I’m using the GLASS defect detection model and want to replace its default wideresnet50 backbone with efficientnetv2_m in shell/run-custom.sh. However, when I run bash run-custom.sh I get the ...

aniaf

19

asked Aug 15 at 13:09

2 votes

1 answer

92 views

How do I format a TensorFlow dataset for a multi-output model?

I have an image dataset where each image has multiple categorical features that I want to predict. I am getting this error when trying to train: ValueError: y_true and y_pred have different structures....

Fish4203

33

asked Aug 15 at 8:25

0 votes

1 answer

107 views

RuntimeError in torch.cat during VACE-Wan2.1 inference: mask and video tensor shape mismatch

I'm using the Wan2.1-VACE video generation model, and during inference I encountered a RuntimeError related to mismatched tensor shapes in a torch.cat operation inside the vace_latent() function. From ...

范姜伯軒

59

asked Aug 4 at 14:36

1 vote

2 answers

76 views

MNIST multi digit prediction [closed]

I trained a model on the MNIST dataset. It works fine for single-digit prediction, and I got 96.92% accuracy on the test data too, but I have tried everything and it's still not working for multi-digit prediction. Even ...

Dropper

19

asked Aug 1 at 5:01

2 votes

1 answer

103 views

Saving embeddings from encoder efficiently with fast random access

I have embeddings (about 160 million) that I created with a BERT-based encoder model. Right now they are in .pt format and take about 500 GB on disk. I want two things: To save them in an ...

Noam

55

asked Jul 30 at 9:31

1 vote

1 answer

134 views

JAX lax.scan: How to iterate over layers and memory slices simultaneously without dynamic indexing in a multi-layer RNN structure?

I'm trying to implement a framework that manages RNN networks with an arbitrary number of layers (it's all part of a library I'm building based on JAX/Equinox). The problem is that I can't find an ...

lifera

11

asked Jul 29 at 5:17

1 vote

1 answer

83 views

DQN fails to learn good policy for Atari Pong

I'm trying to implement the findings from this DeepMind DQN paper (2015) from scratch in PyTorch using the Atari Pong environment. I've tested my Deep Q-Network on a simple test environment, where ...

Rohan Patel

21

asked Jul 22 at 4:52

0 votes

0 answers

87 views

GradientTape won't calculate gradients after restoring a model from ModelCheckpoint

I'm training a CNN in TensorFlow for binary classification and executing my code in Google Colab. CNN_model = tf.keras.Sequential([ tf.keras.layers.Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3)), tf....

Nicola

11

asked Jul 11 at 10:00

3 votes

2 answers

50 views

How to get the KL divergence between two datasets (say ImageNet1k and FGVC-Aircraft)

I want to find out the KL divergence between 2 datasets, so for that I am extracting the features from ResNet50 for both of them, but if I calculate the same for ImageNet1k (say 20% of val set) and ...

Shashank Priyadarshi

41

asked Jul 9 at 14:01

1 vote

1 answer

99 views

Trained Huggingface EncoderDecoderModel.generate() produces only bos-tokens

I am working on a Huggingface transformers EncoderDecoderModel consisting of a frozen BERT-Encoder (answerdotai-ModernBERT-base) and a trainable GPT2-Decoder. Due to the different architectures for ...

soosmann

119

asked Jul 9 at 10:47

0 votes

1 answer

92 views

How do I customize the data.yaml file to train the yolo11x.pt model with Ultralytics?

class_id x_center y_center width height behavior_id
e.g. .txt file data:
6 0.260313 0.739167 0.131875 0.038333 1
6 0.580313 0.821250 0.290625 0.245834 0
6 0.821562 0.775416 0.230625 0.179167 0
6 0.914062 ...

himanshu

1

asked Jul 9 at 7:08

0 votes

1 answer

50 views

How to find MSE when using a batch loader?

I'm working on a regression task using deep learning models. While calculating the MSE, I have divided by the length of the dataset. However, ChatGPT suggests that I divide it by the length of the ...

mansi

13

asked Jul 6 at 13:56
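
For context on the MSE question above, here is a minimal sketch in plain Python of the two averaging conventions that tend to get confused when a loss is accumulated batch by batch. It assumes the alternative being suggested is dividing by the number of batches, which the truncated excerpt does not confirm. The function names and the toy data are hypothetical and not taken from the question.

```python
def dataset_mse(batches):
    """MSE over every sample: sum all squared errors, divide by the sample count."""
    total_squared_error = 0.0
    total_samples = 0
    for predictions, targets in batches:
        total_squared_error += sum((p - t) ** 2 for p, t in zip(predictions, targets))
        total_samples += len(targets)
    return total_squared_error / total_samples


def mean_of_batch_means(batches):
    """Average of the per-batch MSE values: divide by the number of batches."""
    batch_means = [
        sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
        for predictions, targets in batches
    ]
    return sum(batch_means) / len(batch_means)


# Hypothetical data: two (predictions, targets) batches of unequal size,
# as happens when the last batch of a loader is smaller than the rest.
batches = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),  # squared errors: 0, 0, 4
    ([4.0], [0.0]),                      # squared error: 16
]

print(dataset_mse(batches))          # per-sample average
print(mean_of_batch_means(batches))  # per-batch average
```

The two values agree only when every batch has the same size. With an uneven final batch, summing squared errors and dividing by the dataset length gives the per-sample MSE, while averaging per-batch means weights the small batch more heavily.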

3 votes

1 answer

117 views

Neural network built from scratch using NumPy isn't learning

I'm building a neural network from scratch using only Python and NumPy. It's meant for classifying the MNIST data set. I got everything to work, but the network isn't really learning: at epoch 0 it's ...

buzzbuzz20xx

109

asked Jul 2 at 7:37

0 votes

1 answer

66 views

Softmax function always outputting garbage numbers that don't add up to one [closed]

I am creating a simple NN from scratch that can classify MNIST digits. It has only one hidden layer. Loading the data: import numpy as np import matplotlib.pyplot as plt from keras.datasets import ...

buzzbuzz20xx

109

asked Jul 1 at 9:03

0 votes

1 answer

60 views

Hyperparameter tuning using Wandb or Keras Tuner - 10 fold Cross Validation

If I am using stratified 10-folds for classification/regression tasks, where do I need to define the logic for hyperparameter tuning using Scikit or Wandb? Should it be inside the loop or outside? I ...

Ayesha Kiran

17

asked Jun 30 at 3:53

0 votes

0 answers

101 views

RuntimeError: Trying to backward through the graph

I am using custom Seq2SeqTrainingArguments and Seq2SeqTrainer from Hugging Face, and I am facing the error below. I am using WhisperSmall. How can I resolve this error? RuntimeError: Trying to ...

Turing101

377

asked Jun 27 at 17:21
