55,676 questions
-3
votes
0
answers
20
views
Inaccurate Embedding Scores for Word Similarity Comparisons [closed]
I'm trying to find models that perform well at modelling word similarities. Naturally, I navigated to embedding models.
With gemini-embedding-001, I'm seeing wildly inaccurate scores. I'd like to know ...
-1
votes
0
answers
30
views
Input fusion in contextual reinforcement learning [closed]
I’m currently exploring contextual reinforcement learning for a university project.
I understand that in actor–critic methods like PPO and SAC, it might be possible to combine state and contextual ...
-1
votes
0
answers
32
views
How to Improve Reconstruction Quality with VAE? [closed]
I am training a VAE architecture on microscopy images. Dataset of 1000 training images, 253 testing images. Images are resized to 128x128 input or 256x256 input from original resolution which is ...
Tooling
0
votes
1
replies
56
views
Is it possible to train and run a small AI model on a Raspberry Pi 5 to solve text-based CAPTCHAs?
I’m trying to understand whether it’s actually feasible to train and run a small AI model on a Raspberry Pi 5 (16 GB RAM) that can solve simple text-based CAPTCHAs, the kind that contain a few letters ...
-1
votes
0
answers
18
views
SB3 PPO agent isn't aware of environment
I'm trying to build a PPO agent with Stable Baselines3 and a custom CNN that will play Mario, but my agent jumps right into a hole even after training for a couple of hours. It acts like it doesn't see ...
-2
votes
0
answers
49
views
Cannot determine the difference between Testing and Training R^2 Values [closed]
Running into an issue where I have a seriously off R^2 value for my testing data (-3.6) when my training data looks decent (0.8ish). The data could have overfitting, but the number is so off, it must ...
-3
votes
0
answers
41
views
Improving TF Binding Prediction Model - stuck around ROC-AUC 0.76 [closed]
I'm working on a machine learning model to predict the binding of the transcription factor REST to candidate DNA binding sites for my final project in a bio-informatics course. I've built a feature-...
0
votes
0
answers
26
views
Where is EXECUTORCH_LIBRARY defined in ExecuTorch v1.0?
I’m trying to register a custom operator for ExecuTorch (v1.0, built from the PyTorch 2.5 source tree).
My goal is to create a shared library that defines a few quantum operators and runs them from a ....
-2
votes
0
answers
40
views
How to monitor data quality drift over time in machine learning pipelines? [closed]
I’m building a machine learning pipeline that processes incoming data daily.
Over time, I’ve noticed the model performance drops even though the code and training logic haven’t changed.
I suspect data ...
1
vote
0
answers
60
views
How to download the output folder on Kaggle?
I wrote a notebook on Kaggle and imported a dataset.
The main content of the notebook is as follows:
%%bash
pip install xxx # Install dependencies
if [ ! -d "/kaggle/working/latex-ocr-...
-1
votes
1
answer
42
views
Getting unrecognized arguments: --federated-token in creating pipeline in microsoft/MLOpsPython
I am using this repo to create an MLOps pipeline in Azure DevOps.
When I tried to run the CI pipeline, I got the "unrecognized arguments: --federated-token" error. I asked ChatGPT, and it said to update the CLI version. I ...
1
vote
2
answers
147
views
After encoding my categorical columns in a pandas dataframe, I was left with too many columns. How can I drop some?
I am using Python with a pandas DataFrame loaded from a CSV of Steam games. I have the categorical columns publishers, developers, categories, genres, and tags, but categories, genres, and tags are ...
-1
votes
1
answer
63
views
Getting an error while installing reasoning-gym library
I am getting this error while downloading the Reasoning Gym library. How to resolve it?
Building wheels for collected packages: pycosat
Building wheel for pycosat (pyproject.toml) ... error
error: ...
Advice
1
vote
2
replies
120
views
Machine Learning Project using Multidimensional Array Input/Outputs
I am struggling to get my ML model to accept the input and outputs that I need.
My aim is to have it accept this as the input:
input_x = [
((4.11, 8.58, -2.2), (-1.27, -8.76, 2.23)),
((0.43, -...
0
votes
1
answer
25
views
How can I get torch.set_grad_enabled(True) to work in ComfyUI?
I just spent hours figuring out that the following code fails when included in a ComfyUI custom node, but works perfectly fine outside (using the same Python venv). I finally found out that someone ...
Tooling
0
votes
0
replies
88
views
Machine learning to manage Excel files
I would like to train a model to understand if an Excel file has the expected structure. I can put a list of right files in a folder and a list of wrong ones in another. Any help and suggestion are ...
1
vote
1
answer
51
views
How to handle unstable best_iteration in LightGBM when using Optuna for hyperparameter optimization?
I'm using Optuna to optimize LightGBM hyperparameters, and I'm running into an issue with the variability of best_iteration across different random seeds.
Current Setup
I train multiple models with ...
Advice
2
votes
0
replies
75
views
How should I balance DSA, ML fundamentals, PyTorch implementation, and Kaggle practice for ML Engineer interviews?
I’m a Computer Science graduate preparing for ML/AI Engineer roles.
I’m facing a dilemma about what to focus on, how much to allocate time to each area, and what exact roadmap to follow to prepare ...
0
votes
0
answers
23
views
Unable to install tensorflow in mac [duplicate]
ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow
0
votes
0
answers
40
views
How to force drain3 to keep log event defining parameter as a static?
I am trying to parse log lines for log anomaly detection, but two log lines are too similar for the parser to keep them apart:
[Something] VM Started
[Something] VM Paused
it parses it to VM <*>...
0
votes
0
answers
70
views
Torchvision save segmentation masks to png
There is a tutorial I am trying to follow, https://docs.pytorch.org/tutorials/intermediate/torchvision_tutorial.html,
working with .png files as segmentation masks.
The png files can be found here:
https://...
3
votes
1
answer
103
views
add a condition on time in scipy odeint
I am having some trouble finding a simple solution to what seems to be a simple problem.
I want to compute the integral of a simple function, and I want to modify this function if it reaches a ...
2
votes
1
answer
127
views
ERROR: No matching distribution found for tensorflow==2.12
I use macOS. I have to use LibRecommender in my code.
Python Version: 3.8.13
According to this link, 2.10 is a suitable TensorFlow version.
This is what's in my requirements.txt file I install ...
0
votes
2
answers
105
views
Why is there a duplicate index when using sort_index() in pandas?
I am doing target mean mapping based on an external statistical table, where org_ is the external data and merged_data is the set of training data and test data. After processing, the features of ...
0
votes
1
answer
77
views
Cannot calculate confusion matrix utilizing supervision from roboflow for Yolov8 model
I am trying to calculate the confusion matrix for my yolov8 (or yolov11) model utilizing supervision from roboflow. I found some instructions but they do not seem to be crystal clear. For example ...
3
votes
1
answer
69
views
Matching PyTorch and ONNX outputs layer-wise for debugging inference drift
I want to debug layer-by-layer to see where the ONNX model starts deviating from the PyTorch model outputs.
I can extract intermediate outputs in PyTorch using forward hooks, like:
def get_activation(...
0
votes
0
answers
245
views
Installation error while installing GroundingDino
I am trying to install the GroundingDino as instructed in the README file of their official GitHub repo, but I am facing the error below:
Obtaining file:///home/kgupta/workspace/Synthetic_Data_gen/...
0
votes
1
answer
118
views
Why does an LSTM PyTorch model yield constant values?
I am training an LSTM model with data from yfinance. The process is really standard. I get the data with yf.download(ticker=ticker) where ticker='AAPL' and do df.rolling(30, min_periods=1) to smooth the ...
0
votes
0
answers
51
views
Selective Inference on Ordinal Clustering
I've been using an ordered stereotype model (OSM) approach to ordinal clustering with the R library 'clustord'.
clustord is very well documented, with a step-by-step vignette. Therefore, to execute row ...
-3
votes
1
answer
93
views
Can I visualize a neural network’s loss landscape to see if it’s stuck in a bad minimum? Any code example for this? [closed]
So, I’m trying to understand why sometimes neural networks get stuck during training. I heard people talk about ‘local minima’ and ‘saddle points,’ but I can’t really picture them. I want to actually ...
0
votes
0
answers
58
views
Batch processing with Ultralytics YOLO does not seem to work for coreml, but is working fine for .pt
I am trying to do batch inference with YOLO11. I am working on a MacBook and am running into this issue:
from ultralytics import YOLO
import numpy as np
# Load YOLO model
model = YOLO("yolo11s....
2
votes
1
answer
89
views
How to integrate a lightweight image-to-text model into a React Native app?
I am trying to integrate an image-to-text model into a React Native mobile app.
My requirements:
The model should support image + text input → text output.
It should be lightweight enough to run on ...
0
votes
0
answers
79
views
How to load a model while ignoring unbuilt head layers? (`expected 2 variables, received 0`)
I’m loading a custom ViT backbone saved via MLflow’s TensorFlow flavor (Keras 3). The artifact includes the backbone parts I want (patch_embed, encoder); a couple of layers in the encoder were saved in a ...
0
votes
0
answers
61
views
I am having latency in box rendering in Yolo object detection
I'm developing a Flutter app for real-time pharmaceutical box detection using TensorFlow Lite. The detection works well, but I'm experiencing significant latency in bounding box rendering when moving ...
3
votes
1
answer
119
views
TensorRT PWC-Net Causing 2.4km Trajectory Error in iSLAM - Original PyTorch Works Fine
Problem Statement
My iSLAM system works correctly with the original PyTorch PWC-Net but produces catastrophic trajectory errors (2.4km ATE RMSE) when I replace it with a TensorRT-converted version. ...
1
vote
0
answers
66
views
How do I convert TensorFlow SavedModel into TensorFlow.js format?
I’m trying to convert my TensorFlow SavedModel into a TensorFlow.js format using tensorflowjs_converter.
tensorflowjs_converter --input_format=tf_saved_model --output_format=tfjs_graph_model --...
0
votes
0
answers
148
views
TensorRT DLA Engine Build Fails for PWC-Net on Jetson NX - Missing Layer Support?
I'm converting a PWC-Net optical flow model to run on Jetson NX DLA using the iSLAM framework, but the TensorRT engine build fails during DLA optimization.
Environment
Hardware: NVIDIA Jetson NX
...
1
vote
3
answers
78
views
Why isn't my Keras model throwing an error when different sizes are passed to the dense layer?
I am working on a dynamic time series multi-class segmentation problem with keras (tensorflow version 2.12.0), and I wanted to see what would happen when I dropped in a dense layer into the network ...
0
votes
0
answers
58
views
CUDA error 700: an illegal memory access was encountered
I encounter this error:
Application terminated with error: ??+0 (0x709D9F003D8A)
??+0 (0x709D9E884BA4)
??+0 (0x709D9E9F388C)
??+0 (0x709D9FF2FCF5)
??+0 (0x709D9FF31448)
??+0 (0x709D9EB84E21)
??+0 (...
0
votes
3
answers
200
views
n_jobs>=2 breaks reproducibility
I am facing a problem in maintaining the reproducibility in the ML project. I believe the core snippet of my issue is
clf = Clf(random_state=cfg.seed)
# instantiate the K-fold cross-validation ...
0
votes
0
answers
64
views
Why is my plot of the cost function like this and not like a bowl?
My code:
def calc_cost_function(w, b, data):
    m = len(data)
    cost = 0
    for i in range(m):
        x = data.iloc[i].X
        y = data.iloc[i].Y
        cost += ((x * w + b) - y)**2
    return ...
0
votes
0
answers
44
views
PyTorch XPU training loop memory leak despite explicit cleanup (gc.collect, torch.xpu.empty_cache)
I’m training an RVC model, which uses a HiFi-GAN-style speech model, on Intel XPU (PyTorch 2.3, oneAPI backend). During training, my GPU/XPU memory usage keeps growing with each batch until OOM, even though I ...
0
votes
0
answers
76
views
Flask ML App Stuck on "Loading" Status Despite Successful Model Training
I'm deploying a Flask ML application with book recommendations to Render, but I'm experiencing a persistent issue where my health endpoint always returns "model_loaded": false, "status...
0
votes
1
answer
103
views
How to load dataset from huggingface to google colab?
I am trying to load a training dataset in my Google Colab notebook but keep getting an error.
Here is the code snippet which returns the error:
from datasets import load_dataset
ds = load_dataset(...
0
votes
1
answer
88
views
How do I create a Pytorch Dataset from multiple files where each file has multiple batches
How do I create a dataset that reads in data from multiple files, where each file has lots of rows or batches?
For example, I have a partitioned parquet dataset (created with pandas.to_parquet), ...
6
votes
1
answer
660
views
Calculating the partial derivative of PyTorch model output with respect to neurons pre-activation
I am working on neuron importance for ANNs (in a classification setting). A simple baseline is the partial derivative of the model output for the correct class with respect to the given neuron pre-...
3
votes
1
answer
84
views
Freeze, then unfreeze gradients of a trained parameter in PyTorch does not work
Let's say I have a parameter which is a p-shaped vector and I wish to train it in PyTorch such that: for some iterations, only the first k <= p elements of this vector were trained whereas the rest ...
1
vote
0
answers
106
views
Gradient Descent blowing up in linear regression
I am writing linear regression code in Python. I used the formulas I learnt and double-checked them, and also tried normalising the dataset. What happened then is that the values of the weight and bias changed ...
-1
votes
0
answers
114
views
Huge training loss when fine-tuning DeBERTa-v3-small with HuggingFace Trainer + LoRA
I am trying to fine-tune microsoft/deberta-v3-small with HuggingFace Trainer + PEFT LoRA adapters for a binary classification task (truth vs lie transcripts).
My dataset is from the MU3D database. It ...
1
vote
1
answer
52
views
Replacing WideResNet50 with EfficientNetV2-M in GLASS defect detection model causes Module layer2 not found in the model [closed]
I’m using the GLASS defect detection model and want to replace its default wideresnet50 backbone with efficientnetv2_m in shell/run-custom.sh. However, when I run
bash run-custom.sh
I get the ...