0 votes
0 answers
216 views

How to handle "Could not initialize NNPACK! Reason: Unsupported hardware" warning in PyTorch / Silero VAD on cloud CPU?

I’m running Silero VAD (via PyTorch + torchaudio) on a Linode cloud instance (2 dedicated CPUs, 4 GB RAM). When I process 10-minute audio chunks, I always get repeated warnings like this and it doesn'...
Uktamjon's user avatar
1 vote
1 answer
81 views

Handling audio streaming over WebSocket in FastAPI for live transcription

I am trying to achieve live transcription using openai whisper model in my app but having some issues with processing the audio to get the waveform. @router.websocket("/stt/predict/live") ...
Imisioluwa's user avatar
1 vote
0 answers
67 views

How to improve voice quality of custom tts

I know I could use custom trained tacotron model and better vocoder, but are there other ways to make the voice more clear and better quality? Here’s the code I’m currently working with: import torch ...
Jani Kuru's user avatar
3 votes
1 answer
123 views

Lowpass filter is slower on GPU than CPU in PyTorch

I have been trying out some of the Torchaudio functionalities and I can't seem to figure out why lowpass_biquad is running slower on the GPU than on the CPU. And this is true for other effects like, ...
orglce's user avatar
  • 543
0 votes
0 answers
743 views

Conflicting dependencies while installing torch==1.10.0, torchaudio==0.10.0, and torchvision==0.11.0 in my Python environment

I'm having trouble installing the following dependencies in my Python environment: torch==1.10.0+cpu torchaudio==0.10.0 torchvision==0.11.0 pyannote-audio==0.0.1 lightning==2.3.3 numpy scipy pandas ...
oran ben david's user avatar
0 votes
1 answer
787 views

How to download ffmpeg utilities into Python venv with pip or manual way for torchaudio

torchaudio requires avutil and other binary DLL files. Source: https://pytorch.org/audio/2.3.0/installation.html However, they give an example only for Anaconda. I am not using Anaconda but I am using ...
Furkan Gözükara's user avatar
0 votes
0 answers
93 views

How to avoid a nan loss (from the first iteration) and gradients being None?

I am trying to predict/ fit filter coefficients using an MLP, my target function is: However, the system is stuck in the same loss (nan) and there is no learning or update happening. When I remove ...
SuperKogito's user avatar
  • 2,966
1 vote
0 answers
78 views

Error when using Torchaudio library to create a data set

I am following a YT course to work on the urban 8k data set which uses Torchaudio. The author wrote the exact same code but was able to get an output while I get this error: RuntimeError: Couldn't ...
Hussain Bhavnagarwala's user avatar
2 votes
0 answers
379 views

How to actually use torch._constrain_as_size with real models (for onnx conversion purposes)?

I have a model I would like to convert to onnx. The model is based on torchaudio.models.Conformer: class ConformerSpeechRecognizer(torch.nn.Module): def __init__(self, kernel_size,...
Arsenii Fomin's user avatar
0 votes
1 answer
1k views

OSError: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory

I needed to have the Python torchaudio library installed for my application, which is packaged into a Docker image. I am able to do this easily on my EC2 instance: pip3 install torchaudio python3 ...
Fisher Coder's user avatar
  • 3,648
3 votes
1 answer
2k views

The torchaudio backend is empty

I am trying to read an m4a audio file using torchaudio.load() but I got the following error: torchaudio.load("1.m4a") RuntimeError Traceback (most recent call last) ...
Sanjith Kumar's user avatar
0 votes
1 answer
1k views

Usage of torchaudio.transforms.MelSpectrogram for tensor residing on GPU

I want to calculate a MelSpectrogram using torchaudio on a GPU. For testing, I wrote the following code: from typing import Optional import torch import torchaudio import numpy as np from tests....
arc_lupus's user avatar
  • 4,156
12 votes
4 answers
17k views

How to solve RuntimeError: Couldn't find appropriate backend to handle uri in python

I want to work with audio files in PyTorch. If I try running this line: metadata = torchaudio.info(SAMPLE_WAV_PATH) I get the error message RuntimeError: Couldn't find appropriate backend to handle uri ...
Tobias's user avatar
  • 163
0 votes
1 answer
486 views

Installing Torchaudio for PyTorch 1.10.0 with CUDA 11.0

On my Ubuntu 18.04 machine I have a virtual environment that contains pytorch=1.10.0=cuda110py38hf84197b_0. My CUDA version is 11.0, which I've checked by running nvidia-smi. I would like to install ...
Brian Provost's user avatar
2 votes
0 answers
464 views

Python can't find libtorchaudio.pyd, despite the file being present in folder

I'm trying to use pyannote.audio to transcribe an audio file, however when I try to run the test program that they provided on their site, the program responds with the error: "FileNotFoundError: ...
Pedro Fukuda's user avatar
0 votes
1 answer
321 views

Torchaudio compatibility issue with WAV files on Ubuntu WSL2

This repo https://github.com/facebookresearch/brainmagick works fine on vanilla Ubuntu. On the exact same WSL2 configuration, the following error arises. This is using torchaudio 2.2.1. The issue is ...
user1097111's user avatar
0 votes
0 answers
42 views

convert a group of images in 'n' folders to dataset (eg: Mnist), to work with CNN

I am trying to convert generated images into a dataset. (All I have is just PNG images in n folders and there is no label or metadata.) This is what I aspire to do: I am using torchaudio to convert ...
Rookie91's user avatar
  • 267
2 votes
2 answers
3k views

torchaudio can't find FFmpeg

Windows, vscode, Python 3.11.4-64bit import torch import torchaudio print(torch.__version__) print(torchaudio.__version__) print(torchaudio._extension._FFMPEG_INITIALIZED) 2.0.1+cu117 2.0.2+cu117 ...
KJ H's user avatar
  • 23
0 votes
1 answer
560 views

Different results of Griffin-Lim from librosa and torchaudio

I'm trying to transform the spectrogram back to the audio. First I used librosa.griffinlim and it worked well, but it was time-consuming. Therefore I am trying to use torchaudio on GPU to boost the ...
Mingxin Zhang's user avatar
0 votes
1 answer
1k views

How to resample from 8K to 16K with librosa or torchaudio as ffmpeg do it?

In my app, I'm getting an array of audio samples (with sample rate = 8000) which was loaded with torchaudio.load. I need to use this audio array and run Whisper (STT). I want to avoid loading the ...
user3668129's user avatar
  • 4,880
-2 votes
1 answer
193 views

TypeError: cannot unpack non-iterable AudioMetaData object

https://github.com/facebookresearch/svoice/issues/94 Using dependencies: $ pip list Package Version antlr4-python3-runtime 4.8 audioread 3.0.1 certifi 2023.7.22 cffi 1.16.0 charset-normalizer 3.3.0 ...
Guneshwar Singh's user avatar
-2 votes
1 answer
61 views

Why is this program using torch studio like this

import torchaudio # get length of file in samples info = {} # create a list si, _ = torchaudio.info(str(path)) # returns file information (Get signal information of an audio file.) info['samplerate'] = si.rate ...
eureka's user avatar
  • 3
6 votes
2 answers
17k views

OSError: libtorch_cuda.so: cannot open shared object file: No such file or directory

I have been stuck with this problem for a while, and I would be very grateful if someone could help me resolve it. The system I am using is Ubuntu with CUDA 12.0. As ...
Ivan Wang's user avatar
0 votes
1 answer
873 views

Real time speech recognition with CTC decoder

I am trying to implement real-time ASR with a CTC decoder. I refer to the following torchaudio example on how to use the CTC decoder. I use pyaudio to listen to the microphone, the output of which is byte ...
rumnen's user avatar
  • 11
1 vote
1 answer
795 views

FFmpeg installation not detected with diart

Here I'm using the diart library for audio transcription and the OpenAI Whisper model. When I run my code I get this error, though: Traceback (most recent call last): File "/home/vkyc/Desktop/...
Schrödinger's Cat's user avatar
1 vote
0 answers
382 views

MP3 resampling with torchaudio and ffmpeg

I'm using torchaudio (version 2.0.2) to resample audio files. I'm trying to match the same results as ffmpeg (version 6.0). Specifically, the commands I use are: waveform, sr = torchaudio.load(...
hsiaomichiu's user avatar
0 votes
0 answers
210 views

Loading commonvoice with torchaudio not working

If I try to load CommonVoice with torchaudio, it returns tensors of different sizes. When I try loading CommonVoice using train_dataset = COMMONVOICE(root='/home/mr/Downloads/cv-corpus-7.0-2021-07-21/de/',...
BR BR's user avatar
  • 1
1 vote
3 answers
4k views

Diart (torchaudio) on Windows x64 results in torchaudio error "ImportError: FFmpeg libraries are not found. Please install FFmpeg."

I am trying out a speech diarization project named diart (based on Hugging Face models). I follow the instructions using a miniconda environment, which are essentially: conda create -n diart python=...
LoneWanderer's user avatar
  • 3,347
1 vote
1 answer
1k views

Why am I unable to load an audio file with torchaudio whenever I use a GPU on kaggle?

I am trying to fine-tune wav2vec2 model for audio recognition task using a small custom dataset on kaggle that is made up of m4a audio files. When I ran my code earlier today without an accelerator (...
Xanta_Kross's user avatar
1 vote
1 answer
6k views

Torchaudio.save() .wav file is twice bigger than the original .wav file

I'm really new to PyTorch and torchaudio. I found that the file it saves is twice as big as the original file, even though I just load a .wav file and immediately save the audio to another .wav file. Why it ...
KilinWei's user avatar
0 votes
1 answer
502 views

pytorch torchaudio feature extraction

I have been following the tutorial for feature extraction using pytorch audio here: https://pytorch.org/audio/0.10.0/pipelines.html#wav2vec-2-0-hubert-representation-learning It says the result is a ...
JohnJ's user avatar
  • 7,116
1 vote
0 answers
513 views

torchaudio.io.StreamReader doesn't throw error when seeking to time stamp more than the duration of audio file

I am trying to get the chunk of an audio file between a specific start time and end time. Consider an audio file with a duration of 10 seconds. Now I need to get the chunk from 4 sec to 7 sec. torchaudio.info doesn't ...
lokesh's user avatar
  • 11
1 vote
2 answers
6k views

Convert byte data to Pytorch tensor

I created a simple model with Pytorch to recognize bird sounds and until now I feed it .wav recordings. I want to start doing real time recognition and my question is: can I convert bytes to Pytorch ...
asabasdc's user avatar
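
As a rough illustration of the bytes-to-samples step this question asks about, here is a minimal sketch using only the Python standard library. It assumes the incoming bytes are raw 16-bit signed little-endian PCM, which the excerpt does not state; the function name `pcm16_bytes_to_floats` is hypothetical. The resulting list of floats could then be wrapped in a tensor.

```python
import struct
import sys
from array import array


def pcm16_bytes_to_floats(raw):
    """Decode raw PCM bytes into floats in [-1.0, 1.0).

    Assumption: samples are 16-bit signed integers, little-endian.
    """
    samples = array("h")      # "h" = signed 16-bit integer
    samples.frombytes(raw)    # raises ValueError if len(raw) is not a multiple of 2
    if sys.byteorder == "big":
        samples.byteswap()    # input is little-endian, so swap on big-endian hosts
    return [s / 32768.0 for s in samples]


# Usage: three hand-packed samples standing in for a chunk from a microphone stream.
raw_chunk = struct.pack("<3h", 0, 16384, -32768)
decoded = pcm16_bytes_to_floats(raw_chunk)
```

In PyTorch itself, `torch.frombuffer` with `dtype=torch.int16` is likely a more direct route for the same layout; the pure-Python version above is only meant to make the decoding and scaling explicit.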
0 votes
2 answers
3k views

To support decoding 'mp3' audio files, please install 'sox'

I'm trying to work on an ASR model using transfer learning on the wav2vec 2 model. Whenever I want to show or modify an audio file, I get this problem: def prepare_dataset(batch): audio = ...
FOXASDF's user avatar
  • 97
0 votes
1 answer
130 views

ValueError Getting Emission from Wav2Vec2 PyTorch Pipeline Model

When calling model = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H.get_model() emission = model(data) to get the emission probabilities from the model, I get File "XXX\lib\site-...
Victor Zheng's user avatar
0 votes
1 answer
908 views

speechbrain & CUDA out of memory

I am trying to enhance an audio file (3:16 minutes in length, available here) using Speechbrain. If I run the code below (from this tutorial), I get the error OutOfMemoryError: CUDA out of memory. ...
albusdemens's user avatar
  • 6,724
0 votes
1 answer
37 views

Why these two WAV-creating functions are not producing identical output?

I am using these functions (that receive a pyaudio input) to produce an audio object usable on torchaudio. However, only "write2" produces a result that works, but not "write1". ...
plshelpmeout's user avatar
0 votes
2 answers
10k views

How do I load a bytes object WAV audio file in torchaudio?

I am trying to load a bytes-class object named "audio" to be loaded as a torchaudio object: def convert_audio(audio, target_sr: int = 16000): wav, sr = torchaudio.load(audio) #(......
plshelpmeout's user avatar
1 vote
2 answers
1k views

Cannot create .exe with pyinstaller from .py with torchaudio (CPU): AttributeError: '_OpNamespace' 'torchaudio' object has no attribute 'cuda_version'

I have a .py script that uses torchaudio (without GPU) to process some sound in Windows. To distribute it, I've used pyinstaller to turn it into a .exe. You can reproduce the issue with this simple ...
ronkov's user avatar
  • 1,653
0 votes
1 answer
962 views

Resampling without changing pitch and ratio

I'm doing speech recognition and denoising. In order to feed the data to my model, I need to resample it and make it 2 channels, although I don't know the optimal resampling rate for each sound. When I ...
Niloufar Modir's user avatar
0 votes
2 answers
1k views

Slicing audio given video frames

I have audio from a video that I've loaded with PyTorch. Given a starting index and ending index corresponding to the video segment of interest, along with the video FPS and audio sampling rate, how ...
monopoly's user avatar
  • 676
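
The index arithmetic behind this question can be sketched as follows. This is a minimal illustration, not an answer taken from the thread: the function name `frame_range_to_sample_range` is hypothetical, and it assumes the frame range is start-inclusive and end-exclusive and that audio and video start at the same instant.

```python
def frame_range_to_sample_range(start_frame, end_frame, fps, sample_rate):
    """Map a video frame range to the matching audio sample range.

    Frame k starts at time k / fps seconds, which corresponds to
    audio sample k * sample_rate / fps.
    """
    start_sample = int(round(start_frame * sample_rate / fps))
    end_sample = int(round(end_frame * sample_rate / fps))
    return start_sample, end_sample


# Usage: frames 30..60 of a 30 fps video with 16 kHz audio.
start, end = frame_range_to_sample_range(30, 60, fps=30.0, sample_rate=16000)
# With a (channels, samples) waveform, the segment would be waveform[:, start:end].
```

Rounding rather than truncating keeps the segment length within one sample of the ideal value when `sample_rate / fps` is not an integer.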
0 votes
1 answer
2k views

Backend "sox_io" is not one of available backends: ['soundfile'] even after set up of 'soundfile' on torchaudio

I am working on a speech-recognition project, but I got an error when I tried to load an audio file. RuntimeError: Backend "sox_io" is not one of available backends: ['soundfile']. I've ...
konio011's user avatar
1 vote
1 answer
986 views

torchaudio load for PCM file - EfficientConformer

I'm struggling with parsing audio length in a PCM file. EfficientConformer uses LibriSpeechDataset and the audio file format is FLAC, but in my case I'm using PCM files. EfficientConformer extracts audio ...
Alpha Code's user avatar
0 votes
1 answer
584 views

Is it possible to mix two mono audio tensors of different length (number of frames) in torchaudio?

I have two byte arrays - one from mic and one from soundcard of same duration (15 seconds). They have different formats (sample rate of mic = 44100, n_frames = 1363712; sample rate of stereo = 48000, ...
Cheeter_P's user avatar
5 votes
2 answers
8k views

"RunTime Error: Failed to load audio" for mp3 file (waveform, torchaudio)

No matter how I import my audio file (through uploading it on google colab, importing it through google drive), I keep getting the same error. Could it be a path issue, and if so, how could I go about ...
ihavenoidea's user avatar
0 votes
1 answer
1k views

Unable to use TorchAudio

Good morning, for some reason I cannot get TorchAudio to be used after installing. I've tried both: pip3 install torchaudio conda install torchaudio and a few other options but, I always get the error:...
Novous's user avatar
  • 11
1 vote
0 answers
704 views

Broadcasting error with incompatible input/output sizes (PyTorch Wave-U-Net)

I'm trying to train a Wave-U-Net for mixing multitrack audio (8 mono stems to a stereo mixture) following the methodology of this paper, whereby: Each input consists of 121,843 samples or 2.76 seconds ...
Brudalaxe's user avatar
  • 191
2 votes
2 answers
1k views

Identifying the loudest part of an audio track and cropping (Librosa or torchaudio)

I've built a U-Net model to perform audio mixing of multitrack audio, for which I've used 20s clips of the audio tracks (converted into spectrograms) as input in training the model. However the ...
Brudalaxe's user avatar
  • 191
0 votes
2 answers
5k views

unable to load torchaudio even after installing

I'm trying to use torchaudio but I'm unable to import it. I have installed it and it is also visible through the pip list. <ipython-input-6-4cf0a64f61c0> in <module> ----> 1 import ...
Wally's user avatar
  • 1
0 votes
2 answers
278 views

How to filter tensor shape during creating dataset in pytorch?

I have loaded the 1 second audio files in a tensor format and most of them have the [1,22050] tensor size. But several audio files have smaller sizes such as [1,3042] and I want to get rid of them. ...
Saltanat Khalyk's user avatar