Newest 'torchaudio' Questions

0 votes

0 answers

216 views

How to handle "Could not initialize NNPACK! Reason: Unsupported hardware" warning in PyTorch / Silero VAD on cloud CPU?

I’m running Silero VAD (via PyTorch + torchaudio) on a Linode cloud instance (2 dedicated CPUs, 4 GB RAM). When I process 10-minute audio chunks, I always get repeated warnings like this and it doesn'...

Uktamjon

11

asked Sep 15 at 14:16

1 vote

1 answer

81 views

Handling audio streaming over WebSocket in FastAPI for live transcription

I am trying to achieve live transcription using the OpenAI Whisper model in my app, but I am having some issues with processing the audio to get the waveform. @router.websocket("/stt/predict/live") ...

Imisioluwa

31

asked May 27 at 10:18

1 vote

0 answers

67 views

How to improve voice quality of custom TTS

I know I could use a custom-trained Tacotron model and a better vocoder, but are there other ways to make the voice clearer and of better quality? Here’s the code I’m currently working with: import torch ...

Jani Kuru

11

asked Feb 9 at 0:15

3 votes

1 answer

123 views

Lowpass filter is slower on GPU than CPU in PyTorch

I have been trying out some of the Torchaudio functionalities and I can't seem to figure out why lowpass_biquad is running slower on the GPU than on the CPU. And this is true for other effects like, ...

orglce

543

asked Feb 5 at 19:35

0 votes

0 answers

743 views

Conflicting dependencies while installing torch==1.10.0, torchaudio==0.10.0, and torchvision==0.11.0 in my Python environment

I'm having trouble installing the following dependencies in my Python environment: torch==1.10.0+cpu torchaudio==0.10.0 torchvision==0.11.0 pyannote-audio==0.0.1 lightning==2.3.3 numpy scipy pandas ...

oran ben david

466

asked Jan 3 at 21:31

0 votes

1 answer

787 views

How to download ffmpeg utilities into Python venv with pip or manual way for torchaudio

torchaudio requires avutil and other binary DLL files. Source: https://pytorch.org/audio/2.3.0/installation.html However, they give an example only for Anaconda. I am not using Anaconda but I am using ...

Furkan Gözükara

24k

asked Dec 14, 2024 at 12:57

0 votes

0 answers

93 views

How to avoid a nan loss (from the first iteration) and gradients being None?

I am trying to predict/fit filter coefficients using an MLP, my target function is: However, the system is stuck at the same loss (nan) and there is no learning or update happening. When I remove ...

SuperKogito

2,966

asked Jun 25, 2024 at 13:40

1 vote

0 answers

78 views

Error when using Torchaudio library to create a data set

I am following a YT course to work on the urban 8k data set which uses Torchaudio. The author wrote the exact same code but was able to get an output while I get this error: RuntimeError: Couldn't ...

Hussain Bhavnagarwala

11

asked May 12, 2024 at 0:34

2 votes

0 answers

379 views

How to actually use torch._constrain_as_size with real models (for onnx conversion purposes)?

I have a model I would like to convert to onnx. The model is based on torchaudio.models.Conformer: class ConformerSpeechRecognizer(torch.nn.Module): def __init__(self, kernel_size,...

Arsenii Fomin

3,366

asked May 2, 2024 at 5:41

0 votes

1 answer

1k views

OSError: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory

I needed to have the Python torchaudio library installed for my application, which is packaged into a Docker image. I am able to do this easily on my EC2 instance: pip3 install torchaudio python3 ...

Fisher Coder

3,648

asked Apr 19, 2024 at 18:47

3 votes

1 answer

2k views

The torchaudio backend is empty

I am trying to read an m4a audio file using torchaudio.load() but I got the following error: torchaudio.load("1.m4a") RuntimeError Traceback (most recent call last) ...

Sanjith Kumar

51

asked Apr 6, 2024 at 4:08

0 votes

1 answer

1k views

Usage of torchaudio.transforms.MelSpectrogram for tensor residing on GPU

I want to calculate a MelSpectrogram using torchaudio on a GPU. For testing, I wrote the following code: from typing import Optional import torch import torchaudio import numpy as np from tests....

arc_lupus

4,156

asked Apr 2, 2024 at 13:36

12 votes

4 answers

17k views

How to solve RuntimeError: Couldn't find appropriate backend to handle uri in python

I want to work with audio files in PyTorch. If I try running this line: metadata = torchaudio.info(SAMPLE_WAV_PATH) I get the error message RuntimeError: Couldn't find appropriate backend to handle uri ...

Tobias

163

asked Mar 3, 2024 at 20:30

0 votes

1 answer

486 views

Installing Torchaudio for PyTorch 1.10.0 with CUDA 11.0

On my Ubuntu 18.04 machine I have a virtual environment that contains pytorch=1.10.0=cuda110py38hf84197b_0. My CUDA version is 11.0, which I've checked by running nvidia-smi. I would like to install ...

Brian Provost

33

asked Feb 3, 2024 at 21:43

2 votes

0 answers

464 views

Python can't find libtorchaudio.pyd, despite the file being present in folder

I'm trying to use pyannote.audio to transcribe an audio file, however when I try to run the test program that they provided on their site, the program responds with the error: "FileNotFoundError: ...

Pedro Fukuda

11

asked Jan 17, 2024 at 20:40

0 votes

1 answer

321 views

Torchaudio compatibility issue with Wav files on Ubuntu WSL2

This repo https://github.com/facebookresearch/brainmagick works fine on Ubuntu vanilla. On the exact same WSL2 configuration, the following error arises. This is using torchaudio 2.2.1. The issue is ...

user1097111

662

asked Dec 12, 2023 at 19:53

0 votes

0 answers

42 views

convert a group of images in 'n' folders to dataset (eg: Mnist), to work with CNN

I am trying to convert generated images into a dataset. (All I have is just png images in n folders and there are no labels or metadata.) This is what I aspire to do: I am using torchaudio to convert ...

Rookie91

267

asked Dec 8, 2023 at 23:30

2 votes

2 answers

3k views

torchaudio can't find FFmpeg

Windows, vscode, Python 3.11.4-64bit import torch import torchaudio print(torch.__version__) print(torchaudio.__version__) print(torchaudio._extension._FFMPEG_INITIALIZED) 2.0.1+cu117 2.0.2+cu117 ...

KJ H

23

asked Nov 14, 2023 at 10:27

0 votes

1 answer

560 views

Different results of Griffin-Lim from librosa and torchaudio

I'm trying to transform the spectrogram back to the audio. First I used librosa.griffinlim and it worked well, but it was time-consuming. Therefore I am trying to use torchaudio on GPU to boost the ...

Mingxin Zhang

3

asked Nov 14, 2023 at 9:08

0 votes

1 answer

1k views

How to resample from 8K to 16K with librosa or torchaudio as ffmpeg does it?

In my app, I'm getting an array of audio samples (with sample rate = 8000) which was loaded with torchaudio.load. I need to use this audio array and run Whisper (STT). I want to avoid loading the ...

user3668129

4,880

asked Nov 7, 2023 at 12:12

-2 votes

1 answer

193 views

TypeError: cannot unpack non-iterable AudioMetaData object

https://github.com/facebookresearch/svoice/issues/94 Using dependencies: $ pip list Package Version antlr4-python3-runtime 4.8 audioread 3.0.1 certifi 2023.7.22 cffi 1.16.0 charset-normalizer 3.3.0 ...

Guneshwar Singh

1

asked Oct 15, 2023 at 12:13

-2 votes

1 answer

61 views

Why is this program using torch studio like this

import torchaudio # get length of file in samples info = {} # create a list si, _ = torchaudio.info(str(path)) # returns file information (Get signal information of an audio file.) info['samplerate'] = si.rate ...

eureka

3

asked Aug 30, 2023 at 15:12

6 votes

2 answers

17k views

OSError: libtorch_cuda.so: cannot open shared object file: No such file or directory

I have been stuck with this problem for a while, and I would be very grateful if someone could help me resolve it. The system I am using is Ubuntu with CUDA 12.0. As ...

Ivan Wang

61

asked Jul 28, 2023 at 7:39

0 votes

1 answer

873 views

Real time speech recognition with CTC decoder

I am trying to implement real-time ASR with a CTC decoder. I refer to the following torchaudio example on how to use the CTC decoder. I use pyaudio to listen to the microphone, the output of which is byte ...

rumnen

11

asked Jun 14, 2023 at 0:50

1 vote

1 answer

795 views

FFmpeg installation not detected with diart

Here I'm using the diart library for audio transcription and the OpenAI Whisper model. When I run my code I get this error, though: Traceback (most recent call last): File "/home/vkyc/Desktop/...

Schrödinger's Cat

95

asked Jun 8, 2023 at 8:11

1 vote

0 answers

382 views

MP3 resampling with torchaudio and ffmpeg

I'm using torchaudio (version 2.0.2) to resample audio files. I'm trying to match the same results as ffmpeg (version 6.0). Specifically, the commands I use are: waveform, sr = torchaudio.load(...

hsiaomichiu

602

asked Jun 8, 2023 at 2:32

0 votes

0 answers

210 views

Loading commonvoice with torchaudio not working

If I try to load commonvoice with torchaudio it returns tensors of different sizes. When I try loading commonvoice using train_dataset = COMMONVOICE(root='/home/mr/Downloads/cv-corpus-7.0-2021-07-21/de/',...

BR BR

1

asked May 11, 2023 at 13:01

1 vote

3 answers

4k views

Diart (torchaudio) on Windows x64 results in torchaudio error "ImportError: FFmpeg libraries are not found. Please install FFmpeg."

I am trying out a speech diarization project named diart (based on Hugging Face models). I follow the instructions using a miniconda environment, which are essentially: conda create -n diart python=...

LoneWanderer

3,347

asked May 2, 2023 at 14:19

1 vote

1 answer

1k views

Why am I unable to load an audio file with torchaudio whenever I use a GPU on Kaggle?

I am trying to fine-tune a wav2vec2 model for an audio recognition task using a small custom dataset on Kaggle that is made up of m4a audio files. When I ran my code earlier today without an accelerator (...

Xanta_Kross

75

asked Apr 1, 2023 at 4:59

1 vote

1 answer

6k views

Torchaudio.save() .wav file is twice as big as the original .wav file

I'm really new to PyTorch and torchaudio. I found that the file it saves is twice as big as the original file, even though I just load a .wav file and immediately save the audio to another .wav file. Why it ...

KilinWei

23

asked Mar 30, 2023 at 5:38

0 votes

1 answer

502 views

pytorch torchaudio feature extraction

I have been following the tutorial for feature extraction using pytorch audio here: https://pytorch.org/audio/0.10.0/pipelines.html#wav2vec-2-0-hubert-representation-learning It says the result is a ...

JohnJ

7,116

asked Mar 29, 2023 at 14:04

1 vote

0 answers

513 views

torchaudio.io.StreamReader doesn't throw an error when seeking to a timestamp beyond the duration of the audio file

I am trying to get the chunk of an audio file between a specific start time and end time. Consider an audio file of duration 10 seconds. Now I need to get the chunk from 4 sec to 7 sec. torchaudio.info doesn't ...

lokesh

11

asked Mar 27, 2023 at 13:45

1 vote

2 answers

6k views

Convert byte data to PyTorch tensor

I created a simple model with PyTorch to recognize bird sounds and until now I have fed it .wav recordings. I want to start doing real-time recognition and my question is: can I convert bytes to PyTorch ...

asabasdc

23

asked Feb 22, 2023 at 12:36

0 votes

2 answers

3k views

To support decoding 'mp3' audio files, please install 'sox'

I'm trying to work on an ASR model using transfer learning on the wav2vec 2 model. Anyway, whenever I want to show or modify an audio file I get this problem: def prepare_dataset(batch): audio = ...

FOXASDF

97

asked Jan 26, 2023 at 10:10

0 votes

1 answer

130 views

ValueError Getting Emission from Wav2Vec2 PyTorch Pipeline Model

When calling model = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H.get_model() emission = model(data) This is to get the emission probabilities from the model, but I get File "XXX\lib\site-...

Victor Zheng

17

asked Jan 19, 2023 at 7:33

0 votes

1 answer

908 views

speechbrain & CUDA out of memory

I am trying to enhance an audio file (3:16 minutes in length, available here) using Speechbrain. If I run the code below (from this tutorial), I get the error OutOfMemoryError: CUDA out of memory. ...

albusdemens

6,724

asked Jan 11, 2023 at 13:23

0 votes

1 answer

37 views

Why are these two WAV-creating functions not producing identical output?

I am using these functions (that receive a pyaudio input) to produce an audio object usable in torchaudio. However, only "write2" produces a result that works, but not "write1". ...

plshelpmeout

129

asked Nov 29, 2022 at 18:28

0 votes

2 answers

10k views

How do I load a bytes object WAV audio file in torchaudio?

I am trying to load a bytes object named "audio" as a torchaudio object: def convert_audio(audio, target_sr: int = 16000): wav, sr = torchaudio.load(audio) #(......

plshelpmeout

129

asked Nov 28, 2022 at 20:01

1 vote

2 answers

1k views

Cannot create .exe with pyinstaller from .py with torchaudio (CPU): AttributeError: '_OpNamespace' 'torchaudio' object has no attribute 'cuda_version'

I have a .py script that uses torchaudio (without GPU) to process some sound in Windows. To distribute it, I've used pyinstaller to turn it into a .exe. You can reproduce the issue with this simple ...

ronkov

1,653

asked Nov 15, 2022 at 19:59

0 votes

1 answer

962 views

Resampling without changing pitch and ratio

I'm doing speech recognition and denoising. In order to feed the data to my model I need to resample it and make it 2 channels, although I don't know the optimal resampling rate for each sound. When I ...

Niloufar Modir

31

asked Oct 20, 2022 at 7:46

0 votes

2 answers

1k views

Slicing audio given video frames

I have audio from a video that I've loaded with PyTorch. Given a starting index and ending index corresponding to the video segment of interest, along with the video FPS and audio sampling rate, how ...

monopoly

676

asked Oct 4, 2022 at 11:56

0 votes

1 answer

2k views

Backend "sox_io" is not one of available backends: ['soundfile'] even after set up of 'soundfile' on torchaudio

I am working on a speech-recognition project, but I got an error when I tried to load an audio file. RuntimeError: Backend "sox_io" is not one of available backends: ['soundfile']. I've ...

konio011

1

asked Sep 24, 2022 at 12:47

1 vote

1 answer

986 views

torchaudio load for PCM file - EfficientConformer

I'm struggling with parsing the audio length of a PCM file. EfficientConformer uses LibriSpeechDataset and the audio file format is flac, but in my case I'm using pcm files. EfficientConformer extracts audio ...

Alpha Code

21

asked Sep 22, 2022 at 7:21

0 votes

1 answer

584 views

Is it possible to mix two mono audio tensors of different length (number of frames) in torchaudio?

I have two byte arrays - one from mic and one from soundcard of same duration (15 seconds). They have different formats (sample rate of mic = 44100, n_frames = 1363712; sample rate of stereo = 48000, ...

Cheeter_P

71

asked Sep 14, 2022 at 14:08

5 votes

2 answers

8k views

"RunTime Error: Failed to load audio" for mp3 file (waveform, torchaudio)

No matter how I import my audio file (through uploading it on google colab, importing it through google drive), I keep getting the same error. Could it be a path issue, and if so, how could I go about ...

ihavenoidea

61

asked Aug 16, 2022 at 19:18

0 votes

1 answer

1k views

Unable to use TorchAudio

Good morning, for some reason I cannot get TorchAudio to work after installing it. I've tried both: pip3 install torchaudio conda install torchaudio and a few other options, but I always get the error:...

Novous

11

asked Aug 15, 2022 at 18:21

1 vote

0 answers

704 views

Broadcasting error with incompatible input/output sizes (PyTorch Wave-U-Net)

I'm trying to train a Wave-U-Net for mixing multitrack audio (8 mono stems to a stereo mixture) following the methodology of this paper, whereby: Each input consists of 121,843 samples or 2.76 seconds ...

Brudalaxe

191

asked Aug 3, 2022 at 11:21

2 votes

2 answers

1k views

Identifying the loudest part of an audio track and cropping (Librosa or torchaudio)

I've built a U-Net model to perform audio mixing of multitrack audio, for which I've used 20s clips of the audio tracks (converted into spectrograms) as input in training the model. However the ...

Brudalaxe

191

asked Aug 2, 2022 at 13:09

0 votes

2 answers

5k views

unable to load torchaudio even after installing

I'm trying to use torchaudio but I'm unable to import it. I have installed it and it is also visible through the pip list. <ipython-input-6-4cf0a64f61c0> in <module> ----> 1 import ...

Wally

1

asked Jul 13, 2022 at 8:17

0 votes

2 answers

278 views

How to filter by tensor shape while creating a dataset in PyTorch?

I have loaded 1-second audio files in tensor format and most of them have the tensor size [1,22050]. But several audio files have smaller sizes such as [1,3042] and I want to get rid of them. ...

Saltanat Khalyk

79

asked May 8, 2022 at 10:28
