70 questions
0
votes
0
answers
217
views
How to handle "Could not initialize NNPACK! Reason: Unsupported hardware" warning in PyTorch / Silero VAD on cloud CPU?
I’m running Silero VAD (via PyTorch + torchaudio) on a Linode cloud instance (2 dedicated CPUs, 4 GB RAM). When I process 10-minute audio chunks, I always get repeated warnings like this and it doesn'...
1
vote
1
answer
82
views
Handling audio streaming over WebSocket in FastAPI for live transcription
I am trying to achieve live transcription using the OpenAI Whisper model in my app, but I am having some issues processing the audio to get the waveform.
@router.websocket("/stt/predict/live")
...
1
vote
0
answers
67
views
How to improve voice quality of custom TTS
I know I could use a custom-trained Tacotron model and a better vocoder, but are there other ways to make the voice clearer and better quality?
Here’s the code I’m currently working with:
import torch
...
3
votes
1
answer
123
views
Lowpass filter is slower on GPU than CPU in PyTorch
I have been trying out some of the Torchaudio functionalities and I can't seem to figure out why lowpass_biquad is running slower on the GPU than on the CPU. And this is true for other effects like, ...
0
votes
0
answers
745
views
Conflicting dependencies while installing torch==1.10.0, torchaudio==0.10.0, and torchvision==0.11.0 in my Python environment
I'm having trouble installing the following dependencies in my Python environment:
torch==1.10.0+cpu
torchaudio==0.10.0
torchvision==0.11.0
pyannote-audio==0.0.1
lightning==2.3.3
numpy
scipy
pandas
...
0
votes
1
answer
789
views
How to download ffmpeg utilities into a Python venv, with pip or manually, for torchaudio
torchaudio requires avutil and other binary DLL files.
Source: https://pytorch.org/audio/2.3.0/installation.html
However, they give an example only for Anaconda.
I am not using Anaconda but I am using ...
0
votes
0
answers
93
views
How to avoid a nan loss (from the first iteration) and gradients being None?
I am trying to predict/fit filter coefficients using an MLP. My target function is:
However, the system is stuck in the same loss (nan) and there is no learning or update happening.
When I remove ...
1
vote
0
answers
78
views
Error when using Torchaudio library to create a data set
I am following a YouTube course that works on the urban 8k data set using Torchaudio. The author wrote the exact same code but was able to get an output, while I get this error:
RuntimeError: Couldn't ...
2
votes
0
answers
380
views
How to actually use torch._constrain_as_size with real models (for onnx conversion purposes)?
I have a model I would like to convert to onnx. The model is based on torchaudio.models.Conformer:
class ConformerSpeechRecognizer(torch.nn.Module):
    def __init__(self,
                 kernel_size,...
0
votes
1
answer
1k
views
OSError: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory
I needed to have Python torchaudio library installed for my application which is packaged into a Docker image.
I am able to do this easily on my EC2 instance:
pip3 install torchaudio
python3
...
3
votes
1
answer
2k
views
The torchaudio backend is empty
I am trying to read an m4a audio file using torchaudio.load(), but I got the following error:
torchaudio.load("1.m4a")
RuntimeError Traceback (most recent call last)
...
0
votes
1
answer
1k
views
Usage of torchaudio.transforms.MelSpectrogram for tensor residing on GPU
I want to calculate a MelSpectrogram using torchaudio on a GPU. For testing, I wrote the following code:
from typing import Optional
import torch
import torchaudio
import numpy as np
from tests....
12
votes
4
answers
17k
views
How to solve RuntimeError: Couldn't find appropriate backend to handle uri in python
I want to work with audio files in PyTorch.
If I try running this line: metadata = torchaudio.info(SAMPLE_WAV_PATH) I get the error message RuntimeError: Couldn't find appropriate backend to handle uri ...
0
votes
1
answer
486
views
Installing Torchaudio for PyTorch 1.10.0 with CUDA 11.0
On my Ubuntu 18.04 machine I have a virtual environment that contains pytorch=1.10.0=cuda110py38hf84197b_0. My CUDA version is 11.0, which I've checked by running nvidia-smi. I would like to install ...
2
votes
0
answers
464
views
Python can't find libtorchaudio.pyd, despite the file being present in folder
I'm trying to use pyannote.audio to transcribe an audio file, however when I try to run the test program that they provided on their site, the program responds with the error:
"FileNotFoundError: ...
0
votes
1
answer
321
views
Torchaudio compatibility issue with WAV files on Ubuntu WSL2
This repo https://github.com/facebookresearch/brainmagick works fine on vanilla Ubuntu. On the exact same WSL2 configuration, the following error arises. This is using torchaudio 2.2.1.
The issue is ...
0
votes
0
answers
42
views
Convert a group of images in 'n' folders to a dataset (e.g. MNIST) to work with a CNN
I am trying to convert generated images into a dataset.
(All I have is PNG images in n folders; there are no labels or metadata.)
This is what I aspire to do:
I am using torchaudio to convert ...
2
votes
2
answers
3k
views
torchaudio can't find FFmpeg
Windows, vscode, Python 3.11.4-64bit
import torch
import torchaudio
print(torch.__version__)
print(torchaudio.__version__)
print(torchaudio._extension._FFMPEG_INITIALIZED)
2.0.1+cu117
2.0.2+cu117
...
0
votes
1
answer
560
views
Different results of Griffin-Lim from librosa and torchaudio
I'm trying to transform the spectrogram back to the audio. First I used librosa.griffinlim and it worked well, but it was time-consuming. Therefore I am trying to use torchaudio on GPU to boost the ...
0
votes
1
answer
1k
views
How to resample from 8K to 16K with librosa or torchaudio as ffmpeg does?
In my app, I'm getting an array of audio samples (with sample rate = 8000) that was loaded with torchaudio.load.
I need to use this audio array and run Whisper (STT).
I want to avoid loading the ...
-2
votes
1
answer
193
views
TypeError: cannot unpack non-iterable AudioMetaData object
https://github.com/facebookresearch/svoice/issues/94
Using dependencies:
$ pip list
Package Version
antlr4-python3-runtime 4.8
audioread 3.0.1
certifi 2023.7.22
cffi 1.16.0
charset-normalizer 3.3.0
...
-2
votes
1
answer
61
views
Why is this program using torchaudio like this?
import torchaudio
# get length of file in samples
info = {} # create a dict
si, _ = torchaudio.info(str(path)) # returns file info (get signal information of an audio file)
info['samplerate'] = si.rate
...
6
votes
2
answers
17k
views
OSError: libtorch_cuda.so: cannot open shared object file: No such file or directory
I have been stuck with this problem for a while, and I would be very grateful if someone could help me resolve it. The system I am using is Ubuntu with CUDA 12.0.
As ...
0
votes
1
answer
874
views
Real time speech recognition with CTC decoder
I am trying to implement real-time ASR with a CTC decoder. I refer to the following torchaudio example on how to use the CTC decoder. I use pyaudio to listen to the microphone, the output of which is byte ...
1
vote
1
answer
795
views
FFmpeg installation not detected with diart
Here I'm using the diart library for audio transcription and the OpenAI Whisper model.
When I run my code, though, I get this error:
Traceback (most recent call last):
File "/home/vkyc/Desktop/...
1
vote
0
answers
382
views
MP3 resampling with torchaudio and ffmpeg
I'm using torchaudio (version 2.0.2) to resample audio files. I'm trying to match the same results as ffmpeg (version 6.0). Specifically, the commands I use are:
waveform, sr = torchaudio.load(...
0
votes
0
answers
210
views
Loading commonvoice with torchaudio not working
If I try to load commonvoice with torchaudio, it returns tensors of different sizes.
When I try loading commonvoice using
train_dataset = COMMONVOICE(root='/home/mr/Downloads/cv-corpus-7.0-2021-07-21/de/',...
1
vote
3
answers
4k
views
Diart (torchaudio) on Windows x64 results in torchaudio error "ImportError: FFmpeg libraries are not found. Please install FFmpeg."
I am trying out a speech diarization project named diart (based on Hugging Face models).
I followed the instructions, using a miniconda environment, which are essentially:
conda create -n diart python=...
1
vote
1
answer
1k
views
Why am I unable to load an audio file with torchaudio whenever I use a GPU on Kaggle?
I am trying to fine-tune a wav2vec2 model for an audio recognition task using a small custom dataset on Kaggle that is made up of m4a audio files.
When I ran my code earlier today without an accelerator (...
1
vote
1
answer
6k
views
Torchaudio.save() .wav file is twice as big as the original .wav file
I'm really new to PyTorch and torchaudio.
I found that the file it saves is twice as big as the original file.
But I just load a .wav file and immediately save the audio to another .wav file.
Why it ...
0
votes
1
answer
502
views
PyTorch torchaudio feature extraction
I have been following the tutorial for feature extraction using PyTorch audio here:
https://pytorch.org/audio/0.10.0/pipelines.html#wav2vec-2-0-hubert-representation-learning
It says the result is a ...
1
vote
0
answers
513
views
torchaudio.io.StreamReader doesn't throw an error when seeking to a timestamp beyond the duration of the audio file
I am trying to get the chunk of an audio file between a specific start time and end time.
Consider an audio file with a duration of 10 seconds. Now I need to get the chunk from 4 sec to 7 sec.
torchaudio.info doesn't ...
1
vote
2
answers
6k
views
Convert byte data to Pytorch tensor
I created a simple model with PyTorch to recognize bird sounds, and until now I have fed it .wav recordings.
I want to start doing real-time recognition and my question is: can I convert bytes to PyTorch ...
0
votes
2
answers
3k
views
To support decoding 'mp3' audio files, please install 'sox'
I'm trying to work on an ASR model using transfer learning on the wav2vec 2 model.
Anyway, whenever I want to show or modify an audio file, I get this problem:
def prepare_dataset(batch):
    audio = ...
0
votes
1
answer
130
views
ValueError Getting Emission from Wav2Vec2 PyTorch Pipeline Model
When calling
model = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H.get_model()
emission = model(data)
to get the emission probabilities from the model, I get:
File "XXX\lib\site-...
0
votes
1
answer
909
views
speechbrain & CUDA out of memory
I am trying to enhance an audio file (3:16 minutes in length, available here) using Speechbrain. If I run the code below (from this tutorial), I get the error OutOfMemoryError: CUDA out of memory. ...
0
votes
1
answer
37
views
Why are these two WAV-creating functions not producing identical output?
I am using these functions (that receive a pyaudio input) to produce an audio object usable with torchaudio.
However, only "write2" produces a result that works; "write1" does not.
...
0
votes
2
answers
10k
views
How do I load a bytes object WAV audio file in torchaudio?
I am trying to load a bytes object named "audio" as a torchaudio object:
def convert_audio(audio, target_sr: int = 16000):
    wav, sr = torchaudio.load(audio)
    #(......
1
vote
2
answers
1k
views
Cannot create .exe with pyinstaller from .py with torchaudio (CPU): AttributeError: '_OpNamespace' 'torchaudio' object has no attribute 'cuda_version'
I have a .py script that uses torchaudio (without GPU) to process some sound in Windows. To distribute it, I've used pyinstaller to turn it into a .exe. You can reproduce the issue with this simple ...
0
votes
1
answer
962
views
Resampling without changing pitch and ratio
I'm doing speech recognition and denoising. In order to feed the data to my model, I need to resample it and make it 2 channels, although I don't know the optimal resampling rate for each sound. When I ...
0
votes
2
answers
1k
views
Slicing audio given video frames
I have audio from a video that I've loaded with PyTorch. Given a starting index and ending index corresponding to the video segment of interest, along with the video FPS and audio sampling rate, how ...
0
votes
1
answer
2k
views
Backend "sox_io" is not one of available backends: ['soundfile'] even after setting up 'soundfile' in torchaudio
I am working on a speech-recognition project, but I got an error when I tried to load an audio file.
RuntimeError: Backend "sox_io" is not one of available backends: ['soundfile'].
I've ...
1
vote
1
answer
986
views
torchaudio load for PCM file - EfficientConformer
I'm struggling with parsing the audio length in a PCM file.
EfficientConformer uses LibriSpeechDataset and the audio file format is flac, but in my case I'm using pcm files. EfficientConformer extracts audio ...
0
votes
1
answer
584
views
Is it possible to mix two mono audio tensors of different length (number of frames) in torchaudio?
I have two byte arrays - one from mic and one from soundcard of same duration (15 seconds). They have different formats (sample rate of mic = 44100, n_frames = 1363712; sample rate of stereo = 48000, ...
5
votes
2
answers
8k
views
"RunTime Error: Failed to load audio" for mp3 file (waveform, torchaudio)
No matter how I import my audio file (through uploading it on google colab, importing it through google drive), I keep getting the same error. Could it be a path issue, and if so, how could I go about ...
0
votes
1
answer
1k
views
Unable to use TorchAudio
Good morning, for some reason I cannot get TorchAudio to work after installing it.
I've tried both:
pip3 install torchaudio
conda install torchaudio
and a few other options, but I always get the error:...
1
vote
0
answers
704
views
Broadcasting error with incompatible input/output sizes (PyTorch Wave-U-Net)
I'm trying to train a Wave-U-Net for mixing multitrack audio (8 mono stems to a stereo mixture) following the methodology of this paper, whereby:
Each input consists of 121,843 samples or 2.76 seconds ...
2
votes
2
answers
1k
views
Identifying the loudest part of an audio track and cropping (Librosa or torchaudio)
I've built a U-Net model to perform audio mixing of multitrack audio, for which I've used 20s clips of the audio tracks (converted into spectrograms) as input in training the model. However the ...
0
votes
2
answers
5k
views
Unable to load torchaudio even after installing
I'm trying to use torchaudio but I'm unable to import it. I have installed it and it is also visible in pip list.
<ipython-input-6-4cf0a64f61c0> in <module>
----> 1 import ...
0
votes
2
answers
278
views
How to filter by tensor shape while creating a dataset in PyTorch?
I have loaded the 1 second audio files in a tensor format and most of them have the [1,22050] tensor size. But several audio files have smaller sizes such as [1,3042] and I want to get rid of them. ...