I want to use Python and ffmpeg-python to extract the audio from a video directly into a NumPy array.
Currently, I first dump the audio to a WAV file using the ffmpeg CLI and read it back into Python using scipy.io.wavfile:
$ ffmpeg -y -i {source_file} -qscale:a 0 -ac 1 -vn -threads 1 -ar 16000 out.wav
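Followed by this snippet in Python:
from scipy.io import wavfile
_, audio1 = wavfile.read("out.wav")
Now I want to replace the above with the following, which reads the audio directly from ffmpeg's stdout: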
import ffmpeg
import numpy as np

out, err = (
    ffmpeg
    .input(in_filename)
    .output(
        '-',
        format='s16le',
        acodec='pcm_s16le',
        ac=1,
        ar='16k',
        # sample_rate='16000',
        **{"qscale:a": 0}
    )
    # .overwrite_output()
    .run(capture_stdout=True, capture_stderr=True)
)
# Interpret the raw little-endian 16-bit PCM bytes as samples
audio2 = np.frombuffer(out, dtype=np.int16)
(Ref: https://github.com/kkroening/ffmpeg-python/blob/master/examples/transcribe.py#L23)
However, when I compare audio1 and audio2, both the number of samples and the sample values differ. For the same file, the signal read through wavfile has values in the range [-221, 212], but the second approach yields values in the range [-74, 72].
I also plotted both signals (starting at 1 s, 16000 samples), and there appears to be both a delay and an amplitude difference between them.
A closer look at the start shows that the wavfile version also has some 0 values at the beginning.
The initial delay seems to be around 320 samples.
Finally, the number of samples in the two arrays is also different:
>>> print(audio1.shape, audio2.shape)
(2091648,) (2091008,)
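The delay and amplitude mismatch described above can be measured rather than read off a plot. The following is a minimal sketch, not a definitive method: the helper name estimate_delay_and_gain and the max_lag parameter are hypothetical, and it assumes the first array is a delayed, scaled copy of the second (which matches the leading zeros seen in the wavfile version).

```python
import numpy as np


def estimate_delay_and_gain(ref, other, max_lag=1000):
    """Return (lag, gain) such that ref[lag + i] is approximately gain * other[i].

    Hypothetical helper: only non-negative lags are searched, i.e. `ref`
    is assumed to start later than `other`.
    """
    # Work in float64 so int16 products cannot overflow
    ref = np.asarray(ref, dtype=np.float64)
    other = np.asarray(other, dtype=np.float64)

    n = min(len(ref), len(other)) - max_lag
    b = other[:n]
    norm_b = np.linalg.norm(b)

    best_lag, best_score = 0, -np.inf
    for lag in range(max_lag + 1):
        a = ref[lag:lag + n]
        denom = np.linalg.norm(a) * norm_b
        if denom == 0:
            continue  # silent window, correlation undefined
        score = np.dot(a, b) / denom  # normalized cross-correlation
        if score > best_score:
            best_lag, best_score = lag, score

    # Least-squares scale factor at the best alignment
    a = ref[best_lag:best_lag + n]
    gain = np.dot(a, b) / np.dot(b, b)
    return best_lag, gain
```

With the arrays from the question this would be called as estimate_delay_and_gain(audio1, audio2). For the shapes printed above, the length difference is 2091648 - 2091008 = 640 samples, which is 40 ms at 16 kHz.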

format='s16le', acodec='pcm_s16le': maybe this makes a difference? Or maybe you should write audio2 to a file and read it with wavfile to check that wavfile doesn't make modifications. Or maybe it changes when you use frombuffer(..., dtype=np.int16).
The source_file given to the CLI ffmpeg was already converted to avi, while the source_file I used inside the Python script was the original mp4. This difference in format is causing the issue, I guess. But the avi file I use is converted from the original using ffmpeg -y -i {source_file} -qscale:v 2 -threads 1 -async 1 -r 25 {avi_file}, so it must contain the same data. Ideally, what I want is to take the mp4 directly and work with it.
The mp4 is using encoder: Lavf58.29.100, while after conversion the new avi file is using software: Lavf59.27.100. Does this change the PCM extraction process?
I don't know if Lavf is lossless or not. If it is not lossless, then reconverting from one version to another could change it. You would have to test it on your own.
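The round-trip check suggested in the comments (write the samples to a file and read them back with wavfile) could look roughly like this. It is only a sketch: the helper name wav_roundtrip_is_exact is hypothetical, and the 16000 Hz default simply mirrors the -ar value used in the question.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile


def wav_roundtrip_is_exact(samples, rate=16000):
    """Write int16 samples to a temporary WAV file, read them back, and compare.

    Hypothetical helper: returns True only if both the sample rate and
    every sample survive the write/read cycle unchanged.
    """
    samples = np.asarray(samples, dtype=np.int16)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "roundtrip.wav")
        wavfile.write(path, rate, samples)
        rate_back, samples_back = wavfile.read(path)
    return rate_back == rate and np.array_equal(samples, samples_back)
```

If wav_roundtrip_is_exact(audio2) returns True, wavfile itself is not altering the samples, which would point back at the differing input files or ffmpeg options as the source of the mismatch.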