I want to use Python and ffmpeg-python to extract the audio from a video directly into a NumPy array.

Currently, I first dump the audio to a WAV file using the ffmpeg CLI and then read it back into Python using scipy.io.wavfile:

$ ffmpeg -y -i {source_file} -qscale:a 0 -ac 1 -vn -threads 1 -ar 16000 out.wav

followed by this snippet in Python:

from scipy.io import wavfile
_, audio1 = wavfile.read("out.wav")

Now I want to replace the above with:

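import ffmpeg
import numpy as np

# Decode the audio track to raw mono 16-bit PCM at 16 kHz and capture it from stdout.
out, err = (
    ffmpeg
    .input(in_filename)
    .output(
        '-',
        format='s16le',
        acodec='pcm_s16le',
        ac=1,
        ar='16k',
        # sample_rate='16000',
        **{"qscale:a": 0}
    )
    # .overwrite_output()
    .run(capture_stdout=True, capture_stderr=True)
)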

audio2 = np.frombuffer(out, dtype=np.int16)

(Ref: https://github.com/kkroening/ffmpeg-python/blob/master/examples/transcribe.py#L23)
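For a like-for-like comparison, it may help to make sure both code paths decode the same input file with the same flags. The following is only a sketch of one way to do that with the standard subprocess module instead of ffmpeg-python; pcm_command, decode_pcm and extract_audio are hypothetical helper names, and the flags simply mirror the options already used above.

```python
import subprocess

import numpy as np


def pcm_command(source, sample_rate=16000):
    """Build an ffmpeg command that writes mono 16-bit PCM to stdout."""
    return [
        "ffmpeg", "-nostdin",
        "-i", str(source),
        "-vn",                    # drop the video stream
        "-ac", "1",               # mono
        "-ar", str(sample_rate),  # resample to the target rate
        "-f", "s16le",            # raw little-endian 16-bit samples, no header
        "-acodec", "pcm_s16le",
        "pipe:1",                 # write to stdout
    ]


def decode_pcm(raw):
    """Interpret raw s16le bytes as a 1-D int16 array."""
    return np.frombuffer(raw, dtype="<i2")


def extract_audio(source, sample_rate=16000):
    """Run ffmpeg (must be on PATH) and return the decoded samples."""
    result = subprocess.run(
        pcm_command(source, sample_rate),
        capture_output=True,
        check=True,
    )
    return decode_pcm(result.stdout)
```

Calling extract_audio on each candidate input file in turn would then show whether a difference comes from the input file or from the way the samples are read.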

However, when I compare audio1 and audio2, both the number of samples and the sample values differ. For the same file, reading through wavfile gives a signal with values in the range [-221, 212], while the second approach yields values in the range [-74, 72].

I also plotted the signals (starting at 1 sec, 16000 samples), and there seems to be a difference in both delay and amplitude.

[Image: figure1]

A closer look at the start shows that there are also some zero-valued samples at the beginning when I use wavfile.


The starting delay seems to be around 320 samples.

Finally, the number of samples in both the arrays also seems to be different:

>>> print(audio1.shape, audio2.shape)
(2091648,) (2091008,)
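The delay between the two arrays could also be measured rather than read off a plot. Below is a minimal sketch using cross-correlation, assuming both arrays are 1-D and share the same sample rate; estimate_lag and its max_lag parameter are hypothetical names introduced here, not part of ffmpeg-python or SciPy.

```python
import numpy as np


def estimate_lag(reference, delayed, max_lag=2000):
    """Return the lag in samples by which `delayed` trails `reference`.

    A negative result means `delayed` is actually ahead of `reference`.
    """
    n = min(len(reference), len(delayed))
    a = reference[:n].astype(np.float64)
    b = delayed[:n].astype(np.float64)
    # Remove the mean so a constant offset does not dominate the correlation.
    a -= a.mean()
    b -= b.mean()
    # In "full" mode, index 0 corresponds to lag -(n - 1).
    corr = np.correlate(b, a, mode="full")
    lags = np.arange(-(n - 1), n)
    keep = np.abs(lags) <= max_lag
    return int(lags[keep][np.argmax(corr[keep])])
```

np.correlate is quadratic in the window length, so it is probably best to pass a short slice, for example estimate_lag(audio2[:16000], audio1[:16000]). A positive result would suggest that audio1 trails audio2 by that many samples.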
  • In Python you have format='s16le', acodec='pcm_s16le'; maybe that makes a difference? Or maybe you should write audio2 to a file and read it back with wavfile to check whether wavfile modifies the data. Or maybe the values change when you use frombuffer(..., dtype=np.int16). Commented Dec 9, 2023 at 5:19
  • Did some more digging and found this: the source_file I gave to the ffmpeg CLI had already been converted to avi, while the source_file I used inside the Python script was the original mp4. I guess this difference in format is causing the issue. But the avi file was converted from the original using ffmpeg -y -i {source_file} -qscale:v 2 -threads 1 -async 1 -r 25 {avi_file}, so it must contain the same data. Ideally, I want to take in the mp4 directly and work with it. Commented Dec 9, 2023 at 7:43
  • It seems the original mp4 reports encoder: Lavf58.29.100, while the converted avi file reports software: Lavf59.27.100. Does this change the PCM extraction process? Commented Dec 9, 2023 at 7:51
  • I don't know whether Lavf is lossless or not. If it is not lossless, then reconverting from one version to another could change the data. You would have to test it on your own. Commented Dec 9, 2023 at 15:59
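The round-trip check suggested in the first comment can be sketched as follows. To keep the example dependency-light, it swaps in the standard-library wave module and an in-memory buffer in place of scipy.io.wavfile and a file on disk; roundtrip_through_wav is a hypothetical helper name, and the sketch assumes mono 16-bit samples.

```python
import io
import wave

import numpy as np


def roundtrip_through_wav(samples, sample_rate=16000):
    """Write int16 samples to an in-memory WAV file and read them back."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)   # mono
        writer.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        writer.writeframes(samples.astype("<i2").tobytes())
    buf.seek(0)
    with wave.open(buf, "rb") as reader:
        frames = reader.readframes(reader.getnframes())
    return np.frombuffer(frames, dtype="<i2")
```

If np.array_equal(roundtrip_through_wav(audio2), audio2) holds, the WAV container and np.frombuffer are unlikely to explain the mismatch, which would point back at the inputs or the ffmpeg options.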
