Python - Extracting audio from video files to numpy array using ffmpeg


Asked 1 year, 11 months ago

Modified 1 year, 11 months ago

Viewed 814 times

I want to use Python and ffmpeg-python to extract the audio from a video directly into a NumPy array.

Currently, I first dump the audio to a WAV file by running ffmpeg from the CLI, then read it back into Python using scipy.io.wavfile:

$ ffmpeg -y -i {source_file} -qscale:a 0 -ac 1 -vn -threads 1 -ar 16000 out.wav

Followed by this snippet in Python:

from scipy.io import wavfile
_, audio1 = wavfile.read("out.wav")  # wavfile.read returns (sample_rate, samples)
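For comparison, the temporary WAV file can also be skipped with plain `subprocess`: keep the same CLI flags, but ask ffmpeg for headerless raw PCM on stdout. This is a minimal sketch, assuming an `ffmpeg` binary on `PATH`; `build_ffmpeg_cmd` and `read_audio_via_cli` are hypothetical helper names. `-qscale:a 0` is dropped here on the assumption that a quality scale has no effect on uncompressed PCM output.

```python
import subprocess

import numpy as np


def build_ffmpeg_cmd(source_file: str, sample_rate: int = 16000) -> list:
    """Mirror the CLI flags above, but emit headerless s16le PCM on stdout."""
    return [
        "ffmpeg",
        "-nostdin",              # never wait for keyboard input
        "-i", source_file,
        "-vn",                   # drop the video stream
        "-ac", "1",              # mono
        "-ar", str(sample_rate),
        "-threads", "1",
        "-f", "s16le",           # raw signed 16-bit little-endian, no header
        "-acodec", "pcm_s16le",
        "-",                     # write to stdout instead of a file
    ]


def read_audio_via_cli(source_file: str, sample_rate: int = 16000) -> np.ndarray:
    """Run ffmpeg and return the decoded mono samples as an int16 array."""
    proc = subprocess.run(
        build_ffmpeg_cmd(source_file, sample_rate),
        capture_output=True,
        check=True,              # raise CalledProcessError if ffmpeg fails
    )
    return np.frombuffer(proc.stdout, dtype="<i2")
```

Usage would be `audio = read_audio_via_cli(source_file)`. Running this on the same input file as the WAV route gives an apples-to-apples comparison of the two reading paths.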

Now I want to replace the above with:

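import ffmpeg          # the ffmpeg-python package
import numpy as np

out, err = (
    ffmpeg
        .input(in_filename)
        .output(
            '-',                 # write to stdout instead of a file
            format='s16le',      # raw signed 16-bit little-endian PCM, no WAV header
            acodec='pcm_s16le',
            ac=1,                # mono
            ar='16k',            # 16 kHz sample rate
            # sample_rate='16000',
            **{"qscale:a": 0}
        )
        # .overwrite_output()
        .run(capture_stdout=True, capture_stderr=True)
)

audio2 = np.frombuffer(out, dtype=np.int16)  # reinterpret the raw bytes as int16 samples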
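Since `s16le` is headerless little-endian 16-bit PCM, `np.frombuffer(out, dtype=np.int16)` is correct on little-endian machines; spelling the byte order as `'<i2'` makes that explicit. Many speech pipelines also want float samples. The sketch below is a minimal illustration; `pcm_s16le_to_float` is a hypothetical helper, and dividing by 32768 is one common scaling convention, not something ffmpeg-python prescribes.

```python
import numpy as np


def pcm_s16le_to_float(buf: bytes) -> np.ndarray:
    """Decode headerless s16le bytes into float32 samples in [-1.0, 1.0)."""
    if len(buf) % 2:
        raise ValueError("s16le data must contain an even number of bytes")
    samples = np.frombuffer(buf, dtype="<i2")   # explicit little-endian int16
    return samples.astype(np.float32) / 32768.0


# Usage: two samples, 0x0001 and 0x8000 (= -32768), encoded little-endian.
print(pcm_s16le_to_float(b"\x01\x00\x00\x80"))
```

Keeping the int16 array for comparisons against `wavfile.read` output avoids any scaling differences between the two paths.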

(Ref: https://github.com/kkroening/ffmpeg-python/blob/master/examples/transcribe.py#L23)

However, when I compare audio1 and audio2, the number of samples differs, and so do the values. For the same file, the signal read through wavfile has values in the range [-221, 212], while the second approach yields values in the range [-74, 72].

I also plotted the signal (the first 1 s, i.e. 16000 samples), and there seems to be a difference in both delay and amplitude.

A closer look at the beginning shows that the signal read with wavfile also starts with some zero values.

The initial delay seems to be around 320 samples (20 ms at 16 kHz).
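One way to measure the delay instead of reading it off a plot is to cross-correlate the two signals and take the lag of the correlation peak. This is a minimal sketch, assuming SciPy 1.6 or newer (for `scipy.signal.correlation_lags`); `estimate_lag` is a hypothetical helper, and it can be applied to a slice (for example the first few seconds) as well as to the full arrays.

```python
import numpy as np
from scipy import signal


def estimate_lag(reference: np.ndarray, other: np.ndarray) -> int:
    """Return the lag (in samples) that best aligns `other` to `reference`.

    A positive result means `reference` is delayed relative to `other`,
    i.e. reference[n + lag] lines up with other[n].
    """
    a = reference.astype(np.float64)
    b = other.astype(np.float64)
    corr = signal.correlate(a, b, mode="full", method="fft")
    lags = signal.correlation_lags(len(a), len(b), mode="full")
    return int(lags[np.argmax(corr)])
```

For example, `estimate_lag(audio1, audio2)` should come out positive if audio1 is the delayed one, as the leading zeros suggest. A uniform amplitude difference does not move the peak, so the lag estimate is unaffected by it.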

Finally, the two arrays also have different numbers of samples:

>> print(audio1.shape, audio2.shape)
(2091648,) (2091008,)
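Once a lag is known, both arrays can be trimmed to their overlapping region so the length mismatch no longer gets in the way, and a least-squares gain summarizes the amplitude difference as a single number. This is a minimal sketch; `align_and_compare` is a hypothetical helper that takes the lag as an input (for example from a cross-correlation), using the convention that reference[n + lag] lines up with other[n].

```python
import numpy as np


def align_and_compare(reference: np.ndarray, other: np.ndarray, lag: int):
    """Trim both signals to their overlap and fit reference ~= gain * other.

    Convention: reference[n + lag] lines up with other[n].
    Returns (ref_aligned, other_aligned, gain).
    """
    if lag >= 0:
        ref, oth = reference[lag:], other
    else:
        ref, oth = reference, other[-lag:]
    n = min(len(ref), len(oth))                  # common length after the shift
    ref = ref[:n].astype(np.float64)
    oth = oth[:n].astype(np.float64)
    denom = np.dot(oth, oth)
    # Least-squares gain: argmin_g ||ref - g * oth||^2 = <ref, oth> / <oth, oth>
    gain = float(np.dot(ref, oth) / denom) if denom else float("nan")
    return ref, oth, gain
```

If the two paths decode the same audio, the residual `ref - gain * oth` should be small after alignment; a large residual would point to different source audio rather than just a shift and a scale.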

edited Dec 9, 2023 at 5:06

Ajeet Verma


asked Dec 9, 2023 at 4:43

v-i-s-h


In Python you have format='s16le', acodec='pcm_s16le'; maybe that makes a difference? Or maybe you should write audio2 to a file and read it back with wavfile to check whether wavfile makes any modifications. Or maybe the values change when you use frombuffer(..., dtype=np.int16).

– furas

Commented Dec 9, 2023 at 5:19
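The wavfile part of this suggestion can be checked without any video at all: write an int16 array with `scipy.io.wavfile.write`, read it back with `wavfile.read`, and compare. This is a minimal sketch with a synthetic tone standing in for `audio2`; `wav_roundtrip_is_lossless` is a hypothetical helper. For 16-bit PCM the round trip is expected to return identical samples.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile


def wav_roundtrip_is_lossless(samples: np.ndarray, rate: int = 16000) -> bool:
    """Write int16 samples to a temporary WAV file, read them back, compare."""
    samples = np.asarray(samples, dtype=np.int16)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "roundtrip.wav")
        wavfile.write(path, rate, samples)
        read_rate, read_back = wavfile.read(path)
    return (
        read_rate == rate
        and read_back.dtype == np.int16
        and np.array_equal(read_back, samples)
    )


# Usage with a synthetic stand-in for audio2 (one second of a 440 Hz tone).
t = np.arange(16000) / 16000
tone = (200 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
print(wav_roundtrip_is_lossless(tone))
```

Passing the real `audio2` instead of `tone` tests the same thing on the actual data.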
Did some more digging and found this: the source_file I gave to the ffmpeg CLI had already been converted to AVI, while the source_file I used in the Python script was the original MP4. I guess this difference in format is causing the issue. But the AVI file I use was converted from the original with ffmpeg -y -i {source_file} -qscale:v 2 -threads 1 -async 1 -r 25 {avi_file}, so it should contain the same data. Ideally, I want to take the MP4 directly and work with it.

– v-i-s-h

Commented Dec 9, 2023 at 7:43
It seems the original MP4 reports encoder: Lavf58.29.100, while the converted AVI reports software: Lavf59.27.100. Does this change the PCM extraction process?

– v-i-s-h

Commented Dec 9, 2023 at 7:51
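The `Lavf...` tags name the libavformat version that wrote each container. To compare what is actually inside the audio tracks of the MP4 and the AVI, one option is to ask `ffprobe` for the first audio stream's codec, sample rate, and channel count. This is a minimal sketch, assuming `ffprobe` is on `PATH`; `probe_audio` and `parse_audio_stream` are hypothetical helpers, and the file names in the usage comments are placeholders.

```python
import json
import subprocess


def probe_audio(path: str) -> dict:
    """Return codec/sample-rate/channel info for the first audio stream."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=codec_name,sample_rate,channels",
        "-of", "json",
        path,
    ]
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_audio_stream(proc.stdout)


def parse_audio_stream(ffprobe_json: str) -> dict:
    """Pick the first stream out of ffprobe's JSON output."""
    streams = json.loads(ffprobe_json).get("streams", [])
    if not streams:
        raise ValueError("no audio stream found")
    s = streams[0]
    return {
        "codec": s.get("codec_name"),
        # ffprobe reports sample_rate as a string in its JSON output
        "sample_rate": int(s["sample_rate"]) if "sample_rate" in s else None,
        "channels": s.get("channels"),
    }


# Usage (placeholder file names; requires real files):
# print(probe_audio("original.mp4"))
# print(probe_audio("converted.avi"))
```

If the two reports differ, for example in codec, that would suggest the AVI's audio track was re-encoded during the conversion rather than copied.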
I don't know whether Lavf is lossless or not. If it is not lossless, then reconverting from one version to another could change the data. You would have to test it yourself.

– furas

Commented Dec 9, 2023 at 15:59
