I started working with FFmpeg and I want to build a list containing the start and end timestamps of the silence intervals in an audio file. I can already print these intervals with FFmpeg, but I need to format that output so it is a bit more readable, which is why I want to turn it into a list and then print it with a custom function. I know I should probably use a regex here, but I am not sure how to write it, nor how to read the FFmpeg console output in the first place. My silence-detection function looks like this:
import subprocess

def detect_silence_ffmpeg():
    command = r"ffmpeg -i audio.wav -af silencedetect=n=-40dB:d=0.5 -f null - "
    subprocess.call(command, shell=True)
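I assume the first step is to capture that console output instead of just letting it print to the terminal, so the timestamps can be parsed afterwards. This is a rough, untested sketch of what I have in mind (I'm guessing the silencedetect messages go to stderr):

import subprocess

def detect_silence_ffmpeg():
    command = "ffmpeg -i audio.wav -af silencedetect=n=-40dB:d=0.5 -f null -"
    # Keep stderr, since that seems to be where FFmpeg writes the silencedetect lines.
    result = subprocess.run(command, shell=True,
                            stderr=subprocess.PIPE, universal_newlines=True)
    return result.stderr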
For reference, the console output of the original call on a 7-second sample is:
ffmpeg version git-2020-06-03-b6d7c4c Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 9.3.1 (GCC) 20200523
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libsrt --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --disable-w32threads --enable-libmfx --enable-ffnvcodec --enable-cuda-llvm --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf
libavutil 56. 49.100 / 56. 49.100
libavcodec 58. 90.100 / 58. 90.100
libavformat 58. 44.100 / 58. 44.100
libavdevice 58. 9.103 / 58. 9.103
libavfilter 7. 84.100 / 7. 84.100
libswscale 5. 6.101 / 5. 6.101
libswresample 3. 6.100 / 3. 6.100
libpostproc 55. 6.100 / 55. 6.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'audio.wav':
Metadata:
encoder : Lavf58.44.100
Duration: 00:00:07.34, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
encoder : Lavf58.44.100
Stream #0:0: Audio: pcm_s16le, 44100 Hz, stereo, s16, 1411 kb/s
Metadata:
encoder : Lavc58.90.100 pcm_s16le
[silencedetect @ 00000202fc71e680] silence_start: 0
[silencedetect @ 00000202fc71e680] silence_end: 1.16374 | silence_duration: 1.16374
[silencedetect @ 00000202fc71e680] silence_start: 1.94558
[silencedetect @ 00000202fc71e680] silence_end: 3.41345 | silence_duration: 1.46787
[silencedetect @ 00000202fc71e680] silence_start: 3.8578
[silencedetect @ 00000202fc71e680] silence_end: 5.84844 | silence_duration: 1.99063
[silencedetect @ 00000202fc71e680] silence_start: 6.43653
size=N/A time=00:00:07.33 bitrate=N/A speed= 308x
video:0kB audio:1264kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect @ 00000202fc71e680] silence_end: 7.33868 | silence_duration: 0.902154
This code will eventually run on videos that are an hour or so long, so I really need a cleaner way to present this output than the raw log above. Below is roughly the direction I'm thinking of; any help would be much appreciated :)
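This is my rough, untested idea for the parsing itself (the regex and the helper names are just my own guesses, not something I've verified against every FFmpeg build):

import re

def parse_silence(ffmpeg_output):
    # Grab every silence_start and silence_end value from the captured text.
    starts = re.findall(r"silence_start: (\d+(?:\.\d+)?)", ffmpeg_output)
    ends = re.findall(r"silence_end: (\d+(?:\.\d+)?)", ffmpeg_output)
    # Pair them up in order; zip simply drops a final silence_start with no matching end.
    return [(float(s), float(e)) for s, e in zip(starts, ends)]

def print_intervals(intervals):
    for start, end in intervals:
        print(f"silence from {start:7.2f}s to {end:7.2f}s ({end - start:.2f}s long)")

intervals = parse_silence(detect_silence_ffmpeg())
print_intervals(intervals)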
P.S.: the idea is that this should mainly run on Windows, but if it can be made cross-platform as well, that would be great.
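If it helps with the cross-platform part, I could presumably also pass the command as a list of arguments instead of a single shell string, which (as far as I understand) avoids shell differences between Windows and Linux, as long as ffmpeg is on the PATH:

command = ["ffmpeg", "-i", "audio.wav",
           "-af", "silencedetect=n=-40dB:d=0.5",
           "-f", "null", "-"]
result = subprocess.run(command, stderr=subprocess.PIPE, universal_newlines=True)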