
Reading a text file into an array, extracting elements and sorting them is taking a very long time.

The text file is ffmpeg console output for R128 audio analysis. I need to get the highest M and S values. Example:

[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.49998    M: -22.2 S: -29.9     I: -27.0 LUFS     LRA:   9.8 LU  FTPK: -12.4 dBFS  TPK:  -9.7 dBFS  
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.69998    M: -22.5 S: -28.6     I: -25.9 LUFS     LRA:  11.3 LU  FTPK: -12.7 dBFS  TPK:  -9.7 dBFS

The text file can be hundreds or thousands of lines long, depending on the duration of the audio file being analysed.
I want to find the highest M (-22.2) and S (-28.6) values and assign them to variables M and S.

This is what I am using currently:

ARRAY=()
while IFS= read -r LINE
do
    ARRAY+=("$LINE")
done < "$tempDir"/text.txt

for LINE in "${ARRAY[@]}"
do
    echo "$LINE" | sed -n '/M:/p' | sed 's/S:.*//' | sed -n -e 's/^.*M://p' | sed -n -e 's/-//p' >>"$tempDir"/R128M.txt
done
for LINE in "${ARRAY[@]}"
do
    echo "$LINE" | sed -n '/M:/p' | sed 's/I:.*//' | sed -n -e 's/^.*S://p' | sed -n -e 's/-//p' >>"$tempDir"/R128S.txt
done

cat "$tempDir"/R128M.txt
M=( $(sort -n "$tempDir"/R128M.txt) )

cat "$tempDir"/R128S.txt
S=( $(sort -n "$tempDir"/R128S.txt) )

Is there a faster way of doing this?

    Yes. One does not usually choose to write in bash script for its speed. Even a suitable perl script would probably give you an order of magnitude speed improvement here, especially seeing as it's largely regex processing. Commented Jul 16, 2016 at 8:58

2 Answers


Rather than reading the whole file into memory, writing parts of it out to separate files and reading those back in again, just parse it once and pick out the largest values:

$ awk '$7 > m || m == "" { m = $7 } $9 > s || s == "" { s = $9 } END { print m, s }' data
-22.2 -28.6

In your data, fields 7 and 9 contain the values of M and S. The awk script updates its m and s variables whenever it finds a larger value in those fields and prints the largest values found at the end. The m == "" and s == "" tests are needed to initialize the variables from the first line, before any value has been read. This assumes that every line in the file is a measurement line laid out like the two above.
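If the file is the complete ffmpeg console output rather than only measurement lines, a variant that looks only at ebur128 measurement lines and pulls the numbers out with awk's match() may be safer. This is a minimal sketch, not tested against real ffmpeg logs: the sample file (including the unrelated line and the M:-120.7 line) is hypothetical, and it assumes that a value of -100 or below may be printed with no space after the colon, which would shift the field numbers.

```shell
# Hypothetical sample: one unrelated line, one line with values glued to
# their labels, and the two measurement lines from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
some unrelated log line
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 0.099977   M:-120.7 S:-120.7     I: -70.0 LUFS
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.49998    M: -22.2 S: -29.9     I: -27.0 LUFS     LRA:   9.8 LU  FTPK: -12.4 dBFS  TPK:  -9.7 dBFS
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.69998    M: -22.5 S: -28.6     I: -25.9 LUFS     LRA:  11.3 LU  FTPK: -12.7 dBFS  TPK:  -9.7 dBFS
EOF

result=$(awk '
    # Only consider ebur128 measurement lines; everything else is skipped.
    /Parsed_ebur128/ && /M:/ && /S:/ {
        # Number after "M:" (the space after the colon is optional).
        if (match($0, /M: *-?[0-9.]+/)) {
            v = substr($0, RSTART + 2, RLENGTH - 2)
            sub(/^ +/, "", v)                  # drop the optional space
            if (!seen || v + 0 > m) { m = v + 0; mt = v }
        }
        # Same for the number after "S:".
        if (match($0, /S: *-?[0-9.]+/)) {
            v = substr($0, RSTART + 2, RLENGTH - 2)
            sub(/^ +/, "", v)
            if (!seen || v + 0 > s) { s = v + 0; st = v }
        }
        seen = 1
    }
    # Print the original text of the largest values.
    END { if (seen) print mt, st }
' "$log")

M=${result% *}   # text before the space
S=${result#* }   # text after the space
rm -f "$log"
echo "$M $S"
```

Printing the matched text rather than a printf-formatted float keeps the values exactly as ffmpeg wrote them (one decimal).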

Another way with awk, which may look cleaner:

$ awk 'FNR == 1 { m = $7; s = $9; next } $7 > m { m = $7 } $9 > s { s = $9 } END { print m, s }' data

To assign them to M and S in the shell:

$ declare $( awk 'FNR == 1 { m = $7; s = $9; next } $7 > m { m = $7 } $9 > s { s = $9 } END { printf("M=%f S=%f\n", m, s) }' data )

$ echo $M $S
-22.200000 -28.600000

Adjust the printf() format to use %s instead of %f if you want the original strings instead of float values, or set the number of decimals you might want with, e.g., %.2f in place of %f.
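For example, with the two lines from the question saved to a temporary file (the file is only for illustration), the %s and %.2f variants of the command above give:

```shell
# Sample data: the two measurement lines from the question.
data=$(mktemp)
cat > "$data" <<'EOF'
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.49998    M: -22.2 S: -29.9     I: -27.0 LUFS     LRA:   9.8 LU  FTPK: -12.4 dBFS  TPK:  -9.7 dBFS
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.69998    M: -22.5 S: -28.6     I: -25.9 LUFS     LRA:  11.3 LU  FTPK: -12.7 dBFS  TPK:  -9.7 dBFS
EOF

# %s: keep the original field text instead of converting to a float.
declare $( awk 'FNR == 1 { m = $7; s = $9; next } $7 > m { m = $7 } $9 > s { s = $9 } END { printf("M=%s S=%s\n", m, s) }' "$data" )
echo "$M $S"   # prints -22.2 -28.6

# %.2f: two decimals instead of the six that %f prints by default.
declare $( awk 'FNR == 1 { m = $7; s = $9; next } $7 > m { m = $7 } $9 > s { s = $9 } END { printf("M=%.2f S=%.2f\n", m, s) }' "$data" )
echo "$M $S"   # prints -22.20 -28.60

rm -f "$data"
```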


1 Comment

Thanks - this worked perfectly. Appreciate you putting the additional info also to assign in the script.

First of all, a five-process pipeline (echo plus four sed invocations) is overkill for extracting a single value, especially since you start it anew for every line, and you do that twice (once for M, once for S).
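As a sketch of this first point, the extraction can run once over the whole file instead of once per line: each sed below reads the file a single time, and -n with the p flag prints only the lines where the substitution matched. It still sorts; LC_ALL=C is added on the assumption that a user's locale might otherwise make sort -n expect a comma as the decimal separator. The sample file is hypothetical.

```shell
# Hypothetical sample: one unrelated line plus the two lines from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
some unrelated log line
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.49998    M: -22.2 S: -29.9     I: -27.0 LUFS     LRA:   9.8 LU  FTPK: -12.4 dBFS  TPK:  -9.7 dBFS
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.69998    M: -22.5 S: -28.6     I: -25.9 LUFS     LRA:  11.3 LU  FTPK: -12.7 dBFS  TPK:  -9.7 dBFS
EOF

# One sed process per value for the whole file; lines without a match print nothing.
# sort -n sorts ascending, so the last line is the largest (least negative) value.
M=$(sed -n 's/.*M: *\(-\{0,1\}[0-9.]*\) S:.*/\1/p' "$log" | LC_ALL=C sort -n | tail -n 1)
S=$(sed -n 's/.* S: *\(-\{0,1\}[0-9.]*\) .*/\1/p' "$log" | LC_ALL=C sort -n | tail -n 1)

rm -f "$log"
echo "$M $S"
```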

Next, you save all the values to a file and then sort that file, while all you need is the maximum value. You can easily track it during the very first (value extraction) loop, at O(N) total cost, instead of paying for the extra file I/O plus an O(N log N) sort. See ARITHMETIC EXPANSION and CONDITIONAL EXPRESSIONS in the bash manual; note that bash arithmetic is integer-only, so values such as -22.2 have to be scaled (for example to tenths) before they can be compared.
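A minimal pure-bash sketch of tracking the maximum inside the reading loop, with no temporary files, no sort and no external process per line. Because bash arithmetic is integer-only, it assumes every M and S value has exactly one decimal digit (as in the sample lines) and compares them in tenths; the regular expression, variable names and sample file are illustrative assumptions, not taken from the answer.

```shell
# Hypothetical sample: one unrelated line plus the two lines from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
some unrelated log line
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.49998    M: -22.2 S: -29.9     I: -27.0 LUFS     LRA:   9.8 LU  FTPK: -12.4 dBFS  TPK:  -9.7 dBFS
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.69998    M: -22.5 S: -28.6     I: -25.9 LUFS     LRA:  11.3 LU  FTPK: -12.7 dBFS  TPK:  -9.7 dBFS
EOF

# Groups: 1 = M text, 2 = sign, 3 = integer part, 4 = tenths digit;
#         5-8 = the same for S.
re='M: *((-?)([0-9]+)\.([0-9])) S: *((-?)([0-9]+)\.([0-9]))'
M= S=        # original text of the largest values seen so far
maxM= maxS=  # the same values scaled to tenths, as integers

while IFS= read -r line; do
    [[ $line =~ $re ]] || continue   # skip non-measurement lines
    # Compare in tenths (-22.2 -> -222) because $(( )) only handles integers.
    # 10# forces base 10 so that a part such as "08" is not parsed as octal.
    m=$(( 10#${BASH_REMATCH[3]} * 10 + 10#${BASH_REMATCH[4]} ))
    if [[ ${BASH_REMATCH[2]} == - ]]; then m=$(( -m )); fi
    s=$(( 10#${BASH_REMATCH[7]} * 10 + 10#${BASH_REMATCH[8]} ))
    if [[ ${BASH_REMATCH[6]} == - ]]; then s=$(( -s )); fi
    if [[ -z $maxM ]] || (( m > maxM )); then maxM=$m M=${BASH_REMATCH[1]}; fi
    if [[ -z $maxS ]] || (( s > maxS )); then maxS=$s S=${BASH_REMATCH[5]}; fi
done < "$log"

rm -f "$log"
echo "M=$M S=$S"
```

A bash read loop like this is usually still slower than a single awk run over the same file, so for long logs the awk answer is likely the better choice.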
