Sorting strings from array takes a long time

Question

Reading a text file into an array, extracting elements and sorting them is taking a very long time.

The text file is ffmpeg console output for R128 audio analysis. I need to get the highest M and S values. Example:

[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.49998    M: -22.2 S: -29.9     I: -27.0 LUFS     LRA:   9.8 LU  FTPK: -12.4 dBFS  TPK:  -9.7 dBFS  
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.69998    M: -22.5 S: -28.6     I: -25.9 LUFS     LRA:  11.3 LU  FTPK: -12.7 dBFS  TPK:  -9.7 dBFS

The text file can be hundreds or thousands of lines long, depending on the duration of the audio file being analysed.
I want to find the highest M (-22.2) and S (-28.6) values and assign them to variables M and S.

This is what I am using currently:

ARRAY=()
while read LINE
do
ARRAY+=("$LINE")
done < $tempDir/text.txt

for LINE in "${ARRAY[@]}"
do
echo "$LINE" | sed -n '/M:/p' | sed 's/S:.*//' | sed -n -e 's/^.*M://p' | sed -n -e 's/-//p' >>$tempDir/R128M.txt
done
for LINE in "${ARRAY[@]}"
do
echo "$LINE" | sed -n '/M:/p' | sed 's/I:.*//' | sed -n -e 's/^.*S://p' | sed -n -e 's/-//p' >>$tempDir/R128S.txt
done

cat $tempDir/R128M.txt
M=( $(sort $tempDir/R128M.txt) )

cat $tempDir/R128S.txt
S=( $(sort $tempDir/R128S.txt) )

Is there a faster way of doing this?

Yes. One does not usually choose to write in bash script for its speed. Even a suitable perl script would probably give you an order of magnitude speed improvement here, especially seeing as it's largely regex processing.
– davmac, Commented Jul 16, 2016 at 8:58

Kusalananda · Accepted Answer · 2016-07-16 10:19:14Z

2

Rather than reading the whole file into memory, writing bits of it out to separate files, and reading those in again, just parse it once and pick out the largest values:

$ awk '$7 > m || m == "" { m = $7 } $9 > s || s == "" { s = $9 } END { print m, s }' data
-22.2 -28.6

In your data, fields 7 and 9 contain the values of M and S. The awk script updates its m and s variables whenever it finds a larger value in these fields, and prints the largest values found at the end. The m == "" and s == "" tests are needed to initialize the variables when no value has been read yet.
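
The field numbers above hold for lines shaped like the two samples in the question. If the file also contains other ffmpeg console lines, a variant that looks for the "M:" and "S:" labels may be more forgiving. The following is only a sketch: the function name max_by_label is made up for illustration, and it assumes each label is separated from its value by whitespace, as in the samples.

```shell
#!/usr/bin/env bash
# max_by_label FILE (hypothetical helper): print the largest M and S values.
# Assumption: "M:" and "S:" are separate whitespace-delimited tokens,
# each immediately followed by its value, as in the sample lines.
max_by_label() {
    awk '
        {
            mv = sv = ""
            # Locate the labels instead of relying on fixed field numbers
            for (i = 1; i < NF; i++) {
                if ($i == "M:") mv = $(i + 1)
                if ($i == "S:") sv = $(i + 1)
            }
            # Skip lines that do not carry both values
            if (mv == "" || sv == "") next
            # "+ 0" forces a numeric comparison
            if (m == "" || mv + 0 > m + 0) m = mv
            if (s == "" || sv + 0 > s + 0) s = sv
        }
        END { print m, s }
    ' "$1"
}
```

Calling max_by_label data prints the two maxima separated by a space, in the same form as the one-liner above.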

Another way with awk, which may look cleaner:

$ awk 'FNR == 1 { m = $7; s = $9; next } $7 > m { m = $7 } $9 > s { s = $9 } END { print m, s }' data

To assign them to M and S in the shell:

$ declare $( awk 'FNR == 1 { m = $7; s = $9; next } $7 > m { m = $7 } $9 > s { s = $9 } END { printf("M=%f S=%f\n", m, s) }' data )

$ echo $M $S
-22.200000 -28.600000

Adjust the printf() format to use %s instead of %f if you want the original strings instead of float values, or set the number of decimals you might want with, e.g., %.2f in place of %f.
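
As an alternative to declare, the two values can also be picked up with read and process substitution, which keeps the original strings when %s is used. This is a sketch rather than part of the original answer: get_max and load_max are hypothetical helper names, and the input is assumed to contain only lines like the samples, with M in field 7 and S in field 9.

```shell
#!/usr/bin/env bash
# get_max FILE (hypothetical helper): print the largest M and S as strings.
# Assumption: every line has M in field 7 and S in field 9.
get_max() {
    awk 'FNR == 1 { m = $7; s = $9; next }
         $7 > m   { m = $7 }
         $9 > s   { s = $9 }
         END      { printf("%s %s\n", m, s) }' "$1"
}

# load_max FILE (hypothetical helper): set the shell variables M and S.
# Process substitution lets read run in the current shell, so the
# variables are still set after the function returns.
load_max() {
    read -r M S < <(get_max "$1")
}
```

After load_max data, echo "$M" "$S" shows the values exactly as they appear in the file.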

edited Jul 16, 2016 at 10:19

answered Jul 16, 2016 at 9:02

Kusalananda


1 Comment

ssmc Over a year ago

Thanks - this worked perfectly. Appreciate you putting the additional info also to assign in the script.

bipll · Answer · 2016-07-16 09:03:55Z

1

First of all, a pipeline of several sed processes is overkill for extracting a single value, especially since you spawn it anew for every line.

Next, you save all the values into a file and then sort that file, while all you need is the maximum value. You can easily find it during the very first (value extraction) loop, for an additional O(N) running time, instead of paying for the extra file I/O and the O(N log N) cost of sorting. See ARITHMETIC EXPANSION and conditional expressions in the bash manual.
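
A single-pass loop along these lines might look like the sketch below. It is an illustration, not the answerer's code: bash arithmetic is integer-only, so the sketch assumes every value has exactly one decimal place (as in the sample lines) and compares values as whole tenths. The names to_tenths and max_ms are made up for the example.

```shell
#!/usr/bin/env bash
# to_tenths VALUE (hypothetical helper): store VALUE * 10 as an integer in
# the global TENTHS, e.g. "-22.2" -> -222.
# Assumption: VALUE has exactly one decimal place.
to_tenths() {
    local digits=${1/./}                 # drop the decimal point
    if [[ $digits == -* ]]; then
        # 10# forces base 10, so digits like "08" are not read as octal
        TENTHS=$(( -(10#${digits#-}) ))
    else
        TENTHS=$(( 10#$digits ))
    fi
}

# max_ms FILE (hypothetical helper): set the globals M and S to the largest
# values found in FILE, using only bash built-ins.
max_ms() {
    local re='M: *(-?[0-9]+\.[0-9]) +S: *(-?[0-9]+\.[0-9])'
    local line m s best_m='' best_s=''
    M='' S=''
    while IFS= read -r line; do
        [[ $line =~ $re ]] || continue   # skip lines without M and S
        m=${BASH_REMATCH[1]} s=${BASH_REMATCH[2]}
        to_tenths "$m"
        if [[ -z $best_m ]] || (( TENTHS > best_m )); then
            best_m=$TENTHS M=$m
        fi
        to_tenths "$s"
        if [[ -z $best_s ]] || (( TENTHS > best_s )); then
            best_s=$TENTHS S=$s
        fi
    done < "$1"
}
```

Running max_ms "$tempDir/text.txt" leaves the results in M and S without creating any temporary files or extra processes.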

answered Jul 16, 2016 at 9:03

bipll

