
I have this data:

cat >data1.txt <<'EOF'
2020-01-27-06-00;/dev/hd1;100;/
2020-01-27-12-00;/dev/hd1;100;/
2020-01-27-18-00;/dev/hd1;100;/
2020-01-27-06-00;/dev/hd2;200;/usr
2020-01-27-12-00;/dev/hd2;200;/usr
2020-01-27-18-00;/dev/hd2;200;/usr
EOF

cat >data2.txt <<'EOF'
2020-02-27-06-00;/dev/hd1;120;/
2020-02-27-12-00;/dev/hd1;120;/
2020-02-27-18-00;/dev/hd1;120;/
2020-02-27-06-00;/dev/hd2;230;/usr
2020-02-27-12-00;/dev/hd2;230;/usr
2020-02-27-18-00;/dev/hd2;230;/usr
EOF

cat >data3.txt <<'EOF'
2020-03-27-06-00;/dev/hd1;130;/
2020-03-27-12-00;/dev/hd1;130;/
2020-03-27-18-00;/dev/hd1;130;/
2020-03-27-06-00;/dev/hd2;240;/usr
2020-03-27-12-00;/dev/hd2;240;/usr
2020-03-27-18-00;/dev/hd2;240;/usr
EOF

I would like to create a .txt file for each filesystem (hd1.txt, hd2.txt, hd3.txt, hd4.txt, and so on) and put in each file the sum of that filesystem's values from each dataX.txt. It is hard for me to explain in English what I want, so here is an example of the expected result.

Expected content for the output file hd1.txt:

2020-01;/dev/hd1;300;/
2020-02;/dev/hd1;360;/
2020-03;/dev/hd1;390;/

Expected content for the file hd2.txt:

2020-01;/dev/hd2;600;/usr
2020-02;/dev/hd2;690;/usr
2020-03;/dev/hd2;720;/usr

The implementation I've currently tried:

for i in $(cat *.txt | awk -F';' '{print $2}' | cut -d '/' -f3| uniq)
do
    cat *.txt | grep -w $i | awk -F';' -v date="$(cat *.txt | awk -F';' '{print $1}' | cut -d'-' -f-2 | uniq )" '{sum+=$3} END {print date";"$2";"sum}' >> $i

done

But it doesn't work.

Can you show me how to do that?

  • What do you mean by "it doesn't work"? Does it show an error message, produce a wrong result, or go into an infinite loop? Please edit your question to add more details. If it produces wrong data, include the output in your edit. Commented Mar 5, 2020 at 13:36
  • You might want | sort | uniq instead of just uniq. You have to iterate over each file anyway. Commented Mar 5, 2020 at 13:38

1 Answer


Because the format is so regular, you can split the input on several separators at once and parse it easily in awk:

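awk -v FS='[;/-]' '
# Fields: $1=year, $2=month, $7=dev, $8=device (hd1), $9=value, $11=mount point without the leading slash.
# The hyphen goes last in the bracket expression so it is not read as a range.
{ key = $1 "-" $2 ";" $8 }
key != prev {
    # Month or device changed: flush the finished group to its file.
    if (length(output)) {
        print output >> fileoutput
    }
    prev = key
    sum = 0
}
{
    sum += $9
    output = sprintf("%s-%s;/%s/%s;%d;/%s", $1, $2, $7, $8, sum, $11)
    fileoutput = $8 ".txt"
}
END {
    if (length(output)) print output >> fileoutput
}
' data*.txt   # data*.txt rather than *.txt, so a rerun does not read hd1.txt and hd2.txt back in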

Tested on repl, this generates:

+ cat hd1.txt
2020-01;/dev/hd1;300;/
2020-02;/dev/hd1;360;/
2020-03;/dev/hd1;390;/
+ cat hd2.txt
2020-01;/dev/hd2;600;/usr
2020-02;/dev/hd2;690;/usr
2020-03;/dev/hd2;720;/usr

Alternatively, you could set -v FS=';' and use split on the first and second columns to extract the year and month and the hdX name.
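
A minimal sketch of that split-based variant, assuming the input keeps the same four-column layout and stays grouped by device within each file. The function name sum_by_month is made up for illustration, and the month and device are compared together so that a file containing a single device still starts a new sum each month:

```shell
# Hypothetical helper: sums column 3 per month and device, appending to hdX.txt
# in the current directory.
sum_by_month() {
    awk -F';' '
    {
        split($1, d, "-")        # d[1] = year, d[2] = month
        n = split($2, p, "/")    # p[n] = device name, e.g. hd1
        key = d[1] "-" d[2] ";" $2
    }
    key != prev {
        # A new month or device starts: flush the previous group.
        if (length(output)) print output >> fileoutput
        prev = key
        sum = 0
    }
    {
        sum += $3
        output = key ";" sum ";" $4
        fileoutput = p[n] ".txt"
    }
    END {
        if (length(output)) print output >> fileoutput
    }' "$@"
}
```

Call it as sum_by_month data*.txt. Because it appends with >>, remove old hdX.txt files before running it again.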

If you want a bash solution, I suggest inverting the loops: iterate over the files first, then over the identifiers in the second column.

for file in data*.txt; do    # data*.txt so that generated hdX.txt files are not read back on a rerun
    prev=
    output=
    while IFS=';' read -r date dev num path; do
        hd=$(basename "$dev")
        if [[ "$hd" != "${prev:-}" ]]; then
            if ((${#output})); then
                printf "%s\n" "$output" >> "$fileoutput"
            fi
            sum=0
            prev="$hd"
        fi
        sum=$((sum + num))
        output=$(
            printf "%s;%s;%d;%s" \
            "$(cut -d'-' -f1-2 <<<"$date")" \
            "$dev" "$sum" "$path"
        )
        fileoutput="${hd}.txt"
    done < "$file"
    printf "%s\n" "$output" >> "$fileoutput"
done

You could also translate the awk script to bash almost 1:1 by setting IFS='-;/' in the while read loop.
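
A rough sketch of that translation, assuming the same line layout as the sample data. The function name sum_file and the variable names are invented for illustration; e1 and e2 only absorb the empty fields that appear where ';' is directly followed by '/':

```shell
# Hypothetical helper: processes one data file and appends to hdX.txt in the
# current directory. The body is a subshell, so its variables stay local.
sum_file() (
    prev= output= fileoutput= sum=0
    while IFS='-;/' read -r year month day hour min e1 devdir hd num e2 path; do
        key="$year-$month;$hd"
        if [ "$key" != "$prev" ]; then
            # A new month or device starts: flush the previous group.
            if [ -n "$output" ]; then
                printf '%s\n' "$output" >> "$fileoutput"
            fi
            sum=0
            prev=$key
        fi
        sum=$((sum + num))
        # path holds the mount point without its leading slash (empty for /).
        output="$year-$month;/$devdir/$hd;$sum;/$path"
        fileoutput="$hd.txt"
    done < "$1"
    if [ -n "$output" ]; then
        printf '%s\n' "$output" >> "$fileoutput"
    fi
)
```

Use it as for f in data*.txt; do sum_file "$f"; done. The last variable in the read list receives the rest of the line, so a mount point such as /var/log would still come through whole.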
