
I am running a for loop in a bash script that checks some .ts files for a specific string and prints the matching lines to a result file.

Here is the code:

#! /bin/bash

for file in *.ts; do
    awk -f test_function.awk "$file" > result.txt
done

And this is the test_function.awk file:

match($0, /<name>(.*)<\/name>/,n){ nm=n[1] }
match($0, /<source>(.*)<\/source>/,s){ src=s[1] }
/unfinished/{ print "name: " nm, "source: " src }

And this is one of the input files that contains "unfinished" and needs to be included in the output:

<context>
    <name>AccuCapacityApp</name>
    <message>
        <source>Capacity</source>
        <translation type="unfinished">Kapazität</translation>
    </message>
    <message>
        <source>Charge Level</source>
        <translation type="unfinished"></translation>
    </message>
    <message>
        <source>Sel (Yes)</source>
        <translation type="unfinished">Sel (Ja)</translation>
    </message>
    <message>
        <source>Esc (No)</source>
        <translation type="unfinished">Esc (Nein)</translation>
    </message>
</context>

It gives output like this:

name: AccuCapacityApp source: Capacity
name: AccuCapacityApp source: Charge Level
name: AccuCapacityApp source: Sel (Yes)

And this is one of the input files that doesn't contain "unfinished" and needs to be excluded from the output:

<context>
    <name>ATM FSM state</name>
    <message>
        <source>Hunting</source>
        <translation>Sync-Suche</translation>
    </message>
    <message>
        <source>Pre-Sync</source>
        <translation>Pre-Sync</translation>
    </message>
    <message>
        <source>Sync</source>
        <translation>Sync</translation>
    </message>
</context>

What I want to do is print the name of the file being processed at the beginning of each block of matching lines in the result file, ONLY when matching strings are found, like the following:

Processing file: alpha.txt
name: AccuCapacityApp source: Capacity
name: AccuCapacityApp source: Charge Level
name: AccuCapacityApp source: Sel (Yes)

Processing file: gamma.txt
name: AccuCapacityApp source: Capacity
name: AccuCapacityApp source: Charge Level
name: AccuCapacityApp source: Sel (Yes)

How can I achieve this?

I know I could append the file name and then the matching lines to the result file. But I want to start with an empty result file each time I run the script, and write the file name and content only when the matching string is found, so I don't think plain appending will work. I have tried printing the file name with echo ${file##*/}, echo $file and {print FILENAME};{print "\t" $0}, but could not get the desired output.

  • I don't know where you found for file in $(ls *.ts); but it's a complete anti-pattern. Use for file in *.ts; and quote your variables inside the loop. Commented Sep 8, 2017 at 8:34
  • Thank you. I have updated my code. Commented Sep 8, 2017 at 8:36

1 Answer

Based on your update, I think this does what you want:

match($0, /<name>(.*)<\/name>/,m){ nm = m[1] }
match($0, /<source>(.*)<\/source>/,m){ src = m[1] }
/unfinished/ { list[++n] = src }
ENDFILE {
    for (i = 1; i <= n; ++i) {
        print "name:", nm, "source:", list[i]
    }
    n = 0
}

Only save elements when unfinished is found, then loop through the list at the end of each file. n keeps a count of the number of matches in the current file.

Use the script like this (no need for a shell loop):

awk -f test_function.awk *.ts > result.txt

Note that ENDFILE is a GNU awk extension, but then so is the third argument to match that you were already using, so I guess that's OK for you.
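
If gawk isn't available, or you also want the "Processing file:" header from your question, something along these lines might work. It is only a sketch under a few assumptions: it sticks to POSIX awk (two-argument match with RSTART/RLENGTH instead of the array argument, and FNR == 1 instead of ENDFILE), it prints the header lazily on the first match in each file rather than at the end of the file, and the names with_header.awk, alpha.ts and beta.ts are made up for the demo.

```shell
#!/bin/sh
# Sketch only: POSIX awk, hypothetical file names (with_header.awk, alpha.ts, beta.ts).
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > with_header.awk <<'EOF'
# First line of each input file: forget state left over from the previous file.
FNR == 1 { nm = ""; src = ""; header_done = 0 }

# Remember the most recent <name> and <source> values.
# The offsets skip the opening tag; the lengths drop both tags.
match($0, /<name>.*<\/name>/)     { nm  = substr($0, RSTART + 6, RLENGTH - 13) }
match($0, /<source>.*<\/source>/) { src = substr($0, RSTART + 8, RLENGTH - 17) }

/unfinished/ {
    if (!header_done) {
        if (files_printed++) print ""   # blank line between file blocks
        print "Processing file: " FILENAME
        header_done = 1
    }
    print "name: " nm, "source: " src
}
EOF

# Two small sample inputs: alpha.ts has unfinished entries, beta.ts has none.
cat > alpha.ts <<'EOF'
<context>
    <name>AccuCapacityApp</name>
    <message>
        <source>Capacity</source>
        <translation type="unfinished">Kapazität</translation>
    </message>
    <message>
        <source>Charge Level</source>
        <translation type="unfinished"></translation>
    </message>
</context>
EOF

cat > beta.ts <<'EOF'
<context>
    <name>ATM FSM state</name>
    <message>
        <source>Hunting</source>
        <translation>Sync-Suche</translation>
    </message>
</context>
EOF

# A single redirection truncates result.txt once per run.
awk -f with_header.awk *.ts > result.txt
cat result.txt
```

Because the header is only printed when /unfinished/ actually matches, a file like beta.ts leaves no trace in result.txt, and you still start from an empty result file on every run.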


9 Comments

Thank you. "unfinished" is the string my code is trying to match.
Your first solution works, but it prints the file name even when the matching string is not found. Your second solution doesn't check for the "unfinished" string (as you already mentioned, its purpose was unclear). I will try to modify your code and add it myself.
I guess you can probably add /unfinished/ { f = 1 } then only print if f is set. Remember that you need to set it back to 0 between files.
Umm, I tried this:
match($0, /<name>(.*)<\/name>/,n){ nm=n[1] }
match($0, /<source>(.*)<\/source>/,s){ src=s[1] }
/unfinished/ { f = 1 }
ENDFILE {
    if (f==1) {
        print "Processing file:", FILENAME
        print "name: " nm, "source: " src
        print ""
        f=0 # for next file
    }
}
But it didn't work. No matter where I reset f, it doesn't work. Can you please have a look?
The problem is that you haven't shown us an example of your input that corresponds to the output that you want, and "it didn't work" doesn't really give me much to go on. At first glance, it looks like you need to check all three things if (nm != "" && src != "" && f) (then reset all 3 vars).