I need your help with the following awk syntax. Below is the output from my curl command, which I need to refine a little:

INPUT:

        RSYNCA-BACKUP
        RCYNCA 20140517 0021 2182097 2082097
        2014820905820917 10:03:54
        2014820905820917 10:37:43
        0:33:49


        RSYNCB-COPY
        20140517 0020 2082097 1982097 7 6 20
        2014820905820917 09:32:20
        2014820905820917 10:59:20
        1:27:00


        RSYNCC
        RCYNCE 20140517 0021 2182097 2082097
        2014820905820917 10:03:54
        2014820905820917 10:37:43
        0:33:49

        RSYNCD
        20140517 0020 2082097 1982097 7 6 20
        2014820905820917 09:32:20
        2014820905820917 10:59:20
        1:27:00

THE OUTPUT I RECEIVE USING AWK:

RSYNCA-BACKUP|20140502|RCYNCA|10:02:15|10:56:42|0:54:27|FINISHED
RSYNCB-COPY|0022||15:31:06|        |0:06:04|INITIATED

Job Name|sequence|date|start time|end time|runtime|status

For a job with INITIATED status there is no end time, so that field can be empty.

This is what I am running, and the awk output I get is messed up:

awk -v RS='FINISHED|INITIATED' -v OFS='|' '$0 { print $1, $3, $2, $8, RS }'

RSYNCJOBNA|0021|20140502|2014820905820902|FINISHED|INITIATED
RSYNCJOBNA|0022|20140502|2014820905820902|FINISHED|INITIATED

My input from curl has additional spaces, I guess, which might be the issue. Here is a real example:

INITIATED
            RSYNCA
            20140502 0036 3682096 3582096 6 5
            2014820905820902 17:31:08
                0:17:16 ce eque
            INITIATED
            RSYNCA
            20140502 0035 3582096 3482096 6 5
            2014820905820902 17:01:10
                0:47:14 ce eque
            FINISHED
            RSYNCA
            20140502 0034 3482096 3382096 6 5
            2014820905820902 16:31:03
            2014820905820902 17:24:45
            0:53:42
            FINISHED
            RSYNCA
            20140502 0033 3382096 3282096 6 5
            2014820905820902 16:01:09
            2014820905820902 16:47:12
            0:46:03
  • Hey Barmar, I have a question about the awk answer you provided (which works perfectly fine, by the way). Sometimes there is a discrepancy in my input: an additional field is added, and I want to ignore it while running the same awk command. I have edited the input above. Can you please help? Commented May 17, 2014 at 22:56

2 Answers

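curl "URL" |
    awk -v OFS='|' '/FINISHED|INITIATED/ {
        status = $1; getline               # status line, then read the job name line
        jobname = $1; getline              # then read the date/sequence line
        sequence = $2; date = $1; getline  # then read the start timestamp line
        start = $2; getline
        # Only FINISHED records have an end timestamp line before the runtime.
        if (status == "FINISHED") { end = $2; getline } else { end = "        " }
        runtime = $1
        print jobname, sequence, date, start, end, runtime, status
    }'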

The output with your input is:

RSYNCA|0036|20140502|17:31:08|        |0:17:16|INITIATED
RSYNCA|0035|20140502|17:01:10|        |0:47:14|INITIATED
RSYNCA|0034|20140502|16:31:03|17:24:45|0:53:42|FINISHED
RSYNCA|0033|20140502|16:01:09|16:47:12|0:46:03|FINISHED
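
For the case raised in the comment on the question, where the date line sometimes starts with an extra field, here is a rough sketch of a variant that does not depend on fixed line positions. It is not the script above: the function names `parse_jobs` and `emit` are made up for illustration, it assumes the date is always the first 8-digit field on the line after the job name with the sequence right behind it, and it leaves the end time truly empty for INITIATED jobs instead of padding it with spaces.

```shell
# Two records copied from the sample input in the question.
input='INITIATED
            RSYNCA
            20140502 0036 3682096 3582096 6 5
            2014820905820902 17:31:08
                0:17:16 ce eque
            FINISHED
            RSYNCA
            20140502 0034 3482096 3382096 6 5
            2014820905820902 16:31:03
            2014820905820902 17:24:45
            0:53:42'

parse_jobs() {
    awk -v OFS='|' '
        # Print the record collected so far, if there is one.
        function emit() {
            if (status != "") print job, seq, date, start, stop, runtime, status
        }
        # A status line opens a new record; emit the previous one first.
        $1 == "FINISHED" || $1 == "INITIATED" {
            emit()
            status = $1
            job = seq = date = start = stop = runtime = ""
            n = 0
            next
        }
        # Skip blank lines and anything before the first status line.
        status == "" || NF == 0 { next }
        { n++ }
        n == 1 { job = $1; next }
        n == 2 {
            # Assumption: the date is the first all-digit field of length 8,
            # and the sequence number follows it. A leading extra field is skipped.
            for (i = 1; i < NF; i++)
                if (length($i) == 8 && $i ~ /^[0-9]+$/) { date = $i; seq = $(i + 1); break }
            next
        }
        # Timestamp lines carry the time in field 2: first is start, second is end.
        NF >= 2 && $2 ~ /^[0-9]+:[0-9]+:[0-9]+$/ {
            if (start == "") start = $2; else stop = $2
            next
        }
        # The runtime line starts with h:mm:ss; trailing words are ignored.
        $1 ~ /^[0-9]+:[0-9]+:[0-9]+$/ { runtime = $1 }
        END { emit() }
    '
}

out=$(printf '%s\n' "$input" | parse_jobs)
printf '%s\n' "$out"
```

In real use you would pipe curl straight into it, as in `curl "URL" | parse_jobs`. Since awk's default field splitting ignores leading blanks, the extra indentation in the curl output should not matter here.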
8 Comments

Thanks a lot Barmar. Any chance it can be done in 1 line so I can pipe it to curl?
The newlines just make it easier to read, you can remove them. But why does that prevent you from piping it to curl?
I am piping curl into this awk, but I receive only one line of output; jobs with FINISHED status are excluded: RSYNCJOBA |0034|20140502|16:31:03| |0:16:22| INITIATED
I created a file containing the sample input you posted, and piped it to my script. It worked fine, and showed both lines.
It seems my curl output has some additional spaces. I have pasted the real output from curl into the question.
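Here's one way using GNU AWK. It relies on RT, a gawk extension that holds the text actually matched by RS, so it will not work in other awks. Run like: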

curl "$URL" | awk -f script.awk

Contents of script.awk:

BEGIN {

    RS="FINISHED|INITIATED"
    OFS="|"
}

s {
    print ( \
        $1, \
        $3, \
        $2, \
        $9, \
        (s == "FINISHED" ? $11 : "        "), \
        ($NF ~ /:/ ? $NF : $(NF-2)), \
        s \
    )
}

{
    s = RT
}

Results:

RSYNCA|0036|20140502|17:31:08|        |0:17:16|INITIATED
RSYNCA|0035|20140502|17:01:10|        |0:47:14|INITIATED
RSYNCA|0034|20140502|16:31:03|17:24:45|0:53:42|FINISHED
RSYNCA|0033|20140502|16:01:09|16:47:12|0:46:03|FINISHED

Alternatively, here's the one-liner:

curl "$URL" | awk 'BEGIN { RS="FINISHED|INITIATED"; OFS="|" } s { print $1, $3, $2, $9, (s == "FINISHED" ? $11 : "        "), ($NF ~ /:/ ? $NF : $(NF-2)), s } { s = RT }'
