I am trying to extract specific values from a log file like the one below:

Table "xxx"."xxxx":

  3785568 Rows successfully loaded.
  0 Rows not loaded due to data errors.
  0 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.

Bind array size not used in direct path.
Column array  rows :    5000
Stream buffer bytes:  256000
Read   buffer bytes: 1048576

Total logical records skipped:          0
Total logical records read:       3785568
Total logical records rejected:         0
Total logical records discarded:        0
Total stream buffers loaded by SQL*Loader main thread:      878
Total stream buffers loaded by SQL*Loader load thread:      796

Run began on Fri Sep 01 04:00:26 2017
Run ended on Fri Sep 01 04:04:45 2017

Elapsed time was:     00:04:19.24
CPU time was:         00:00:08.56

What I would like to retrieve is:

3785568 as number_rows
Sep 01 04:00:26 2017 as start_time
Sep 01 04:04:45 2017 as end_time 

How can I do this extraction with awk?

Any help would be much appreciated :)

Thank you very much for your time.

6 Answers

awk '/Rows successfully loaded/{
        print $1 " as number_rows"
        next
    }
    /Run began on/{ 
        sub(/Run began on /,""); 
        print $0 " as start_time"
        next 
   }
   /Run ended on/{
        sub(/Run ended on /,"");    
        print $0 " as end_time"
   }' infile

Input

$ cat infile
Table "xxx"."xxxx":

  3785568 Rows successfully loaded.
  0 Rows not loaded due to data errors.
  0 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.

Bind array size not used in direct path.
Column array  rows :    5000
Stream buffer bytes:  256000
Read   buffer bytes: 1048576

Total logical records skipped:          0
Total logical records read:       3785568
Total logical records rejected:         0
Total logical records discarded:        0
Total stream buffers loaded by SQL*Loader main thread:      878
Total stream buffers loaded by SQL*Loader load thread:      796

Run began on Fri Sep 01 04:00:26 2017
Run ended on Fri Sep 01 04:04:45 2017

Elapsed time was:     00:04:19.24
CPU time was:         00:00:08.56

Output

$ awk '/Rows successfully loaded/{
      print $1 " as number_rows"
      next
  }
  /Run began on/{ 
      sub(/Run began on /,""); 
      print $0 " as start_time"
      next 
  }
  /Run ended on/{
      sub(/Run ended on /,""); 
      print $0 " as end_time"
  }' infile

3785568 as number_rows
Fri Sep 01 04:00:26 2017 as start_time
Fri Sep 01 04:04:45 2017 as end_time
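
The question asks for the timestamps without the weekday (Sep 01 04:00:26 2017), while the output above keeps Fri. A minimal sketch of one way to drop it, assuming the weekday is always a single alphabetic word right after "Run began on" / "Run ended on". The sample variable is only a stand-in for the log file and is not part of the original answer:

```shell
# Stand-in for the relevant lines of the log file (data copied from the question).
sample='Table "xxx"."xxxx":

  3785568 Rows successfully loaded.
  0 Rows not loaded due to data errors.

Run began on Fri Sep 01 04:00:26 2017
Run ended on Fri Sep 01 04:04:45 2017'

# Remove the fixed prefix together with the weekday word that follows it.
result=$(printf '%s\n' "$sample" | awk '
    /Rows successfully loaded/ { print $1 " as number_rows"; next }
    /^Run began on/ { sub(/^Run began on [A-Za-z]+ /, ""); print $0 " as start_time"; next }
    /^Run ended on/ { sub(/^Run ended on [A-Za-z]+ /, ""); print $0 " as end_time" }
')

printf '%s\n' "$result"
```

Against a real log, the printf pipeline would be replaced by passing the file name to awk, as in the answer above.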

So for your given file this works:

awk '/Rows/{ if (++n==1){ print $1 } }/began/ || /ended/{ print $5,$6,$7,$8 }' log.file

output:

3785568
Sep 01 04:00:26 2017
Sep 01 04:04:45 2017
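
If the three values are needed as shell variables rather than as printed lines (an assumption about the goal, based on the names number_rows, start_time, and end_time in the question), one possible sketch is to have awk emit a single "|"-separated record and let read split it. The sample variable stands in for log.file, and the "|" separator assumes the character never appears in the log values:

```shell
# Stand-in for the relevant lines of the log file (data copied from the question).
sample='  3785568 Rows successfully loaded.
  0 Rows not loaded due to data errors.

Run began on Fri Sep 01 04:00:26 2017
Run ended on Fri Sep 01 04:04:45 2017'

# awk remembers each value and prints them once, joined by "|", after the last line.
record=$(printf '%s\n' "$sample" | awk -v OFS='|' '
    /Rows successfully loaded/ { rows = $1 }
    /^Run began on/            { began = $5 " " $6 " " $7 " " $8 }
    /^Run ended on/            { ended = $5 " " $6 " " $7 " " $8 }
    END                        { print rows, began, ended }
')

# Split the record on "|" only, so the spaces inside the timestamps survive.
IFS='|' read -r number_rows start_time end_time <<EOF
$record
EOF

echo "number_rows=$number_rows"
echo "start_time=$start_time"
echo "end_time=$end_time"
```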

5 Comments

BEGIN{ OFS = " " } is doing nothing useful, just setting OFS to the default value it already has.
Yep, you're right. I was in a hurry in getting to lunch there :)
@JFS31, another question please! If I want to add one more extraction to the awk command you provided, the table name without quotes, how could this be achieved?
This should do: awk '/^Table/{ gsub(/[":]/,"",$2); print $2 }/Rows/{ if (++n==1){ print $1 } }/began/ || /ended/{ print $5,$6,$7,$8 }' It gives you the table name without the quotes and without the : at the end. And next time, it would be nice if you asked it in a separate question.
@JFS31 Thanks, but this awk does not produce the desired output. For more details, I posted stackoverflow.com/questions/46073193/… could you please have a look? :)
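
For the table-name follow-up discussed in these comments, a minimal sketch in isolation, assuming the desired result is xxx.xxxx (the exact format wanted is not stated in this thread). It uses a bracket expression to delete both the double quotes and the trailing colon from the second field; the sample variable is only a stand-in for the log file:

```shell
# Stand-in for the first lines of the log file (data copied from the question).
sample='Table "xxx"."xxxx":

  3785568 Rows successfully loaded.'

# On the "Table" line, strip every double quote and colon from field 2, then print it.
table=$(printf '%s\n' "$sample" | awk '/^Table/ { gsub(/[":]/, "", $2); print $2 }')

printf '%s\n' "$table"
```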

For this purpose, grep is a better solution:

ROWS=`grep "Total logical records read" logfile.txt | sed 's/[^0-9]*//g'`
START=`grep "Run began on " logfile.txt | cut -d" " -f5-`
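
A sketch of the same grep/sed/cut idea extended to all three values, under a few assumptions that are not in the answer above: the temporary file stands in for logfile.txt, the row count is taken from the "Rows successfully loaded" line (the "records read" total would presumably differ if some records were rejected or discarded), and cut starts at field 5 so the weekday is dropped, as requested in the question:

```shell
# Scratch copy of the relevant log lines (data copied from the question).
log=$(mktemp)
cat > "$log" <<'EOF'
  3785568 Rows successfully loaded.
  0 Rows not loaded due to data errors.

Total logical records read:       3785568

Run began on Fri Sep 01 04:00:26 2017
Run ended on Fri Sep 01 04:04:45 2017
EOF

# Keep only the digits of the matching line.
ROWS=$(grep "Rows successfully loaded" "$log" | sed 's/[^0-9]*//g')
# Fields 1-4 are "Run", "began"/"ended", "on", and the weekday; keep the rest.
START=$(grep "^Run began on " "$log" | cut -d" " -f5-)
END=$(grep "^Run ended on " "$log" | cut -d" " -f5-)

rm -f "$log"

echo "$ROWS"
echo "$START"
echo "$END"
```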

awk '/[[:digit:]]+[[:blank:]]Rows successfully/ { print $1" as number_rows" } /^Run began on .*$/ { print $4" "$5" "$6" "$7" "$8" as start_time" } /^Run ended on .*$/ { print $4" "$5" "$6" "$7" "$8" as end_time"}' filename
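
Spread over several lines with comments, the same program reads roughly as follows. This is only a restatement of the one-liner above for readability; the sample variable is a stand-in for filename, and the POSIX character classes assume an awk that supports them (gawk and recent mawk do):

```shell
# Stand-in for the relevant lines of the log file (data copied from the question).
sample='  3785568 Rows successfully loaded.
  0 Rows not loaded due to data errors.

Run began on Fri Sep 01 04:00:26 2017
Run ended on Fri Sep 01 04:04:45 2017'

result=$(printf '%s\n' "$sample" | awk '
    # Digits, one blank, then the literal text: only the "successfully loaded" line matches.
    /[[:digit:]]+[[:blank:]]Rows successfully/ { print $1 " as number_rows" }
    # Fields 4 to 8 are weekday, month, day, time, and year.
    /^Run began on .*$/ { print $4 " " $5 " " $6 " " $7 " " $8 " as start_time" }
    /^Run ended on .*$/ { print $4 " " $5 " " $6 " " $7 " " $8 " as end_time" }
')

printf '%s\n' "$result"
```

Because the print statements start at field 4, the weekday is kept in the output; starting at field 5 would drop it.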

1 Comment

While this code snippet may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. Please also try not to crowd your code with explanatory comments, this reduces the readability of both the code and the explanations!

Short awk approach:

awk '/Rows success/{ print $1 }/^Run (began|ended)/{ print $5,$6,$7,$8 }' file

The output:

3785568
Sep 01 04:00:26 2017
Sep 01 04:04:45 2017

1 Comment

Dear Roman, very good one as well, thank you, already answered by JFS31 above!

If you do not mind Perl or grep with -P:

perl -lne 'print $& if /\d+ (?=Rows successfully)|^Run (began|ended) on \w+ \K[^\n\r]+/g' file

it outputs:

3785568 
Sep 01 04:00:26 2017
Sep 01 04:04:45 2017

or:

grep -Po '\d+ (?=Rows successfully)|^Run (began|ended) on \w+ \K[^\n\r]+' file

1 Comment

Very good idea, although I would rather not use Perl on my prod machine!
