
I would like to retrieve the following from these logs: the date, the first five segments of the URI path, and the ab and cde values:

10.10.10.10 - - [26/Oct/2020:19:50:13 +0000] "GET /five/six/seven/eight/nine/en?from=1603738800&to=1603785600ncludedInRange=false HTTP/1.1" 200 255441 "-" "Opera com.test.super/1.10.4;11072 (Linux;Neon KNWWWfj;0,02.2)" "10.10.10.10""f799b6b9-747f-4f22-a1bf-4b7de885fc60""-" "-" "-" "-"ab=0.110 cde=0.102
11.1.1.1 - - [26/Oct/2020:19:50:14 +0000] "GET /one/two/three/four/five/en HTTP/1.1" 200 2832 "-" "Opera com.test.super/1.10.4;11072 (Linux;Neon KNWWWfj;0,02.2)" "11.1.1.1""19a8ee3c-9cb3-4ba6-9732-eb4923601e92""-" "-" "-" "-"ab=0.111 cde=0.112

For example, for the first line:

26/Oct/2020:19:50:13 /five/six/seven/eight/nine ab=0.110 cde=0.102

I have tried the following command, but it only gives me part of what I need. Can you please help?

awk '{print $4 "\t" $7 "\t" $(NF-1),"\t",$NF}' |sed 's/"-"//g'

  • Why is there no output for the second line? Commented Oct 26, 2020 at 20:19
  • You and the poster at unix.stackexchange.com/q/616438/133219 should talk, as you're both parsing very similar input files. Commented Oct 26, 2020 at 20:20

2 Answers

$ awk -F'[][[:space:]"]+' -v OFS='\t' '{match($7,"(/[^/]*){5}"); print $4, substr($7,1,RLENGTH), $(NF-1), $NF}' file
26/Oct/2020:19:50:13    /five/six/seven/eight/nine      ab=0.110        cde=0.102
26/Oct/2020:19:50:14    /one/two/three/four/five        ab=0.111        cde=0.112
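
For comparison, the question's own approach can also be made to work without changing FS. With awk's default whitespace splitting, $4 still carries the leading "[", $7 carries the whole URI including /en and the query string, and $(NF-1) has the "-" quotes glued onto ab=.... A minimal sketch that cleans up each of those fields, using only basic POSIX awk features (split, substr, sub) and avoiding {n} intervals and [[:space:]] classes, which some older awk builds may not support. extract_fields is a hypothetical helper name, not part of either answer:

```shell
#!/bin/sh
# extract_fields (hypothetical name): keep default whitespace splitting, as in
# the question's attempt, and clean up each field instead of changing FS.
extract_fields() {
    awk -v OFS='\t' '{
        ts = substr($4, 2)                  # "[26/Oct/2020:19:50:13" -> drop the "["
        n = split($7, seg, "/")             # $7 is the request URI; seg[1] is ""
        uri = ""
        for (i = 2; i <= 6 && i <= n; i++)  # first five path segments
            uri = uri "/" seg[i]
        ab = $(NF-1)                        # e.g. "-"ab=0.110 (quotes glued on)
        sub(/^.*"/, "", ab)                 # strip everything up to the last quote
        print ts, uri, ab, $NF
    }' "$@"
}

# Usage with the two sample lines from the question:
extract_fields <<'EOF'
10.10.10.10 - - [26/Oct/2020:19:50:13 +0000] "GET /five/six/seven/eight/nine/en?from=1603738800&to=1603785600ncludedInRange=false HTTP/1.1" 200 255441 "-" "Opera com.test.super/1.10.4;11072 (Linux;Neon KNWWWfj;0,02.2)" "10.10.10.10""f799b6b9-747f-4f22-a1bf-4b7de885fc60""-" "-" "-" "-"ab=0.110 cde=0.102
11.1.1.1 - - [26/Oct/2020:19:50:14 +0000] "GET /one/two/three/four/five/en HTTP/1.1" 200 2832 "-" "Opera com.test.super/1.10.4;11072 (Linux;Neon KNWWWfj;0,02.2)" "11.1.1.1""19a8ee3c-9cb3-4ba6-9732-eb4923601e92""-" "-" "-" "-"ab=0.111 cde=0.112
EOF
```

If a URI has fewer than five segments, this version simply prints whatever segments exist, whereas substr($7,1,RLENGTH) in the answer above yields an empty string when match() fails.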

Based on @Ed Morton's answer, but using an FS regexp made of five alternatives:

$ awk -v FS='[[]|\\+[[:digit:]]+[]]|GET |/en|"+-"' '{print $2,$4,$NF}' file
26/Oct/2020:19:50:13  /five/six/seven/eight/nine ab=0.110 cde=0.102
26/Oct/2020:19:50:14  /one/two/three/four/five ab=0.111 cde=0.112

Updated. Thanks to @Ed Morton.
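
To see which of the five FS alternatives consumes which part of a line, it can help to dump every field the regexp produces. A small sketch using line 2 of the question as the sample; dump_fields is a hypothetical helper name, and the FS string is copied unchanged from the command above:

```shell
#!/bin/sh
# dump_fields (hypothetical name): print each field produced by the answer's
# FS, wrapped in <...> so leading and trailing blanks stay visible.
dump_fields() {
    awk -v FS='[[]|\\+[[:digit:]]+[]]|GET |/en|"+-"' '{
        for (i = 1; i <= NF; i++)
            printf "$%d=<%s>\n", i, $i
    }' "$@"
}

line='11.1.1.1 - - [26/Oct/2020:19:50:14 +0000] "GET /one/two/three/four/five/en HTTP/1.1" 200 2832 "-" "Opera com.test.super/1.10.4;11072 (Linux;Neon KNWWWfj;0,02.2)" "11.1.1.1""19a8ee3c-9cb3-4ba6-9732-eb4923601e92""-" "-" "-" "-"ab=0.111 cde=0.112'
printf '%s\n' "$line" | dump_fields
```

In the dump, $2 is the timestamp with a trailing blank (the text between "[" and "+0000]"), which is why the answer's output shows two spaces after the time; $4 is the path up to /en; and $NF is everything after the last "-" pair. The other fields are leftovers that the print statement ignores.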

4 Comments

In shell you should always use single quotes around strings and scripts unless you NEED double quotes to make the shell interpret them, e.g. for globbing, filename expansion, variable expansion, etc. If you follow that rule it'll make all your code more concise and more robust. In this case you won't need nearly as many backslashes in your FS declaration: right now you're asking the shell to interpret the string in -v FS="..." by surrounding it in double quotes, and then escaping everything in it to stop the shell from interpreting it, which makes no sense. Just use -v FS='...' or -F'...'.
There's also no reason to put - or " inside a bracket expression, or to escape / in a dynamic regexp; they're all already literal characters. So I don't know whether the code is right or wrong in general, but I think this is all you need to specify that FS: -F '[[]|\+[[:digit:]]+[]]|GET |/en|"+-"'
@Ed Morton Yes, I see. Is there specific documentation or a guide that covers the correct use of the syntactic peculiarities of regexps in awk? It would help.
awk just implements POSIX EREs, so see the POSIX standard, pubs.opengroup.org/onlinepubs/9699919799/basedefs/…. Awk does allow computed regexps (made up from strings and/or variables) as well as literal regexps - that's documented in the awk standard and all man pages and just means you need to be aware it's parsed twice and so needs extra escapes. Some VERSIONS of awk have minor extensions to POSIX such as \< and \> word boundaries or \s/\S shorthand, e.g. GNU awk, so see the man page for that awk version for details. Otherwise there are no peculiarities.
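
The comments above describe two separate layers of escape processing: the shell's, and awk's own string parsing of -v values and computed regexps. A minimal sketch showing both, which should behave the same in any POSIX awk; the input strings a+b and a.b.c are made up for illustration:

```shell
#!/bin/sh
# Layer 1: the shell. Inside single quotes it leaves backslashes alone; inside
# double quotes it turns each \\ into \ before awk ever sees the text.
# Layer 2: awk. A -v value is processed like a string constant, so \\ becomes \.
# Both commands below therefore give awk the same FS: the regexp \+ (literal +).
awk -v FS='\\+' 'BEGIN { print FS }'          # prints \+
awk -v FS="\\\\+" 'BEGIN { print FS }'        # also prints \+
echo 'a+b' | awk -v FS='\\+' '{ print $2 }'   # prints b

# A regexp literal /.../ is read once, as a regexp. A computed regexp written
# as a string "..." is read twice: first as a string ("\\." becomes \.), then
# as a regexp. Both gsub() calls below replace only the literal dots.
echo 'a.b.c' | awk '{
    s = $0; t = $0
    n1 = gsub(/\./, "-", s)    # regexp literal: one backslash
    n2 = gsub("\\.", "-", t)   # computed regexp: the string "\\." is the regexp \.
    print n1, s                # prints 2 a-b-c
    print n2, t                # prints 2 a-b-c
}'
```

Writing "\." (one backslash) in the string form is not portable: awks differ in whether they keep or drop the backslash, and if it is dropped the regexp becomes ".", which matches any character.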
