19

I have a couple of Apache log files that have been appended together, and I need to sort them by date. They're in the following format:

"www.company.com" 192.168.1.1 [01/Jan/2011:00:04:17 +0000] "GET /foobar/servlet/partner/search/results?catID=1158395&country=10190&id=5848716&order_by=N-T&order_by_dir=-&product=10361996&siteID=1169823&state= HTTP/1.1" 200 10459 0 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

What's the best way to go about doing this on the Linux command line?

1
  • Maybe move this to ServerFault or unix.stackexchange.com? Commented Apr 15, 2011 at 5:57

5 Answers

40
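#!/bin/sh
# Sort an Apache log file chronologically by the timestamp in field 4.
if [ $# -ne 2 ] || [ ! -f "$1" ]; then
    echo "Usage: $0 input_file output_file"
    exit 1
fi
echo "Sorting $1"
# Keys inside field 4 ("[dd/Mon/yyyy:hh:mm:ss"): year, month, day, hour, minute, second.
sort -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n "$1" > "$2"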
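
A quick way to check the key layout is to run the same sort on a few out-of-order lines. The sketch below uses made-up common-log-format entries (the IPs and paths are invented for illustration) and adds LC_ALL=C on the assumption that the month abbreviations are English:

```shell
# Made-up sample lines in common log format, deliberately out of order.
sample='10.0.0.2 - - [02/Feb/2011:10:00:00 +0000] "GET /b HTTP/1.1" 200 1
10.0.0.1 - - [15/Dec/2010:23:59:59 +0000] "GET /a HTTP/1.1" 200 1
10.0.0.3 - - [02/Feb/2011:09:30:00 +0000] "GET /c HTTP/1.1" 200 1
10.0.0.4 - - [01/Jan/2011:00:04:17 +0000] "GET /d HTTP/1.1" 200 1'

# Same keys as the script: year, month, day, hour, minute, second within field 4.
sorted=$(printf '%s\n' "$sample" |
    LC_ALL=C sort -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n \
                  -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n)
printf '%s\n' "$sorted"
```

The lines should come out as the December 2010 entry, then January 2011, then the two February entries ordered by time of day.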

5 Comments

Interesting. The man page doesn't make it clear that you can use "M" as part of a sort key; thanks for pointing this out!
Great answer; it works perfectly for Apache logs, where the timestamp is indeed field 4 when the separator is a space. Spending a few minutes on the sort man page helped me understand this command, and I now feel confident writing my own sorts. For other readers: the keys are character ranges within field 4, listed in the order in which you want them sorted. -k 4.9,4.12n is the year, where n denotes a numeric sort; -k 4.5,4.7M is the three-letter month abbreviation, and M tells sort to order it as a month. @offby1 the man page does list M as an option, but its examples are not great.
31-Aug-2020 ends up after 26-Sep-2020 with this one. It's not working no matter how I try it.
None of this is working for me: sudo find /var/log/*php*/ -type f -exec grep max_child {} \; | sort -t ' ' -k 3.9,3.12n -k 3.5,3.7M gives results that are out of order. My logs look like this, maybe because they aren't Apache logs? [31-Aug-2020 21:08:42] WARNING: [. Or is it because it's multiple files?
Important note: The month sort uses the current locale. It didn't work for me as Oct was placed first, before the other months. October is called Oktober in Danish (abbr. Okt) so it was treated as unknown which comes first in the sort order. Solution: Prefix the command with LC_ALL=C: LC_ALL=C sort -t ' ' -k 4.9,4.12n -k 4.5,4.7M ...
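
The locale effect is easy to see by sorting bare month abbreviations. A minimal sketch, assuming a sort that supports -M (GNU sort does); with LC_ALL=C the English abbreviations should be recognized regardless of the system locale:

```shell
# English month abbreviations, deliberately out of order.
months_sorted=$(printf '%s\n' Oct Jan Sep Dec Feb | LC_ALL=C sort -M | tr '\n' ' ')
echo "$months_sorted"
```

This should print the months in calendar order (Jan Feb Sep Oct Dec). Running the same pipeline without LC_ALL=C under a non-English locale is a quick way to check whether the locale is what breaks the month key.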
11

This is almost too trivial to point out, but just in case it confuses anyone: grm's answer should technically be using field #3, not 4, to match the questioner's exact log format. That is, it should read:

    sort -t ' ' -k 3.9,3.12n -k 3.5,3.7M ...

His answer is correct in every other respect, and can be used as-is for the common log format.
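
If it is unclear which field holds the timestamp in a given log, one way to find out is to look for the first space-separated field that starts with "[". This is only a sketch: the sample line is a shortened version of the one in the question, and awk's default splitting collapses runs of spaces, so it matches sort -t ' ' only when fields are separated by single spaces:

```shell
# Shortened version of the question's log line (URL trimmed for readability).
line='"www.company.com" 192.168.1.1 [01/Jan/2011:00:04:17 +0000] "GET /foobar HTTP/1.1" 200 10459'

# Print the 1-based index of the first field that begins with "[".
ts_field=$(printf '%s\n' "$line" |
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\[/) { print i; exit } }')
echo "$ts_field"
```

For the question's format this prints 3; for a common-log-format line such as 10.0.0.230 - - [28/Jan/2019:03:05:31 +0000] it would print 4. The result is the number to use in place of 4 in the -k keys.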

2 Comments

perhaps this would have been better as a comment--but it's correct, so have some internet points :)
For my default log entry 10.0.0.230 - - [28/Jan/2019:03:05:31 +0000] "POST ... it is field 4
0

Using ' ' as the field separator fails when log lines contain multiple IP addresses (separated by ', '), because the timestamp then lands in a different field on those lines.

Try using '[' as the separator instead:

sort -t '[' -k 2.8,2.11n -k 2.4,2.6M -k 2.1,2.2n -k 2.13,2.14n -k 2.16,2.17n -k 2.19,2.20n
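
A small sketch of the '[' separator on lines with a varying number of client IPs. The sample entries are made up for illustration, and the approach assumes no "[" appears before the timestamp:

```shell
# Made-up lines: one, two, or three client IPs before the timestamp.
sample='10.0.0.1, 10.0.0.9 - - [03/Mar/2011:08:00:00 +0000] "GET /late HTTP/1.1" 200 1
10.0.0.2 - - [01/Jan/2011:00:04:17 +0000] "GET /early HTTP/1.1" 200 1
10.0.0.3, 10.0.0.8, 10.0.0.7 - - [01/Jan/2011:00:04:18 +0000] "GET /middle HTTP/1.1" 200 1'

# Field 2 is everything after the first "[", so the offsets no longer
# depend on how many space-separated tokens precede the timestamp.
sorted=$(printf '%s\n' "$sample" |
    LC_ALL=C sort -t '[' -k 2.8,2.11n -k 2.4,2.6M -k 2.1,2.2n \
                  -k 2.13,2.14n -k 2.16,2.17n -k 2.19,2.20n)

# Show only the request paths to make the resulting order easy to read.
order=$(printf '%s\n' "$sorted" | sed 's/.*GET \/\([a-z]*\) .*/\1/' | tr '\n' ' ')
echo "$order"
```

The paths should appear as early, middle, late, even though the timestamp sits in a different space-separated field on each line.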

0

Try Super Speedy Syslog Searcher

(assuming you have rust installed)

cargo install super_speedy_syslog_searcher

then

s4 /var/log/apache2

-1

I figured this out with online examples, skimming through 'The Linux Command Line' book, man pages, and trial-and-error:

sort -k 3.9nb -k 3.5Mb -k 3.2nb [location and name of file]

The b placed alongside the n or M tells sort to skip the leading blank in the field, so the character offsets count from the [ rather than from the space before it. The n and M comparisons then read only the leading number or month abbreviation at each offset, so the /, : and other characters that follow are ignored. That makes life easier when the space is already used as a delimiter and you would otherwise still have to split on :, /, or other characters.

The command above sorts by year first, then by month, and then by day. Add an r next to each b to sort in descending order.
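
As a rough check of this default-separator variant, the sketch below runs it on made-up lines in the question's format (timestamp in field 3). LC_ALL=C is added here as an assumption so that the English month names are recognized:

```shell
# Made-up lines in the question's format, deliberately out of order.
sample='"www.company.com" 192.168.1.1 [05/Feb/2011:00:00:00 +0000] "GET /c HTTP/1.1" 200 1
"www.company.com" 192.168.1.1 [31/Dec/2010:00:00:00 +0000] "GET /a HTTP/1.1" 200 1
"www.company.com" 192.168.1.1 [20/Jan/2011:00:00:00 +0000] "GET /b HTTP/1.1" 200 1'

# No -t: fields are split at blank-to-nonblank transitions, and b makes
# the offsets count from the "[" instead of the leading space.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 3.9nb -k 3.5Mb -k 3.2nb)
order=$(printf '%s\n' "$sorted" | cut -d' ' -f6 | tr '\n' ' ')
echo "$order"
```

The request paths should come out as /a /b /c, that is, December 2010, then January 2011, then February 2011. Note that this variant does not look at the time of day, so entries from the same day are not ordered by time.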
