I'm trying to work on a logfile, and I need to be able to specify the range of dates. So far (before any processing), I'm converting a date/time string to timestamp using date --date "monday" +%s.

Now, I want to be able to iterate over each line in a file, but check if the date (in a human readable format) is within the allowed range. To do this, I'd like to do something like the following:

echo `awk '{if(`date --date "$3 $4 $5 $6 $7" +%s` > $START && `date --date "" +%s` <= $END){/*processing code here*/}}' myfile`

I don't even know if that's possible... I've tried a lot of variations, and I couldn't find anything understandable or usable online.

Thanks

Update:

An example of myfile is as follows. It logs IPs and access times:

123.80.114.20      Sun May 01 11:52:28 GMT 2011
144.124.67.139     Sun May 01 16:11:31 GMT 2011
178.221.138.12     Mon May 02 08:59:23 GMT 2011
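  • As an alternative to shelling out to the date command, consider gawk's built-in mktime() function, which turns a "YYYY MM DD HH MM SS" string into epoch seconds. Something like: sec=mktime($7" "month" "$4" "hms); if (sec > START) ... where month is the numeric month and hms is $5 with the colons replaced by spaces. Note that awk variables are referenced without a leading $. Commented May 3, 2011 at 21:14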
  • Is this a one-off project, or are you going to be processing megabytes of logfiles every day? Commented May 3, 2011 at 21:26
  • One-off. It's actually exam revision, and I can't find anything about it in the lecture notes. Easier to ask on here as answers tend to be better! Commented May 3, 2011 at 21:28
  • Pardon my editing error; if you saw this immediately, the array was incomplete. Should be good now. Commented May 3, 2011 at 21:40

3 Answers

Given what you have to do, it's really not that hard, and it is much more efficient to do your date processing by converting the dates to sortable strings and comparing those.

Here's a partial solution that uses associative arrays to convert the month value to a number. Then you rely on the %02d format specifier to ensure 2 digits. You can reformat the dateTime value with '.', etc or leave the colons in the hr:min:sec if you really need the human readability.

The YYYYMMDD (or YYYYMMDDHHMMSS) format is a big help in this sort of problem, as less-than, greater-than, and equality comparisons all work without any further formatting.

# -v must come before the program text; placed after it, awk would
# treat "-v" as the name of an input file.
echo "178.221.138.12     Mon May 02 08:59:23 GMT 2011" \
| awk -v StartTime=20110105235959 'BEGIN {
mons["Jan"]=1 ; mons["Feb"]=2; mons["Mar"]=3
mons["Apr"]=4 ; mons["May"]=5; mons["Jun"]=6
mons["Jul"]=7 ; mons["Aug"]=8; mons["Sep"]=9
mons["Oct"]=10 ; mons["Nov"]=11; mons["Dec"]=12
}
{
   # 178.221.138.12     Mon May 02 08:59:23 GMT 2011
   # Build a sortable YYYYMMDDHHMMSS key from year, month, day and hh:mm:ss
   printf("dateTime=%04d%02d%02d%02d%02d%02d\n",
       $NF, mons[$3], $4, substr($5,1,2), substr($5,4,2), substr($5,7,2) )
} '

The -v StartTime is illustrative of how to pass in your start time value (and of the matching format).
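
To round out the partial solution, here is a rough sketch of a complete range filter built on the same sortable-key idea. It assumes the log layout from the question; the function name filter_log and the variable names lo and hi are made up for illustration. The lower bound is exclusive and the upper bound inclusive, matching the comparison in the question.

```shell
#!/bin/sh
# filter_log LO HI [FILE...]   (hypothetical helper, for illustration only)
# LO and HI are YYYYMMDDHHMMSS; a line is printed when LO < key <= HI.
filter_log() {
    lo=$1; hi=$2; shift 2
    awk -v lo="$lo" -v hi="$hi" '
    BEGIN {
        # Map month abbreviations to month numbers 1..12.
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++) mons[names[i]] = i
    }
    {
        # Fields: 1=IP 2=weekday 3=month 4=day 5=hh:mm:ss 6=zone 7=year
        split($5, hms, ":")
        key = sprintf("%04d%02d%02d%02d%02d%02d", $7, mons[$3], $4, hms[1], hms[2], hms[3])
        # Adding 0 forces a numeric comparison of the 14-digit keys.
        if (key + 0 > lo + 0 && key + 0 <= hi + 0) print
    }' "$@"
}
```

For example, filter_log 20110501120000 20110531235959 myfile would print only the lines logged after noon on 1 May 2011 and up to the end of that month. The zone field is ignored, so this assumes every line and both bounds are in the same time zone.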

I hope this helps.

1 Comment

Yep, that's working nicely! I did try the arrays before but couldn't get it working. Thanks :)

Here's an alternative approach using the built-in mktime() function (a gawk extension, not part of POSIX awk). I've never bothered with the month parsing until now - thanks to shelter for that part (see accepted answer). Around that point it always feels like time to switch languages.

#!/bin/bash
# input format:
#(1                  2   3   4  5        6   7)
#123.80.114.20      Sun May 01 11:52:28 GMT 2011

awk -v startTime=1304252691 -v endTime=1306000000 '
BEGIN {
  mons["Jan"]=1 ; mons["Feb"]=2; mons["Mar"]=3
  mons["Apr"]=4 ; mons["May"]=5; mons["Jun"]=6
  mons["Jul"]=7 ; mons["Aug"]=8; mons["Sep"]=9
  mons["Oct"]=10 ; mons["Nov"]=11; mons["Dec"]=12;
}
{
  hmsSpaced=$5; gsub(":"," ",hmsSpaced); 
  timeInSec=mktime($7" "mons[$3]" "$4" "hmsSpaced); 
  if (timeInSec > startTime && timeInSec <= endTime) print $0
}' myfile

(I've chosen example time thresholds to select only the last two log lines.)

Note that if the mktime() function were a bit smarter this whole thing would reduce to:

awk -v startTime=1304252691 -v endTime=1306000000 '{t=mktime($7" "$3" "$4" "$5); if (t > startTime && t <= endTime) print $0}' myfile

I'm not sure of the format of the data you're parsing, but I do know that you can't use the backticks within single quotes. You'll have to use double quotes. If there are too many quotes being nested, and it's confusing you, you can also just save the output of your date command to a variable beforehand.
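
A minimal sketch of that suggestion, assuming GNU date (for --date) and the log layout from the question. The bounds are converted once in the shell and passed to awk with -v. Inside awk, date is run through a command pipe with getline, since backticks mean nothing there. The function name filter_by_epoch and the variable names are hypothetical.

```shell
#!/bin/sh
# filter_by_epoch START_DATE END_DATE [FILE...]   (hypothetical helper)
# Prints lines whose timestamp t satisfies start < t <= end (epoch seconds).
filter_by_epoch() {
    # Convert the human-readable bounds once, outside awk (GNU date assumed).
    lo=$(date --date "$1" +%s) || return 1
    hi=$(date --date "$2" +%s) || return 1
    shift 2
    awk -v lo="$lo" -v hi="$hi" '
    {
        # Build the date command as an awk string from fields 3..7.
        # Assumes trusted input: the fields are pasted into a shell command.
        cmd = "date --date \"" $3 " " $4 " " $5 " " $6 " " $7 "\" +%s"
        # Run the command and read its single line of output into t.
        if ((cmd | getline t) > 0 && t + 0 > lo + 0 && t + 0 <= hi + 0) print
        close(cmd)
    }' "$@"
}
```

It could be called as filter_by_epoch "2011-05-01 12:00:00 GMT" "2011-05-31 GMT" myfile. Spawning date once per line is slow, which is probably fine for a one-off but not for large logs.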
