
I am writing a log file parser, and have written some test code to parse this in C.

The string to be parsed looks as follows:

s := `10.0.0.1 Jan 11 2014 10:00:00 hello`

In C, parsing this in place is quite easy: I first find a pointer to the date within the string and then let strptime() consume as much as it can. This works because strptime() returns a pointer to the first character it did not process.

Eventually I decided to use Go instead of C, but I have run into some issues while porting the code. As far as I can tell, time.Parse() gives me no option to parse from a position inside an existing string (though slices can solve that), nor any indication of how much of the original string it consumed when parsing the date.

Is there an elegant way in Go to parse the date/time directly out of the string without first extracting the timestamp into an exact slice, e.g. a function that returns the number of characters consumed?

  • 1. time.Parse does allow you to parse/skip over fixed "noise". 2. Subslicing your line at the first space is so dead simple that I consider this to be the elegant solution you are looking for. Commented Feb 17, 2014 at 14:09
  • Fair enough, but what about the noise after the date? Since I want to continue parsing at that precise location within the data, I need to know where time.Parse stops in the slice. Commented Feb 17, 2014 at 14:49
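If every timestamp in the log is zero-padded to the width of the layout (an assumption; English three-letter month names are assumed too), the follow-up has a simple answer: the timestamp is exactly len(layout) bytes long, so you know where parsing ended without time.Parse having to report it. A sketch combining that with the subslice-at-first-space suggestion; splitFixed is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// With zero-padded fields, timestamps in this layout are assumed to be
// exactly len(longForm) bytes long.
const longForm = "Jan 02 2006 15:04:05"

// splitFixed (hypothetical) parses the timestamp at the start of s and
// returns whatever follows it, similar to strptime's return value.
func splitFixed(s string) (time.Time, string, error) {
	if len(s) < len(longForm) {
		return time.Time{}, "", fmt.Errorf("line too short for timestamp: %q", s)
	}
	t, err := time.Parse(longForm, s[:len(longForm)])
	if err != nil {
		return time.Time{}, "", err
	}
	// The remainder still starts with the separating space.
	return t, s[len(longForm):], nil
}

func main() {
	line := "10.0.0.1 Jan 11 2014 10:00:00 hello"
	sp := strings.IndexByte(line, ' ') // subslice at the first space
	if sp < 0 {
		fmt.Println("no space in line")
		return
	}
	t, rest, err := splitFixed(line[sp+1:])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("ip=%s time=%s rest=%q\n", line[:sp], t, rest)
}
```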

2 Answers


Unfortunately, the time.Parse function cannot tell you how many characters it parsed, so we need to look at other solutions. For parsing log lines like yours, a regular expression, as @rob74 suggested, is a reasonably elegant strategy. The example below ignores errors for brevity:

import (
    "regexp"
    "time"
)

// r captures (1) the IPv4 address, (2) the timestamp and (3) the message.
var r = regexp.MustCompile(`^((?:\d{1,3}\.){3}\d{1,3}) ([a-zA-Z]{3} \d{1,2} \d{4} \d{1,2}:\d{2}:\d{2}) (.*)`)
const longForm = "Jan 02 2006 15:04:05"

func parseRegex(s string) (ip, msg string, t time.Time) {
    m := r.FindStringSubmatch(s) // nil if s does not match; not checked here
    t, _ = time.Parse(longForm, m[2])
    ip, msg = m[1], m[3]
    return ip, msg, t
}
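parseRegex indexes m without checking it, so a line that does not match makes FindStringSubmatch return nil and the function panic. Also, the layout's zero-padded day ("02") requires two digits while the pattern accepts one, so a line like "Jan 5 2014 ..." matches but fails to parse. A sketch that reports both cases as errors, using the same pattern and layout (parseRegexChecked is a hypothetical name):

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"time"
)

// Same pattern and layout as parseRegex.
var lineRe = regexp.MustCompile(`^((?:\d{1,3}\.){3}\d{1,3}) ([a-zA-Z]{3} \d{1,2} \d{4} \d{1,2}:\d{2}:\d{2}) (.*)`)

const layout = "Jan 02 2006 15:04:05"

// parseRegexChecked returns an error instead of panicking on a
// non-matching line, and propagates time.Parse errors.
func parseRegexChecked(s string) (ip, msg string, t time.Time, err error) {
	m := lineRe.FindStringSubmatch(s)
	if m == nil {
		return "", "", time.Time{}, errors.New("line does not match the expected format")
	}
	t, err = time.Parse(layout, m[2])
	if err != nil {
		return "", "", time.Time{}, err
	}
	return m[1], m[3], t, nil
}

func main() {
	lines := []string{
		"10.0.0.1 Jan 11 2014 10:00:00 hello",
		"10.0.0.1 Jan 5 2014 10:00:00 hello", // matches, but "02" needs two digits
		"garbage",
	}
	for _, line := range lines {
		ip, msg, t, err := parseRegexChecked(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(ip, t, msg)
	}
}
```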

On my machine, benchmarks show the regular expression above to be about twice as fast as @rob74's example, at 17,130 ns per line (roughly 58,000 lines per second):

BenchmarkParseRegex           100000         17130 ns/op
BenchmarkParseRegexRob74       50000         32788 ns/op

We can, however, keep the solution short and make it more efficient by using strings.SplitN instead. For example:

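import (
    "strings"
    "time"
)

func parseSplit(s string) (ip, msg string, t time.Time) {
    // At most 6 parts: the IP, the 4 timestamp fields of longForm, and the rest of the line.
    parts := strings.SplitN(s, " ", 6)
    t, _ = time.Parse(longForm, strings.Join(parts[1:5], " "))
    ip, msg = parts[0], parts[5]
    return ip, msg, t
}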

This splits the string at the first 5 spaces and puts the rest of the string (the message) into the last element of parts. This is not very elegant, since it relies on the number of spaces in the date format, but we could count the spaces in the layout string programmatically for a more general solution. Let's see how this compares to our regular expression solution:

BenchmarkParseRegex   100000         17130 ns/op
BenchmarkParseSplit   500000          3557 ns/op

It compares quite favorably, as it turns out. Using SplitN is about five times faster than using regular expressions, and still results in concise and readable code. It does this at the cost of using slightly more memory for the slice allocation.
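The more general variant mentioned above, counting the spaces in the layout instead of hard-coding them, might look like the sketch below. parsePrefix is a hypothetical helper; it assumes the timestamp in the line contains single spaces exactly where the layout has them (so it does not handle padded layouts such as "Jan _2"), and it returns the unparsed remainder, much like strptime. It slices the original string rather than calling strings.Join, which avoids building a new string for the timestamp.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parsePrefix parses the timestamp at the start of s according to layout and
// returns the text that follows it. It assumes the timestamp contains exactly
// as many single spaces as the layout does.
func parsePrefix(layout, s string) (time.Time, string, error) {
	spaces := strings.Count(layout, " ")
	end := len(s) // if s has no further space, the timestamp runs to the end
	seen := 0
	for i := 0; i < len(s); i++ {
		if s[i] != ' ' {
			continue
		}
		if seen == spaces {
			end = i // this space terminates the timestamp
			break
		}
		seen++
	}
	t, err := time.Parse(layout, s[:end])
	if err != nil {
		return time.Time{}, "", err
	}
	// Drop the single separating space, if there is one.
	return t, strings.TrimPrefix(s[end:], " "), nil
}

func main() {
	line := "10.0.0.1 Jan 11 2014 10:00:00 hello"
	sp := strings.IndexByte(line, ' ')
	if sp < 0 {
		fmt.Println("no space in line")
		return
	}
	t, msg, err := parsePrefix("Jan 02 2006 15:04:05", line[sp+1:])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("IP:", line[:sp])
	fmt.Println("Time:", t)
	fmt.Println("Message:", msg)
}
```

Because the layout is an argument, the same helper works for other space-separated formats without touching a hard-coded SplitN count.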


Maybe you should consider using a regular expression to split the log line, e.g.:

package main

import (
    "fmt"
    "regexp"
    "time"
)

func main() {
    s := "10.0.0.1 Jan 11 2014 10:00:00 hello"
    // Groups: (1) the first non-space token (the IP), (2) the timestamp, (3) the rest of the line.
    r := regexp.MustCompile(`^(\S+) ([a-zA-Z]+ [0-9]{1,2} [0-9]{4} [0-9]{1,2}:[0-9]{2}:[0-9]{2}) (.*)`)
    m := r.FindStringSubmatch(s)
    if len(m) >= 4 {
        fmt.Println("IP:", m[1])
        fmt.Println("Timestamp:", m[2])
        fmt.Println("Message:", m[3])
        t, err := time.Parse("Jan 02 2006 15:04:05", m[2])
        if err != nil {
            fmt.Println(err)
        } else {
            fmt.Println("Parsed Time:", t)
        }
    } else {
        fmt.Println("Regexp mismatch!")
    }
}

http://play.golang.org/p/EP-waAPGB4
