I'm looking for a way to parse a Varnish log file. A line looks like this:

178.232.38.87 - - [23/May/2012:14:01:05 +0200] "GET http://static.vg.no/iphone/js/front-min.js?20120509-1 HTTP/1.1" 200 2013 "http://touch.vg.no/" "Mozilla/5.0 (Linux; U; Android 2.3.3; en-no; HTC Nexus One Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1"

The following elements can be distinguished:

%h %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i"

but I still have no idea how to do this. A simple String.split(" ") won't work, because the timestamp, the request line and the user agent contain spaces themselves.

I know regular expressions follow general rules, but a Java solution would suit me best.

Thanks

  • possible duplicate of java parse log file Commented Jan 11, 2015 at 12:07

1 Answer

I'd build the regular expression from chunks, each matching one field according to its possible/expected values.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // h is assumed to hold one line of the log file
    String rexa = "(\\d+(?:\\.\\d+){3})";  // an IP address
    String rexs = "(\\S+)";                // a single token (no spaces)
    String rexdt = "\\[([^\\]]+)\\]";      // something between [ and ]
    String rexstr = "\"([^\"]*?)\"";       // a quoted string
    String rexi = "(\\d+)";                // unsigned integer

    // fields are separated by single spaces (String.join needs Java 8+)
    String rex = String.join( " ", rexa, rexs, rexs, rexdt, rexstr,
                              rexi, rexi, rexstr, rexstr );

    Pattern pat = Pattern.compile( rex );
    Matcher mat = pat.matcher( h );
    if( mat.matches() ){
        // groups 1..9 correspond to the nine fields, in order
        for( int ig = 1; ig <= mat.groupCount(); ig++ ){
            System.out.println( mat.group( ig ) );
        }
    }

It is, of course, possible to make do with rexs in place of rexa or rexi, at the cost of weaker validation of those fields.
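For reference, the same idea can be wrapped into a small reusable class. The following is only a sketch under a few assumptions: the class name `AccessLogParser`, its methods and the group names are made up for illustration; the bytes field is allowed to be `-`, which the `%b` format may produce when no body is sent; and the timestamp is assumed to always have the layout seen in the sample line. Quoted fields containing escaped quotes are not handled.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; class, method and group names are illustrative only.
final class AccessLogParser {

    // One named group per field of:
    // %h %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i"
    private static final Pattern LINE = Pattern.compile(
          "(?<host>\\S+) (?<ident>\\S+) (?<user>\\S+) "
        + "\\[(?<time>[^\\]]+)\\] "
        + "\"(?<request>[^\"]*)\" "
        + "(?<status>\\d{3}) (?<bytes>\\d+|-) "   // assumption: bytes may be "-"
        + "\"(?<referer>[^\"]*)\" \"(?<agent>[^\"]*)\"");

    private static final String[] FIELDS = {
        "host", "ident", "user", "time", "request",
        "status", "bytes", "referer", "agent"
    };

    // Timestamp layout taken from the sample line: 23/May/2012:14:01:05 +0200
    private static final DateTimeFormatter TIME_FORMAT =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // Returns the fields keyed by name, or an empty Optional if the line does not match.
    static Optional<Map<String, String>> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : FIELDS) {
            fields.put(name, m.group(name));
        }
        return Optional.of(fields);
    }

    // Converts the raw "time" field into a date-time with its UTC offset.
    static OffsetDateTime parseTime(String time) {
        return OffsetDateTime.parse(time, TIME_FORMAT);
    }
}
```

Calling `AccessLogParser.parse(line)` on each line read from the file gives a map from which `get("status")`, `get("agent")` and so on can be read; lines that do not fit the format come back as an empty `Optional` and can be logged or skipped.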
