public static String entryPattern = "^([\\d.]+) (\\S+) (.+?) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"([^\"]+)\"";

    public static void parseTwigLine(String line) {
        Pattern p = Pattern.compile(entryPattern);
        Pattern p1;
        Matcher matcher = p.matcher(line);
        System.out.println(matcher.groupCount());
        if (!matcher.matches() || NUM_FIELDS != matcher.groupCount()) {
          System.err.println("Bad log entry (or problem with RE?):");
          System.err.println(line);
          return;
        }

        timeStamp = matcher.group(4);
        ipAddress = matcher.group(1);
        if (!matcher.group(3).equals("-")) {
            userName = matcher.group(3);
        }
        request = matcher.group(5);
        response = matcher.group(6);
        bytesSent = matcher.group(7);
        browser = matcher.group(9);

        if (!matcher.group(8).equals("-")) {
            url = matcher.group(8);
        }
        instanceName = url.split("/")[3];
        if(request.contains("?q")) {
            queryTerms = request.split("[?|&]")[1];
        } else if(url.contains("?q")) {
            queryTerms = url.split("[?|&]")[1].split("=")[1];
        }
        if(request.contains("&f")) {
            filters = request.split("&f=")[1];
        } else if(url.contains("&f")) {
            // This branch tests url, so split url rather than request
            filters = url.split("&f=")[1];
        }

    }

My regular expression does not match the line below, so the code above always reports "Bad log entry (or problem with RE?)" for it. Is something wrong with my regex?

10.53.32.1 - - [14/Nov/2011:09:45:56 -0800] "GET /host-ui/themes/client/images/preview/left6_na.gif HTTP/1.1" 304 - "http://search.host.com/search-ui/?q=8960" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; MS-RTC LM 8; InfoPath.3; BOIE9;ENUS)"

The line below, however, does match:

10.53.32.1 - - [14/Nov/2011:09:45:56 -0800] "GET /host-ui/themes/client/images/btn_close_include.png HTTP/1.1" 200 1023 "http://search.host.com/search-ui/?q=8960" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; MS-RTC LM 8; InfoPath.3; BOIE9;ENUS)"
  • For known-format messages, is a regex the best way to go? Seems like it might be easier to break it up knowing that the data is presented in a very consistent pattern, then if you need to, break up individual parts (like form parameters) using simpler regexes, splits, etc. Commented Nov 15, 2011 at 1:46
  • @Dave Newton, which method is best: using a regex, or just splitting the string? Commented Nov 15, 2011 at 3:10
  • Don't know; if speed isn't an issue, it probably doesn't matter. Commented Nov 15, 2011 at 3:13

1 Answer

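The \d+ in the bytes-sent group (group 7) doesn't match the "-" that the failing line has in that position (the 304 response reports no byte count). Replace it with something that does match, for example \S+: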

Original: "^([\\d.]+) (\\S+) (.+?) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"([^\"]+)\""
Fixed:    "^([\\d.]+) (\\S+) (.+?) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\S+) \"([^\"]+)\" \"([^\"]+)\""
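
To check the fix, here is a minimal sketch that runs the corrected pattern against shortened versions of both log lines. The class name AccessLogRegexDemo, the helper bytesSent, the shortened sample lines, and the choice to treat "-" as 0 bytes are all assumptions made for illustration; they are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessLogRegexDemo {
    // Same pattern as the question, except group 7 is (\S+) instead of (\d+),
    // so it accepts both a byte count and the "-" placeholder.
    static final Pattern ENTRY = Pattern.compile(
        "^([\\d.]+) (\\S+) (.+?) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\S+) \"([^\"]+)\" \"([^\"]+)\"");

    // Hypothetical helper: returns the bytes sent, mapping "-" to 0 (an assumption).
    // Returns -1 if the line does not match or group 7 is not a number.
    static long bytesSent(String line) {
        Matcher m = ENTRY.matcher(line);
        if (!m.matches()) {
            return -1L;
        }
        String raw = m.group(7);
        if (raw.equals("-")) {
            return 0L;
        }
        try {
            return Long.parseLong(raw);
        } catch (NumberFormatException e) {
            return -1L; // (\S+) also admits non-numeric tokens
        }
    }

    public static void main(String[] args) {
        // Shortened sample lines modeled on the two entries in the question.
        String notModified = "10.53.32.1 - - [14/Nov/2011:09:45:56 -0800] "
            + "\"GET /host-ui/left6_na.gif HTTP/1.1\" 304 - "
            + "\"http://search.host.com/search-ui/?q=8960\" \"Mozilla/4.0 (compatible; MSIE 7.0)\"";
        String ok = "10.53.32.1 - - [14/Nov/2011:09:45:56 -0800] "
            + "\"GET /host-ui/btn_close_include.png HTTP/1.1\" 200 1023 "
            + "\"http://search.host.com/search-ui/?q=8960\" \"Mozilla/4.0 (compatible; MSIE 7.0)\"";

        System.out.println(bytesSent(notModified));
        System.out.println(bytesSent(ok));
    }
}
```

Because \S+ accepts any non-whitespace token, code that later treats group 7 as a number still has to handle "-" explicitly. A stricter alternative for that group would be (\\d+|-).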