
I have a String read from a file via Apache Commons FileUtils.readFileToString, which has the following format:

<!--LOGHEADER[START]/-->
<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->
<!--LOGHEADER[END]/-->
#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)

I am trying to filter out everything between the LOGHEADER[START] and LOGHEADER[END] lines, so I created a Java regex:

String fileContent = FileUtils.readFileToString(file);
String logheader = "LOGHEADER\\[START\\].*LOGHEADER\\[END\\]";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());

(I use DOTALL because the text spans multiple lines and I want . to match line breaks as well.) However, this pattern does not match the String. If I remove the LOGHEADER\[END\] part of the regex, I get a match that contains the whole String. I don't understand why the original regex does not match.

Any help is appreciated - thanks a lot!

  • Regexes are probably not the best tool for the job here. Since you know the boundaries, just use a BufferedReader and call .readLine(). Commented Mar 17, 2015 at 11:48
  • What's your expected output? Commented Mar 17, 2015 at 11:55
  • Or, if you want to rely on a regex, try your regex with one of these tools. Commented Mar 17, 2015 at 11:55
  • Did you read what matches() does? Are you sure you want to use this method? Maybe you are looking for find(). Also, .* should probably be reluctant. Commented Mar 17, 2015 at 11:56
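
The line-based approach suggested in the first comment could look roughly like the sketch below. It assumes that "filter out" means dropping the header block, marker lines included. The names HeaderSkipper and stripHeader are made up for illustration, and a StringReader stands in for the file so that the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class HeaderSkipper {

    // Hypothetical helper: returns the content without the lines from
    // LOGHEADER[START] through LOGHEADER[END], both marker lines included.
    static String stripHeader(String content) {
        StringBuilder out = new StringBuilder();
        boolean inHeader = false;
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains("LOGHEADER[START]")) {
                    inHeader = true;            // header begins, skip this line
                } else if (line.contains("LOGHEADER[END]")) {
                    inHeader = false;           // header ends, skip this line too
                } else if (!inHeader) {
                    out.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "<!--LOGHEADER[START]/-->\n"
                + "<!--ENCODING[UTF8]/-->\n"
                + "<!--LOGHEADER[END]/-->\n"
                + "#2.0#first trace line\n";
        System.out.print(stripHeader(sample)); // prints "#2.0#first trace line"
    }
}
```

When reading from disk, the StringReader could be replaced by a FileReader, which avoids loading the whole file into one String first.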

2 Answers


The important thing to remember about Java's matches() method is that the regular expression must match the entire input string, not just a part of it.

So you have to use find(), like this, to capture everything between <!--LOGHEADER[START]/--> and <!--LOGHEADER[END]/-->:

String logheader = "(?<=LOGHEADER\\[START\\]/-->).*(?=<!--LOGHEADER\\[END\\])";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
while (m.find()) {
    System.out.println(m.group());
}

Or, to follow the logic you suggest (using only matches()), we need to add ^.* and .*$ around the pattern so that it covers the whole string:

String logheader = "^.*LOGHEADER\\[START\\].*LOGHEADER\\[END\\].*$";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());
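
To make the difference concrete, here is a small sketch that applies the original pattern from the question with both matches() and find(). The class and method names are hypothetical, and the sample string is a shortened stand-in for the real log file.

```java
import java.util.regex.Pattern;

public class MatchesVersusFind {

    // The pattern from the question, unchanged.
    static final Pattern HEADER =
            Pattern.compile("LOGHEADER\\[START\\].*LOGHEADER\\[END\\]", Pattern.DOTALL);

    // True only if the pattern covers the input from first to last character.
    static boolean matchesWhole(String input) {
        return HEADER.matcher(input).matches();
    }

    // True if the pattern occurs anywhere inside the input.
    static boolean foundAnywhere(String input) {
        return HEADER.matcher(input).find();
    }

    public static void main(String[] args) {
        String sample = "<!--LOGHEADER[START]/-->\n"
                + "<!--ENCODING[UTF8]/-->\n"
                + "<!--LOGHEADER[END]/-->\n"
                + "#2.0#first trace line";

        // The input starts with "<!--", which the pattern does not account for.
        System.out.println(matchesWhole(sample));   // false
        System.out.println(foundAnywhere(sample));  // true

        // With nothing before START or after END, matches() succeeds.
        System.out.println(matchesWhole("LOGHEADER[START]\nLOGHEADER[END]")); // true
    }
}
```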

You need to use the Pattern and Matcher classes together with the find() method. The regex below fetches all the lines that lie between LOGHEADER[START] and LOGHEADER[END].

String s = "<!--LOGHEADER[START]/-->\n" + 
        "<!--HELP[Manual modification of the header may cause parsing problem!]/-->\n" + 
        "<!--LOGGINGVERSION[2.0.7.1006]/-->\n" + 
        "<!--NAME[./log/defaultTrace_00.trc]/-->\n" + 
        "<!--PATTERN[defaultTrace_00.trc]/-->\n" + 
        "<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->\n" + 
        "<!--ENCODING[UTF8]/-->\n" + 
        "<!--FILESET[0, 20, 10485760]/-->\n" + 
        "<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->\n" + 
        "<!--NEXTFILE[defaultTrace_00.1.trc]/-->\n" + 
        "<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->\n" + 
        "<!--LOGHEADER[END]/-->\n" + 
        "#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)";
Matcher m = Pattern.compile("(?s)\\bLOGHEADER\\[START\\][^\\n]*\\n(.*?)\\n[^\\n]*\\bLOGHEADER\\[END\\]").matcher(s);
while (m.find()) {
    System.out.println(m.group(1));
}

Output:

<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->

If you also want to match the LOGHEADER lines themselves, the capturing group is unnecessary:

Matcher m = Pattern.compile("(?s)[^\\n]*\\bLOGHEADER\\[START\\].*?\\bLOGHEADER\\[END\\][^\\n]*").matcher(s);
while (m.find()) {
    System.out.println(m.group());
}
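
If the goal is to drop the header from the text rather than print it, a pattern of this shape can also be handed to replaceFirst. This is only a sketch under that assumption: HeaderRemover and removeHeader are illustrative names, and the trailing `\n?` is an addition that is not part of the pattern above; it also removes the line break after the END marker line.

```java
import java.util.regex.Pattern;

public class HeaderRemover {

    // (?s) lets . cross line breaks, .*? stops at the first LOGHEADER[END],
    // and [^\n]* widens the match to the full START and END marker lines.
    static final Pattern HEADER_BLOCK = Pattern.compile(
            "(?s)[^\\n]*\\bLOGHEADER\\[START\\].*?\\bLOGHEADER\\[END\\][^\\n]*\\n?");

    // Hypothetical helper: returns the content with the first header block removed.
    static String removeHeader(String content) {
        return HEADER_BLOCK.matcher(content).replaceFirst("");
    }

    public static void main(String[] args) {
        String sample = "<!--LOGHEADER[START]/-->\n"
                + "<!--ENCODING[UTF8]/-->\n"
                + "<!--LOGHEADER[END]/-->\n"
                + "#2.0#first trace line";
        System.out.println(removeHeader(sample)); // prints "#2.0#first trace line"
    }
}
```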
