I am scraping a log and need three elements from each entry. An added difficulty is that my Java program parses the log with readLine(), i.e. one line at a time. (If it is possible to read multiple lines while parsing, let me know :) ) NOTE: I have no control over the log output format.
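On the multi-line point, one option is to peek at the following line with BufferedReader.mark() and reset(). The sketch below is only an illustration under stated assumptions: the class name, the helper names, the READ_AHEAD limit, and the "lone name followed by a 0x line" heuristic are all hypothetical and not part of the original program.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class JoinWrappedLines {
    // Assumption: no log line is longer than this many characters.
    static final int READ_AHEAD = 8192;

    // Returns the reader's lines, gluing a lone ".name" line onto the
    // "0x..." line that follows it. Every other line is passed through as-is.
    static List<String> logicalLines(BufferedReader br) throws IOException {
        List<String> out = new ArrayList<>();
        String line;
        while ((line = br.readLine()) != null) {
            String t = line.trim();
            // Hypothetical heuristic: a wrapped entry is a single token starting with '.'
            boolean loneName = t.startsWith(".") && !t.contains(" ");
            if (loneName) {
                br.mark(READ_AHEAD);          // remember where the next line starts
                String next = br.readLine();  // peek at it
                if (next != null && next.trim().startsWith("0x")) {
                    line = line + " " + next.trim();  // continuation: join the two lines
                } else {
                    br.reset();               // not a continuation: un-read it
                }
            }
            out.add(line);
        }
        return out;
    }

    // Convenience wrapper for trying the helper on an in-memory string.
    static List<String> fromString(String text) {
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            return logicalLines(br);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String log = " .text.__sfmoreglue\n"
                   + " 0x0000000000401d00 0x55 /mnt/drv2homelibc_popcorn.a(lib_a-findfp.o)\n"
                   + " *(.text.unlikely)\n";
        for (String l : fromString(log)) {
            System.out.println("[" + l + "]");
        }
    }
}
```

With the lines joined this way, a single first-line pattern could in principle handle both layouts, at the cost of trusting the join heuristic.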
There are two possible layouts for what I must extract. Either the log is nice and gives the following:
NICE FORMAT
.text.rank 0x0000000000400b8f 0x351 is_x86.o
where I must grab .text.rank, 0x0000000000400b8f, and 0x351.
Now the not-so-nice case: if the name is too long, it bumps everything else onto the next line, as shown below. The only thing after the first element is then a single blank space followed by a newline (\n), which readLine() strips anyway.
EVIL FORMAT: Note that each line is a separate ArrayList entry.
.text.__sfmoreglue
0x0000000000401d00 0x55 /mnt/drv2homelibc_popcorn.a(lib_a-findfp.o)
Therefore what the regex actually sees is:
.text.__sfmoreglue
CORNER CASE FORMAT: this also occurs within the log, but I DO NOT want to match it
*(.text.unlikely)
Finally, below is the Pattern I currently use for the first line; pline2 is used on the next line when group 2 of the first line is empty.
UPDATE: The pattern below works for the NICE FORMAT and the EVIL FORMAT, but pattern pline2 now finds no matches, even though it is correct on regex101.com. Link: https://regex101.com/r/vS7vZ3/9
UPDATE2: I fixed it. I forgot to call m2.find() after creating the matcher for the second line from Pattern pline2. Corrected code is below.
Pattern p = Pattern.compile("^[ \\s](\\.[tex]*\\.[\\._\\-\\@a-zA-Z0-9]*)\\s*([x0-9a-f]*)[ \\s]*([x0-9a-f]*).*");
Pattern pline2 = Pattern.compile("^\\s*([x0-9a-f]*)[ \\s]*([x0-9a-f]*)\\s*[\\w\\(\\)\\.\\-]*");
To give a little background: I first match the name .text.whatever to m.group(1), followed by the address 0x000012345 to m.group(2), and finally the size 0xa48 to m.group(3). This all assumes the log is in the NICE FORMAT. If it is in the EVIL FORMAT, I see that group(2) is empty, so I read the next line of the log into a temp buffer and apply the second pattern, pline2, to that new line.
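The two-step approach described above can be sketched in isolation roughly as follows. This is a minimal illustration, not the actual program: the class name TwoLineParse, the Entry holder, and the hex() helper are hypothetical, the input is a plain List<String> rather than a reader, and each line is assumed to start with one whitespace character, as the leading [ \\s] in the first pattern implies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoLineParse {
    // The same two patterns as in the question.
    static final Pattern P = Pattern.compile(
        "^[ \\s](\\.[tex]*\\.[\\._\\-\\@a-zA-Z0-9]*)\\s*([x0-9a-f]*)[ \\s]*([x0-9a-f]*).*");
    static final Pattern PLINE2 = Pattern.compile(
        "^\\s*([x0-9a-f]*)[ \\s]*([x0-9a-f]*)\\s*[\\w\\(\\)\\.\\-]*");

    // Hypothetical holder for one parsed entry.
    static final class Entry {
        final String name; final long addr; final long size;
        Entry(String name, long addr, long size) {
            this.name = name; this.addr = addr; this.size = size;
        }
    }

    // Parses "0x..." as hexadecimal; an empty group counts as 0.
    static long hex(String s) {
        return s.isEmpty() ? 0 : Long.parseLong(s.replaceFirst("0x", ""), 16);
    }

    static List<Entry> parse(List<String> lines) {
        List<Entry> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = P.matcher(lines.get(i));
            if (!m.find()) continue;              // e.g. " *(.text.unlikely)"
            String name = m.group(1), addr = m.group(2), size = m.group(3);
            if (addr.isEmpty() && i + 1 < lines.size()) {
                // EVIL FORMAT: look at the next line, but only consume it
                // if it really starts with an address (an assumption).
                Matcher m2 = PLINE2.matcher(lines.get(i + 1));
                if (m2.find() && !m2.group(1).isEmpty()) {
                    addr = m2.group(1);
                    size = m2.group(2);
                    i++;
                }
            }
            if (!addr.isEmpty()) out.add(new Entry(name, hex(addr), hex(size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            " .text.rank 0x0000000000400b8f 0x351 is_x86.o",
            " .text.__sfmoreglue ",
            " 0x0000000000401d00 0x55 /mnt/drv2homelibc_popcorn.a(lib_a-findfp.o)",
            " *(.text.unlikely)");
        for (Entry e : parse(lines)) {
            System.out.println(e.name + " addr:" + e.addr + " size:" + e.size);
        }
    }
}
```

Because every group in pline2 is optional, find() succeeds even on a line without an address, which is why the sketch also checks that group(1) is non-empty before consuming the next line.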
Can someone help me with the regex? Is there a way I can make sure my current line (or even better, just the second grouping) is either the NICE FORMAT or is empty?
As requested my java code:
//1st line pattern
Pattern p = Pattern.compile("^[ \\s](\\.[tex]*\\.[\\._\\-\\@a-zA-Z0-9]*)\\s*([x0-9a-f]*)[ \\s]*([x0-9a-f]*).*");
//conditional 2nd line pattern
Pattern pline2 = Pattern.compile("^\\s*([x0-9a-f]*)[ \\s]*([x0-9a-f]*)\\s*[\\w\\(\\)\\.\\-]*");
while((temp = br1.readLine()) != null){
    Matcher m = p.matcher(temp);
    while(m.find()){
        System.out.println("What regex finds: m1:"+m.group(1)+"# m2:"+m.group(2)+"# m3:"+m.group(3));
        if(!m.group(1).isEmpty() && m.group(2).isEmpty() && m.group(3).isEmpty()){
            //means we probably hit a long symbol name and important stuff is on the next line
            //save the name at least
            name = m.group(1);
            //read and utilize the next line
            if((temp = br1.readLine()) == null){
                return;
            }
            System.out.println("EVILline2:"+temp); //sanity check the input
            System.out.println(pline2.toString()); //sanity check the regex
            Matcher m2= pline2.matcher(temp);
            //find() must run before group(); otherwise group() throws IllegalStateException
            while(m2.find()){
                System.out.println("regex line2 finds: m1:"+m2.group(1));//+"# m2:"+m2.group(2));
                if(m2.group(2).isEmpty()){
                    size = 0;
                }else{
                    size = Long.parseLong(m2.group(2).replaceFirst("0x", ""),16);
                }
                addr = Long.parseLong(m2.group(1).replaceFirst("0x", ""),16);
                System.out.println("#########LONG NAME: "+name+" addr:"+addr+" size:"+size);
            }
        }//end if
        else{ // assume in NICE FORMAT
            //do nice format stuff.
        }//end else
    }//end while
}//end outerwhile
As an aside, the output I was getting (before the UPDATE2 fix):
line: .text.c_print_results
What regex finds: m1:.text.c_print_results# m2:# m3:
EVIL FORMATline2: 0x00000000004001e6 0x231 c_print_results_x86.o
^\s*([x0-9a-f]*)[ \s]*([x0-9a-f]*)\s*[\w\(\)\.\-]*
Exception in thread "main" java.lang.IllegalStateException: No match found
at java.util.regex.Matcher.group(Matcher.java:536)
at java.util.regex.Matcher.group(Matcher.java:496)
at regexTest.regex.grabSymbolsInRange(regex.java:143)
at regexTest.regex.main(regex.java:489)
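The IllegalStateException in the trace above is what Matcher.group() throws when no match has been attempted yet (or the last attempt failed). The following is a small sketch of that behaviour using the pline2 pattern and the second line from the output; the class and method names are hypothetical and exist only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindBeforeGroup {
    // Same second-line pattern as in the question.
    static final Pattern PLINE2 = Pattern.compile(
        "^\\s*([x0-9a-f]*)[ \\s]*([x0-9a-f]*)\\s*[\\w\\(\\)\\.\\-]*");

    // Calls group(1) on a fresh matcher without find(); reports whether it threw.
    static boolean throwsWithoutFind(String line) {
        Matcher m2 = PLINE2.matcher(line);
        try {
            m2.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;  // "No match found"
        }
    }

    // Calls find() first, then reads group(1); returns null if nothing matched.
    static String firstGroup(String line) {
        Matcher m2 = PLINE2.matcher(line);
        return m2.find() ? m2.group(1) : null;
    }

    public static void main(String[] args) {
        String line = " 0x00000000004001e6 0x231 c_print_results_x86.o";
        System.out.println("throws without find(): " + throwsWithoutFind(line));
        System.out.println("group(1) after find(): " + firstGroup(line));
    }
}
```

Creating a Matcher only binds the pattern to the input; find(), matches(), or lookingAt() must run before any group() call.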