3

I'm trying to parse a log file containing numerous traces, some of which span multiple lines.

Example:

[trace-123] <request>This is a log line</request>
[trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
[trace-125] <request>final log line.</request>

I'm attempting to use preg_match_all to get an array of all the traces.

$file = file_get_contents("traces.txt");
$tracePattern = "/(\[trace-[0-9]*+\]+[\s\S]*)(?<=\<\/reply>|\<\/request>)/";

preg_match_all($tracePattern,$file,$lines);

echo "<pre>";print_r($lines);echo "</pre>";

Ideally, I'd like my results to look like this:

Array
(
    [0] => [trace-123] <request>This is a log line</request>
    [1] => [trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
    [2] => [trace-125] <request>final log line.</request>
)

but when I run it, I get an array with everything in a single element. When I wrote the expression, my goal was basically to look for:

\[trace-[0-9]*\]

and find everything from that match to the next match of it.

I found that

\[trace-[0-9]*+\].* 

works pretty well, but breaks down when there are line breaks.

  • I fail to see why this would require regex at all. You can simply loop through each line of the file looking for [trace- at the beginning of the string. Each time you encounter this value, start adding the lines to a string at the next position in the array you are building. You stop adding to this string the next time you encounter either another line beginning with [trace- or a line beginning with some other non-trace signature (for instance, if you have lines like [error- or whatever). Commented Nov 14, 2013 at 20:35
  • To continue my comment... a multi-line regex is likely going to require you to put your ENTIRE log file into memory (or at least store portions of it in memory until the next [trace- is seen, which basically requires the implementation I suggested above). This may not be feasible for larger log files. You should probably focus on a solution that lets you parse and work with a single line at a time. Commented Nov 14, 2013 at 20:37
  • @Mike - With multi-gig RAM capacity, if a file is too big, reading line by line might take a protracted period of time, and building an array would have to be offloaded as well. Another approach would be to read in 10,000 lines at a time, process the records with a multi-line regex, capture the start of the last record, put it at the front of the buffer, read another 10,000 lines (or, say, 10 megs), and repeat. Commented Nov 14, 2013 at 22:12
  • @sin True, but oftentimes you want a process that is very light on RAM, as the servers may be taking production load. In these cases, you cannot afford to dedicate large chunks of RAM to a single process. You are right that building the array might be too expensive on RAM as well, depending on the overall number of lines where this trace data is present. Commented Nov 14, 2013 at 23:43
  • @Mike - Yeah, that's true, but in a server environment each virtual machine might be given a gig or two of virtual RAM. But that's a lot of disk seeks doing it line by line on partitioned drives. Reading in 10 meg chunks and using a multi-line regex is a good balance when it comes to resources. Flat out, it's at least 10 times faster. Commented Nov 15, 2013 at 0:43
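
The line-by-line approach described in the comments above could be sketched roughly as follows. The function name readTraces is hypothetical, and the sketch assumes that every record starts with [trace- at the very beginning of a line and that any other line belongs to the record before it:

```php
<?php
// Hypothetical helper: groups the lines of a log file into trace records
// while holding only one line at a time in the read buffer.
// Assumption: a record always starts with "[trace-" at the start of a line.
function readTraces(string $path): array
{
    $traces = [];
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $traces; // could not open the file
    }
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        if (strncmp($line, '[trace-', 7) === 0) {
            // A new record starts on this line.
            $traces[] = $line;
        } elseif ($traces !== []) {
            // Continuation line: append it to the current record.
            $traces[count($traces) - 1] .= "\n" . $line;
        }
    }
    fclose($handle);
    // Drop trailing blank lines that were appended to a record.
    return array_map('rtrim', $traces);
}
```

Calling readTraces("traces.txt") on the sample from the question should give three records. Note that the resulting array still lives in memory, so for very large files you would probably process each record as soon as the next [trace- line shows up instead of collecting them all.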

7 Answers

3

The following would probably be a better approach here.

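$text = file_get_contents("traces.txt"); // same file as in the question
// Split on a line break that is immediately followed by a [trace...] marker.
$results = preg_split('/\R(?=\[trace[^\]]*\])/', $text);
print_r($results);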


Output

Array
(
    [0] => [trace-123] <request>This is a log line</request>
    [1] => [trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
    [2] => [trace-125] <request>final log line.</request>
)


2

Use this:

$file = '[trace-123] <request>This is a log line</request>
[trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
[trace-125] <request>final log line.</request>';

$tracePattern = "/\[trace-[0-9]*+\]+\s*<(?:reply|request)>.*?<\/(?:reply|request)>/s";

preg_match_all($tracePattern,$file,$lines);

$lines = $lines[0]; // by default, $lines[0] holds the array of full matches, so keep only that

echo "<pre>";print_r($lines);echo "</pre>";

Working demo: http://ideone.com/n8n5r3


2

This works in multiline mode (the m modifier). It trims leading spaces and trailing newlines.

Edit: This assumes the anchor is [trace- ], located either at the beginning of
the line or after leading non-newline whitespace. This is the only discernible
record separator.

 #  ^[^\S\n]*(\[trace-[^]]*\][^\n]*(?:(?!\s+\[trace-[^]]*\])\n[^\n]*)*)

 ^ [^\S\n]* 
 (
      \[trace- [^]]* \] [^\n]* 

      (?:
           (?! \s+ \[trace- [^]]* \] )
           \n [^\n]* 
      )*
 )

Output (in single quotes)

 '[trace-123] <request>This is a log line</request>'
 '[trace-124] <reply>This is another log line

 this is part of "[trace-124]" still.</reply>'
 '[trace-125] <request>final log line.</request>'
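
For reference, here is a minimal sketch of how the pattern above might be used from PHP. The variable names and the inline sample are assumptions for illustration; the pattern is the compact one-line form with the m modifier, and capture group 1 holds each record without the leading whitespace:

```php
<?php
// Sample data from the question, inlined for the sake of the example.
$log = <<<'LOG'
[trace-123] <request>This is a log line</request>
[trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
[trace-125] <request>final log line.</request>
LOG;

// m modifier: ^ matches at the start of every line, not only the subject.
$pattern = '/^[^\S\n]*(\[trace-[^]]*\][^\n]*(?:(?!\s+\[trace-[^]]*\])\n[^\n]*)*)/m';

preg_match_all($pattern, $log, $matches);

// Group 1 is the record itself, without leading whitespace.
$records = $matches[1];
print_r($records);
```

The quoted "[trace-124]" in the middle of a line does not start a new record, because the pattern only accepts a marker at the start of a line.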


2

I'd recommend a solution via preg_split

preg_split('/\R+(?=\[trace-\d+])/', $str)

this results in the following

Array
(
    [0] => [trace-123] <request>This is a log line</request>
    [1] => [trace-124] <reply>This is another log line

this is part of "[trace-124]" still.</reply>
    [2] => [trace-125] <request>final log line.</request>
)


0

The . metacharacter matches any character except a line break (\n). You can replace it with (.|\s), like this:

#\[trace-[0-9]*+\](.|\s)*#

Note: you can use a non-capturing group (?: ) instead.

Easier still, add the s flag:

#\[trace-[0-9]*+\].*#s

1 Comment

No need; he can add the pattern modifier s (PCRE_DOTALL).

0

You should use a reluctant quantifier (??, +? or *?).

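I believe this regex /(\[trace-[0-9]*\]\s*(?s:.*?)<\/(?:reply|request)>)/ should do it. The (?s:.*?) part is the secret: the s modifier lets . match line breaks (the m modifier only changes how ^ and $ behave), and the lazy quantifier stops at the first closing tag. :)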
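
To illustrate why the reluctant quantifier matters, here is a small sketch comparing a greedy and a lazy version of the same pattern. The shortened sample data and the variable names are made up for the example; both patterns use the s modifier so that . can cross line breaks:

```php
<?php
// Shortened sample: three records, the second one spans several lines.
$log = "[trace-123] <request>one</request>\n"
     . "[trace-124] <reply>two\n\nstill two</reply>\n"
     . "[trace-125] <request>three</request>";

// Greedy: .* runs to the end of the input and backtracks to the LAST closing tag.
$greedy = '/\[trace-[0-9]+\].*<\/(?:reply|request)>/s';

// Lazy: .*? stops at the FIRST closing tag after each marker.
$lazy = '/\[trace-[0-9]+\].*?<\/(?:reply|request)>/s';

$greedyCount = preg_match_all($greedy, $log, $greedyMatches);
$lazyCount   = preg_match_all($lazy, $log, $lazyMatches);

echo "greedy: $greedyCount match(es), lazy: $lazyCount match(es)\n";
```

The greedy version lumps everything into one match, which is the behaviour described in the question, while the lazy version returns one match per trace.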


0

This should work with the s flag enabled:

(\[trace-[0-9]+\].*?<\/(?:reply|request)>)


