I'm formatting an SMTP server log for display on a secured website. I have already formatted the IP addresses, with and without port numbers (123.123.123.123 and 123.123.123.123:456), using this regex: /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\d{1,3}\.\d{1,3}\.\d{1,3}/.

Now I need to format other numeric values, but not when they are combined with non-numeric characters, as in IDs and CRAM-MD5.

In the following example, I need to get the 100, 56 and 0, but neither the 5 of CRAM-MD5 nor the 21 (or 21E9) of 21E9C126E0B80. I also need the 0 after Client and the 2022070508301657009855590.

2022-07-05 12:00:00 New Client Rejected (192.241.222.210 [digitalocean.com] -> AbuseIPDB Score: 100)
2022-07-05 12:00:00 New Client Connected (137.184.30.176 [digitalocean.com] -> AbuseIPDB Score: 56)
2022-07-05 12:00:00 New Client Connected (192.168.10.12 [] -> AbuseIPDB Score: 0)
2022-07-05 12:00:00 250-AUTH LOGIN PLAIN CRAM-MD5
2022-07-05 12:00:00 250 2.0.0 Ok: queued as 21E9C126E0B80
2022-07-05 12:00:00 Client 0 from 192.168.10.12 Disconnecting
2022-07-05 12:00:00 Forward mail 2022070508301657009855590

I currently have the following regex, which gets me 100 only: / [^a-zA-Z\/\.>(]\d+[^a-zA-Z\/\.>)\-]/
Yes, I need a space in front, and I exclude the > to avoid formatting an already formatted string. And yes, some trailing characters need to be excluded as well.

Here is my code:

preg_match_all('/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\d{1,3}\.\d{1,3}\.\d{1,3}/', $sLog, $matches);
foreach ($matches[0] as $nr) {
    $sLog = str_replace($nr, '<span class="number">' . $nr . '</span>', $sLog);
}

The test scenario is here: https://regex101.com/r/sbD10s/1.
The regex will be used inside preg_match_all().

Can anyone help me find the correct regex?

  • Maybe :\s+\K\d+(?=\)) will do? See demo. Commented Jul 5, 2022 at 11:09
  • Or, maybe parse the whole lines with something like ^(?<message>.*?)\s+\((?<ip>\d[\d.]*)\s+\[(?<host>[^][]*)]\s+->\s+(?<scorestring>.*?):\s+(?<scorevalue>\d+)\)? See regex101.com/r/8pRn7M/2 Commented Jul 5, 2022 at 11:12
  • Tried both, but they didn't find any of the numbers. Commented Jul 5, 2022 at 11:22
  • Can you provide a code demo? Mine works and extracts all details. Here is another (first) code demo. Commented Jul 5, 2022 at 11:28
  • I already have in the question. So for your code I need to update the programming on the preg_match_all() result. I'll test that and get back here. Commented Jul 5, 2022 at 11:43

1 Answer

Since you wrap your matches with other strings, you should use preg_replace directly.
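
One reason this may matter can be shown with a small sketch. The sample string below is made up for illustration (it is not from the log in the question), and the variable names are arbitrary. It compares the match-then-str_replace loop from the question with a single preg_replace call when the same value occurs twice:

```php
<?php
$pattern = '~(?<=\h)\d+\b(?!\.\d)~';
$log = 'Score: 0 and Client 0'; // made-up sample with a repeated value

// Approach from the question: collect the matches, then str_replace each one.
preg_match_all($pattern, $log, $matches);
$looped = $log;
foreach ($matches[0] as $nr) {
    // str_replace rewrites every "0" in the string on every iteration,
    // including the ones already wrapped by an earlier iteration.
    $looped = str_replace($nr, '<span class="number">' . $nr . '</span>', $looped);
}

// Single pass: each match is wrapped exactly once, at its own position.
$single = preg_replace($pattern, '<span class="number">$0</span>', $log);

echo $looped, PHP_EOL; // each 0 ends up inside two nested spans
echo $single, PHP_EOL; // each 0 ends up inside one span
```

Because str_replace works on plain substrings, it would also touch the same digits wherever else they appear in the text, which the regex-based replacement avoids.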

To match numbers that follow whitespace and are not followed by a dot plus another digit, you can use the (?<=\h)\d+\b(?!\.\d) pattern.

The whole solution for the current problem will look like this:

$sLog = preg_replace('~\d{1,3}\.\d{1,3}\.\d{1,3}(?:\.\d{1,3}(?::\d{1,5})?)?~', '<span class="number">$0</span>', $sLog);
$sLog = preg_replace('~(?<=\h)\d+\b(?!\.\d)~', '<span>$0</span>', $sLog);

Please adjust the replacement pattern in the second preg_replace to your liking. If the replacements are identical for both, just merge the two patterns into a single one:

$sLog = preg_replace('~\d{1,3}\.\d{1,3}\.\d{1,3}(?:\.\d{1,3}(?::\d{1,5})?)?|(?<=\h)\d+\b(?!\.\d)~', '<span class="number">$0</span>', $sLog);

The (?<=\h)\d+\b(?!\.\d) regex breaks down as follows:

  • (?<=\h) - immediately to the left, there must be a horizontal whitespace
  • \d+ - one or more digits
  • \b - a word boundary
  • (?!\.\d) - immediately to the right, there must be no . followed by a digit.
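
To see what this pattern picks up on its own, here is a minimal sketch that runs it with preg_match_all over four of the log lines from the question. The helper name standaloneNumbers is hypothetical and only used for illustration:

```php
<?php
// Hypothetical helper: returns every match of the pattern within one line.
function standaloneNumbers(string $line): array
{
    preg_match_all('~(?<=\h)\d+\b(?!\.\d)~', $line, $m);
    return $m[0];
}

$lines = [
    '2022-07-05 12:00:00 New Client Rejected (192.241.222.210 [digitalocean.com] -> AbuseIPDB Score: 100)',
    '2022-07-05 12:00:00 250-AUTH LOGIN PLAIN CRAM-MD5',
    '2022-07-05 12:00:00 250 2.0.0 Ok: queued as 21E9C126E0B80',
    '2022-07-05 12:00:00 Client 0 from 192.168.10.12 Disconnecting',
];

foreach ($lines as $line) {
    echo implode(', ', standaloneNumbers($line)), PHP_EOL;
}
// Prints:
// 12, 100
// 12, 250
// 12, 250
// 12, 0
```

The 5 of CRAM-MD5, the 21 of 21E9C126E0B80, the 2.0.0 status and the IP addresses are skipped, as intended. Note that the hour of the timestamp (12) and the 250 reply code are also preceded by a space, so this pattern matches them too; whether that is acceptable depends on the desired output.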

1 Comment

Perfect solution, I just added the class in the second preg_replace() ;)
