I'm trying to match all fractions (or 'evs') and strings (string1, string2) in the following string with a regex. The strings may contain any number of whitespace characters ('String 1', 'The String 1', 'The String Number 1').

10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1

The following regex works in JavaScript but not in PHP. No errors are returned, just 0 results.

(\d{1,3}\/\d{1,3}|evs).*?(.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+) v (.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+) v (.+).*?(\d{1,3}\/\d{1,3}|evs)

Here's the expected result, apart from groups 6 and 7 (run in JavaScript):

[screenshot of the resulting match groups]

If I add a ? to the first (.+) so that it becomes (.+?), I get the desired result but with the first string not captured:

[screenshot of the resulting match groups]

As soon as I remove the ? to capture the whole string, there are no results returned. Can somebody work out what's going on here?

  • You should avoid +? and *?. Can you explain what you would like to achieve, we may think of an improved pattern? Commented Mar 12, 2020 at 20:14
  • what are you trying to capture? Commented Mar 12, 2020 at 20:15
  • @ŁukaszNojek Sure. I'm aiming to capture the fractions and the strings (string1 and string2), ignoring everything else. The regex string needs to be able to be built up dynamically by the columns that are defined in a prior function (each column has the regex for the element in that column e.g. a fraction is (\d{1,3}\/\d{1,3}|evs)). Commented Mar 12, 2020 at 20:44
  • Does (\d{1,3}\/\d{1,3}|evs) not give you the expected matches? Commented Mar 12, 2020 at 20:45
  • Can you provide the correct expected result? I'm still not sure if you want to match evs, string1, etc. Commented Mar 12, 2020 at 20:52

1 Answer

In PCRE/PHP, you may use

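// Sample input from the question.
$text = '10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1';
// PHP patterns must be wrapped in delimiters; ~ is used here.
$regex = '~(\d{1,3}\/\d{1,3}|evs)\s+(\S+)\s+((?1))\s+(\S+)\s+((?1))\s+(.+?)\s+v\s+(\S+)\s+((?1))\s+(\S+)\s+v\s+(\S+)\s+((?1))~';
if (preg_match_all($regex, $text, $matches)) {
    print_r($matches[0]); // full matches; $matches[1]..$matches[11] hold the groups
}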

See the regex demo

The point is that you can't overuse .*? / .+ in the middle of a pattern: that leads to catastrophic backtracking.

You need to use precise patterns for the whitespace and non-whitespace fields, and only use .*? / .+? where a field can contain any mix of whitespace and non-whitespace chars.
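The "no errors, just 0 results" symptom in the question is consistent with how PHP behaves when PCRE gives up on a match: preg_match_all() returns false without raising a warning, and the reason is only available through preg_last_error(). Below is a minimal sketch for checking this. The helper name match_or_explain is hypothetical, and whether the question's pattern actually hits a limit (for example PREG_BACKTRACK_LIMIT_ERROR) depends on the pcre.backtrack_limit and pcre.jit settings of your installation, so no output is claimed for it.

```php
<?php
// Hypothetical helper: run preg_match_all() and surface PCRE runtime errors,
// which PHP otherwise reports only through preg_last_error().
function match_or_explain(string $regex, string $text)
{
    $count = preg_match_all($regex, $text, $matches);
    if ($count === false) {
        // preg_last_error_msg() exists from PHP 8.0; fall back to the numeric code.
        return function_exists('preg_last_error_msg')
            ? preg_last_error_msg()
            : 'PCRE error code ' . preg_last_error();
    }
    return $count; // number of full matches
}

$text = '10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1';
$odds = '(\d{1,3}\/\d{1,3}|evs)';

// Pattern from the question, wrapped in ~ delimiters.
$loose = '~' . $odds . '.*?(.+).*?' . $odds . '.*?(.+).*?' . $odds
       . '.*?(.+) v (.+).*?' . $odds . '.*?(.+) v (.+).*?' . $odds . '~';

// Pattern from this answer.
$strict = '~' . $odds . '\s+(\S+)\s+((?1))\s+(\S+)\s+((?1))\s+(.+?)\s+v\s+(\S+)'
        . '\s+((?1))\s+(\S+)\s+v\s+(\S+)\s+((?1))~';

var_dump(match_or_explain($loose, $text));  // a count, or an error message if a limit was hit
var_dump(match_or_explain($strict, $text)); // int(1): the sample line is one full match
```

If the first call prints an error message rather than a count, the pattern was abandoned at runtime, not rejected as invalid.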

Details

  • (\d{1,3}\/\d{1,3}|evs) - Group 1 (its pattern can later be reused with the (?1) subroutine): one to three digits, / and then one to three digits, or evs
  • \s+(\S+)\s+ - 1+ whitespaces, Group 2 matching 1+ non-whitespace chars, 1+ whitespaces
  • ((?1)) - Group 3 that matches the same way Group 1 pattern does
  • \s+(\S+)\s+((?1))\s+ - 1+ whitespaces, Group 4 matching 1+ non-whitespaces, 1+ whitespaces, Group 5 with the Group 1 pattern, 1+ whitespaces
  • (.+?) - Group 6: any 1 or more chars other than line break chars, as few as possible
  • \s+v\s+ - v enclosed with 1+ whitespaces
  • (\S+) - Group 7: 1+ non-whitespaces
  • \s+((?1))\s+ - 1+ whitespaces, Group 8 with Group 1 pattern, 1+ whitespaces
  • (\S+) - Group 9: 1+ non-whitespaces
  • \s+v\s+ - v enclosed with 1+ whitespaces
  • (\S+)\s+((?1)) - Group 10: 1+ non-whitespaces, then 1+ whitespaces and Group 11 with Group 1 pattern.
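Since the comments on the question mention that the pattern is assembled from per-column fragments, here is a minimal sketch of how the breakdown above could be generated instead of written by hand. The column type names (odds, word, text, v) and the $columns layout are assumptions made for illustration, not part of the original code. The odds fragment is repeated rather than referenced with (?1), because group numbers shift whenever the column list changes.

```php
<?php
// Hypothetical map from column type to pattern fragment.
$fragments = [
    'odds' => '(\d{1,3}\/\d{1,3}|evs)', // fraction or the literal evs
    'word' => '(\S+)',                  // one token without whitespace
    'text' => '(.+?)',                  // may contain whitespace, as few chars as possible
    'v'    => 'v',                      // literal separator, not captured
];

// Assumed column layout of the sample line, left to right.
$columns = ['odds', 'word', 'odds', 'word', 'odds', 'text', 'v',
            'word', 'odds', 'word', 'v', 'word', 'odds'];

// Look up each fragment and join them with "one or more whitespace chars".
$parts = [];
foreach ($columns as $column) {
    $parts[] = $fragments[$column];
}
$regex = '~' . implode('\s+', $parts) . '~';

$text = '10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1';

if (preg_match($regex, $text, $m)) {
    // $m[0] is the full match; $m[1]..$m[11] are the captured columns.
    print_r(array_slice($m, 1));
}
```

For the sample line, the sixth captured column is "mon 19:45 string1", matching Group 6 in the breakdown above. Fields that may contain spaces can be switched from word to text, at the cost of more backtracking.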

3 Comments

Thanks. I've updated the question to add the constraint that the words can contain any number of whitespace.
@AlexGodbehere The answer is still the same. I do not know your log format. You must use precise patterns for each field.
Thanks for your help. I ended up using .+ for the strings: (\d{1,3}\/\d{1,3}|evs) (.+) (\d{1,3}\/\d{1,3}|evs) ((?2)) (\d{1,3}\/\d{1,3}|evs) ((?2)) v ((?2)) (\d{1,3}\/\d{1,3}|evs) ((?2)) v ((?2)) (\d{1,3}\/\d{1,3}|evs)
