In PHP, I am trying to write a regex that splits a string into parts and returns them as array elements.

For example, these are my strings:

$string1 = "For a serving of 100 g Sugars: 2.3 g (Approximately)";
$string2 = "For a serving of 100 g Saturated Fat: 5.8 g (Approximately)";
$string3 = "For a portion of 100 g Energy Value: 290 kcal (Approximately)";

I want to extract specific pieces of information from these strings:

$arrayString1 = array('100 g','Sugars', '2.3 g');
$arrayString2 = array('100 g','Saturated Fat', '5.8 g');
$arrayString3 = array('100 g','Energy Value', '290 kcal');

I wrote this regex:

(^For a serving of )([\d g]*)([^:]*)(: )([\d.\d]*)( )([a-z]*)

Do you have any idea how to optimize this regex?

Thanks

1 Answer

You could make the pattern more specific by matching the digits and the unit (g or kcal) explicitly.

To match all three examples, you can use an alternation in a non-capturing group, (?:serving|portion), which matches either word.
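As a small illustration of why the alternation matters, the sketch below runs the question's pattern and a shortened prefix-only pattern against the "portion" sample. The variable names ($original, $prefix, $portion) are made up for this example, and $prefix deliberately covers only the start of the string.

```php
<?php
// The pattern from the question hard-codes the word "serving".
$original = '~(^For a serving of )([\d g]*)([^:]*)(: )([\d.\d]*)( )([a-z]*)~';

// Prefix only (illustrative): the alternation accepts either word.
$prefix = '~^For a (?:serving|portion) of ~';

$portion = "For a portion of 100 g Energy Value: 290 kcal (Approximately)";

var_dump(preg_match($original, $portion)); // int(0)
var_dump(preg_match($prefix, $portion));   // int(1)
```

The group is written as (?:...) so that it groups the alternatives without adding an entry to the matches array.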

Instead of using 7 capturing groups, you can use 3 capturing groups.

You can omit the first capturing group (^For a serving of ) and combine the digits and the unit into a single group.

^For\h+a\h+(?:serving|portion)\h+of\h+(\d+\h+g)\h+([^:\r\n]+):\h+(\d+(?:\.\d+)?\h+(?:g|kcal))\b
  • ^ Start of string
  • For\h+a\h+(?:serving|portion)\h+of\h+ Match the beginning of the string with either serving or portion
  • (\d+\h+g)\h+ Capture group 1, match 1+ digits, 1+ horizontal whitespace chars and g
  • ([^:\r\n]+):\h+ Capture group 2, match 1+ times any char except : or a newline, followed by matching : and 1+ horizontal whitespace chars
  • ( Capture group 3
    • \d+(?:\.\d+)? Match 1+ digits with an optional decimal part
    • \h+(?:g|kcal) Match 1+ horizontal whitespace chars and either g or kcal
  • )\b Close group 3, followed by a word boundary to prevent the unit from being the start of a longer word
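As a rough check of the group count discussed above, the following sketch runs the question's pattern and the reduced pattern against the first sample string and compares the size of the matches array. The variable names ($sevenGroups, $threeGroups, $before, $after) are illustrative only.

```php
<?php
$subject = "For a serving of 100 g Sugars: 2.3 g (Approximately)";

// Pattern from the question: 7 capturing groups.
$sevenGroups = '~(^For a serving of )([\d g]*)([^:]*)(: )([\d.\d]*)( )([a-z]*)~';

// Reduced pattern: 3 capturing groups, the rest is non-capturing.
$threeGroups = '~^For\h+a\h+(?:serving|portion)\h+of\h+(\d+\h+g)\h+([^:\r\n]+):\h+(\d+(?:\.\d+)?\h+(?:g|kcal))\b~';

preg_match($sevenGroups, $subject, $before);
preg_match($threeGroups, $subject, $after);

// Index 0 always holds the full match, so each count is groups + 1.
echo count($before), "\n"; // 8
echo count($after), "\n";  // 4
```

With the reduced pattern, indexes 1 to 3 are already the three wanted values, so no post-processing is needed beyond dropping index 0.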

For example

$pattern = '~^For\h+a\h+(?:serving|portion)\h+of\h+(\d+\h+g)\h+([^:\r\n]+):\h+(\d+(?:\.\d+)?\h+(?:g|kcal))\b~';
$strings = [
    "For a serving of 100 g Sugars: 2.3 g (Approximately)",
    "For a serving of 100 g Saturated Fat: 5.8 g (Approximately)",
    "For a portion of 100 g Energy Value: 290 kcal (Approximately)"
];

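foreach ($strings as $string) {
    // Only print when the pattern matches; otherwise $matches is empty.
    if (preg_match($pattern, $string, $matches)) {
        array_shift($matches); // Drop the full match, keep the 3 groups
        print_r($matches);
    }
}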

Output

Array
(
    [0] => 100 g
    [1] => Sugars
    [2] => 2.3 g
)
Array
(
    [0] => 100 g
    [1] => Saturated Fat
    [2] => 5.8 g
)
Array
(
    [0] => 100 g
    [1] => Energy Value
    [2] => 290 kcal
)
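If string keys are easier to work with than numeric indexes, one possible variant uses named capture groups. This is only a sketch: the function name parseNutritionLine and the group names serving, nutrient and amount are hypothetical and not part of the question. It also shows the effect of the trailing \b on a unit such as "grams".

```php
<?php
/**
 * Hypothetical helper: returns the three parts keyed by name,
 * or null when the line does not match the expected format.
 */
function parseNutritionLine(string $line): ?array
{
    $pattern = '~^For\h+a\h+(?:serving|portion)\h+of\h+'
        . '(?<serving>\d+\h+g)\h+'
        . '(?<nutrient>[^:\r\n]+):\h+'
        . '(?<amount>\d+(?:\.\d+)?\h+(?:g|kcal))\b~';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }

    // preg_match stores named groups under both names and numbers;
    // keep only the named entries.
    return [
        'serving'  => $m['serving'],
        'nutrient' => $m['nutrient'],
        'amount'   => $m['amount'],
    ];
}

print_r(parseNutritionLine("For a portion of 100 g Energy Value: 290 kcal (Approximately)"));

// "grams" starts with g, but \b fails between "g" and "r", so there is no match.
var_dump(parseNutritionLine("For a serving of 100 g Sugars: 2.3 grams")); // NULL
```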