Use free-spacing mode for non-trivial regexes!
When dealing with non-trivial regexes like this one, you can dramatically improve readability (and maintainability) by writing them in free-spacing format with lots of comments (and indentation for any nested parentheses). Here is your original regex in free-spacing format with comments:
```php
$re_orig = '/# Original regex with added comments.
    (?P<name>.*)                # $name:
    [ ]                         # Space separates name from weight.
    (?P<total_weight>\d+)       # $total_weight:
    (?P<total_weight_unit>.*)   # $total_weight_unit:
    [ ]                         # Space separates total units from portions data.
    \(                          # Literal parens enclosing portions data.
    (?P<unitWeight>\d+)         # $unitWeight:
    (?P<unitWeight_unit>.*)     # $unitWeight_unit:
    [ ]x[ ]                     # "space-X-space" separates portions data.
    (?P<portion_no>\d+)         # $portion_no:
    \)                          # Literal parens enclosing portions data.
    /x';
```
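As a rough illustration of why the numeric sub-expressions needed work, the sketch below runs a condensed, single-line equivalent of the original regex (same groups, no `x` modifier) against a hypothetical input with a decimal weight. The sample string is an assumption, not taken from the question. Because `\d+` stops at the decimal point, the greedy `.*` unit group silently absorbs the fractional part:

```php
<?php
// Condensed single-line equivalent of $re_orig (same groups, no x modifier).
$re_orig_compact = '/(?P<name>.*) (?P<total_weight>\d+)(?P<total_weight_unit>.*)'
    . ' \((?P<unitWeight>\d+)(?P<unitWeight_unit>.*) x (?P<portion_no>\d+)\)/';

// Hypothetical sample line with a decimal total weight.
$line = 'Chicken breast 1.5kg (250g x 6)';

if (preg_match($re_orig_compact, $line, $m)) {
    // \d+ cannot match ".", so ".5" ends up in the unit group.
    echo $m['total_weight'], "\n";      // 1
    echo $m['total_weight_unit'], "\n"; // .5kg
}
```

The match still succeeds, so nothing signals the error; the weight is simply split in the wrong place.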
Here is an improved version:
```php
$re_improved = '/# Match Name, total weight, units and portions data.
    ^                           # Anchor to start of string.
    (?P<name>.*?)               # $name:
    [ ]+                        # Space(s) separate name from weight.
    (?P<total_weight>           # $total_weight:
      \d+                       # Required integer portion.
      (?:\.\d*)?                # Optional fractional portion.
    )
    (?P<total_weight_unit>      # $total_weight_unit:
      .+?                       # Units consist of any chars.
    )
    [ ]+                        # Space(s) separate total from portions.
    \(                          # Literal parens enclosing portions data.
    (?P<unitWeight>             # $unitWeight:
      \d+                       # Required integer portion.
      (?:\.\d*)?                # Optional fractional portion.
    )
    (?P<unitWeight_unit>        # $unitWeight_unit:
      .+?                       # Units consist of any chars.
    )
    [ ]+x[ ]+                   # "space-X-space" separates portions data.
    (?P<portion_no>             # $portion_no:
      \d+                       # Required integer portion.
      (?:\.\d*)?                # Optional fractional portion.
    )
    \)                          # Literal parens enclosing portions data.
    $                           # Anchor to end of string.
    /xi';
```
Notes:
- The expressions for all the numerical quantities have been improved to allow an optional fractional portion.
- Added start and end of string anchors.
- Added the `i` (ignore-case) modifier in case the X in the portions data is uppercase.
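For reference, here is one possible way to apply the improved pattern with `preg_match()`. This is a sketch under assumptions: `parse_portion_line()` and the sample lines are hypothetical, and the pattern is a condensed copy of `$re_improved` with the per-token comments dropped (same groups, anchors and `/xi` modifiers). It exercises the points in the notes: a decimal weight, an uppercase X, and a line that does not match at all.

```php
<?php
// Condensed copy of $re_improved (same groups, anchors and /xi modifiers).
$re_improved = '/
    ^ (?P<name>.*?) [ ]+
    (?P<total_weight> \d+ (?:\.\d*)? )
    (?P<total_weight_unit> .+? ) [ ]+
    \(
      (?P<unitWeight> \d+ (?:\.\d*)? )
      (?P<unitWeight_unit> .+? )
      [ ]+x[ ]+
      (?P<portion_no> \d+ (?:\.\d*)? )
    \)
    $ /xi';

/**
 * Hypothetical helper: returns the named captures for one line,
 * or null when the line does not match.
 */
function parse_portion_line(string $re, string $line): ?array
{
    if (preg_match($re, $line, $m) !== 1) {
        return null;
    }
    // preg_match also returns numeric keys; keep only the named groups.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

// Hypothetical sample lines: a decimal weight and an uppercase X.
print_r(parse_portion_line($re_improved, 'Chicken breast 1.5kg (250g x 6)'));
print_r(parse_portion_line($re_improved, 'Rice 2kg (500g X 4)'));
var_dump(parse_portion_line($re_improved, 'no weights here')); // NULL
```

Returning `null` on a non-match (rather than a partially filled array) makes it easy for the caller to skip or log lines the pattern does not recognise.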
I'm not sure how you are applying this regex, but this improved regex should solve your immediate problem.
Edit (2011-10-09 11:17 MDT): Changed the expression for units to be more lax, to allow for the cases pointed out by Ilmari Karonen.