You are trying to solve 2 problems:
- Parse the input to extract "meaningful" data, i.e. geometry (rectangular, round, etc) and parameters (aspect ratio, diameter, etc). Before you can do that you must establish the "universe" of possibilities. Are there more than just rectangular and round? This is the harder part.
- Take the extracted data and normalize/standardize the format. This is the easy part.
Let's say you only have two options, rectangular and round. Rectangular seems to be defined by a pair of real numbers separated by an 'x', so a regex for that might be
(\d+(?:\.\d+)?)\s*x\s*(\d+(?:\.\d+)?)
What you have here is two expressions for real numbers with a separator between them:
- 1 or more digits followed by an optional group of a dot and one or more digits
- optional whitespace, an x, and more optional whitespace
- 1 or more digits followed by an optional group of a dot and one or more digits
The outer parentheses around each number expression form a capturing group, which causes the regex engine to make whatever matched available in the results. The inner parentheses (?:\.\d+)? form a non-capturing group (the ?: part). This lets you apply the trailing ? quantifier (0 or 1) to the decimal portion without capturing it separately.
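To see what the non-capturing group buys you, here is a small sketch that runs the same pattern twice, once as written above and once with plain capturing parentheses around the decimal part. The sample input and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $input = '12.5 x 3';   # hypothetical sample input

# Decimal part in a non-capturing group: only the two numbers are returned.
my @two = $input =~ /(\d+(?:\.\d+)?)\s*x\s*(\d+(?:\.\d+)?)/;

# Decimal part in a capturing group: each fraction becomes an extra capture.
my @four = $input =~ /(\d+(\.\d+)?)\s*x\s*(\d+(\.\d+)?)/;

print scalar(@two), " captures: ", join(',', @two), "\n";
{
    # An optional group that did not take part in the match yields undef.
    no warnings 'uninitialized';
    print scalar(@four), " captures: ", join(',', @four), "\n";
}
```

With the capturing version you would have to skip over the fraction captures yourself, which is why the pattern uses (?:...) for them.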
If the input doesn't match this, you move on to the next pattern, looking for a round specification. Repeat as needed for all possibilities.
For the above expression:
# assume string to be parsed is in $_
if (my ($h, $w) = /(\d+(?:\.\d+)?)\s*x\s*(\d+(?:\.\d+)?)/)
{
    # the two captured numbers, in the order they appear in the input
    printf "%s x %s\n", $h, $w;
}
I haven't tested this so there may be a typo... but this is the general idea.
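Putting both steps together, a rough sketch of the "try each pattern in turn, then normalize" approach could look like the following. The function name normalize_spec, the "rect"/"round" output format, and especially the round pattern (a number followed by "dia") are assumptions made for illustration; the real patterns depend on what your data looks like.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a normalized string, or undef if no pattern matched.
sub normalize_spec {
    my ($spec) = @_;
    my $num = qr/\d+(?:\.\d+)?/;   # a real number, same as in the pattern above

    # Rectangular: two numbers separated by an 'x'.
    if (my ($first, $second) = $spec =~ /($num)\s*x\s*($num)/) {
        return sprintf 'rect %s x %s', $first, $second;
    }

    # Round: ASSUMED to look like "<number> dia"; adjust to the real input.
    if (my ($diameter) = $spec =~ /($num)\s*dia\b/) {
        return sprintf 'round %s', $diameter;
    }

    return;   # nothing matched: unknown geometry
}

for my $spec ('10 x 2.5', '7.25dia', 'hexagonal') {
    my $out = normalize_spec($spec);
    print defined($out) ? "$out\n" : "unrecognized: $spec\n";
}
```

Anything that falls through to "unrecognized" tells you that your universe of possibilities is not complete yet, so it is worth logging those inputs rather than silently dropping them.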