I am trying to parse and tokenize recipes. Ingredients can be written in two main ways:
Style 1:
1 Ripe Avocado
1x Ripe Avocado - the x is optional and only sometimes present
OR:
Style 2:
1 Ripe Avocado (lrg) 123
1x Ripe Avocado (lrg) 123 - whenever the abbreviation is present, so is an integer item code
I am trying to (a) detect whether a line matches Style 1 or Style 2, and (b) tokenize it into the following capture groups:
[1][Ripe Avocado][lrg]?[123]?
I can't get this to parse consistently, so any help would be much appreciated!
Edit:
What I had was ^(\d+)x? ([a-zA-Z0-9_', -]+), but it doesn't account for the optional capture groups in Style 2.
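The following is a minimal Python sketch of one way to handle both steps. It assumes each line is exactly one of the two styles above and that the abbreviation is a single word. It also replaces the explicit name class [a-zA-Z0-9_', -]+ with a lazy .+? so the name stops just before the optional "(abbrev) code" tail. INGREDIENT_RE and parse_ingredient are hypothetical names used only for illustration.

```python
import re

# Assumed line shapes (hypothetical, based on the two styles above):
#   Style 1: "<qty>[x] <name>"
#   Style 2: "<qty>[x] <name> (<abbrev>) <item code>"
# The abbreviation and item code share ONE optional non-capturing group,
# so they are either both captured or both None.
INGREDIENT_RE = re.compile(
    r"^(\d+)x?\s+"                  # group 1: quantity, then an optional literal "x"
    r"(.+?)"                        # group 2: name (lazy, so it stops before the tail)
    r"(?:\s+\((\w+)\)\s+(\d+))?$"   # groups 3 and 4: optional "(abbrev) code"
)


def parse_ingredient(line):
    """Return (style, qty, name, abbrev, code), or None if the line doesn't match."""
    m = INGREDIENT_RE.match(line.strip())
    if m is None:
        return None
    qty, name, abbrev, code = m.groups()
    # Style 2 is detected purely by whether the optional group matched.
    style = 2 if abbrev is not None else 1
    return style, int(qty), name, abbrev, (int(code) if code is not None else None)


if __name__ == "__main__":
    for line in [
        "1 Ripe Avocado",             # (1, 1, 'Ripe Avocado', None, None)
        "1x Ripe Avocado",            # (1, 1, 'Ripe Avocado', None, None)
        "1 Ripe Avocado (lrg) 123",   # (2, 1, 'Ripe Avocado', 'lrg', 123)
        "1x Ripe Avocado (lrg) 123",  # (2, 1, 'Ripe Avocado', 'lrg', 123)
    ]:
        print(parse_ingredient(line))
```

Because the abbreviation and item code live in the same (?:...)? group, a line such as "1 Ripe Avocado (lrg)" with no item code is not treated as Style 2. Instead, "(lrg)" stays in the name and the line is reported as Style 1, which may or may not be the behaviour you want.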
[] in your regex? That's for defining character classes, not groups. (?:) would be a non-capturing group, which is not the same thing as an optional group.
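Here is a small Python illustration of the distinction in the comment above, using made-up sample strings. A character class matches a single character. A non-capturing group only groups, and it is the trailing ? that makes the group optional.

```python
import re

# [lrg] is a character class: it matches exactly ONE character out of l, r, g.
print(re.fullmatch(r"[lrg]", "l") is not None)    # True
print(re.fullmatch(r"[lrg]", "lrg") is not None)  # False

# (?: ...) groups without capturing; the trailing ? is what makes it optional.
# The inner (\w+) is an ordinary capturing group for the abbreviation.
tail = re.compile(r"Avocado(?: \((\w+)\))?")
print(tail.fullmatch("Avocado").group(1))         # None
print(tail.fullmatch("Avocado (lrg)").group(1))   # lrg
```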