
I am using the following regex: (example here: https://regex101.com/r/dVTUrM/1)

\/(?<field1>.{4})\/(?<field2>.*?)\/(?<field3>.*?)\/(?<field4>.*?)\/(?<field5>.*?)\/(?<field6>.*)

to parse the following text:

pyramid:/A49E/18DA-6FAB-4921-8AEB-45A07B162DA5/{E3646FA1-4652-45E9-885A-3756FC574057}/{F1864679-1D9D-4084-B38D-231D793AA15D}/9/abc.tif

giving the following result:

Group `field1`  9-13    `A49E`
Group `field2`  14-46   `18DA-6FAB-4921-8AEB-45A07B162DA5`
Group `field3`  47-85   `{E3646FA1-4652-45E9-885A-3756FC574057}`
Group `field4`  86-124  `{F1864679-1D9D-4084-B38D-231D793AA15D}`
Group `field5`  125-126 `9`
Group `field6`  127-134 `abc.tif`
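
To reproduce this outside regex101, here is a minimal Python sketch. The question does not say which language is used, so Python is an assumption. Python's re module uses the (?P<name>...) spelling for named groups, and that is the only change made to the pattern.

```python
import re

# The question's pattern, with each (?<name>...) group written in the
# (?P<name>...) spelling that Python's re module accepts.
QUESTION_PATTERN = re.compile(
    r"\/(?P<field1>.{4})\/(?P<field2>.*?)\/(?P<field3>.*?)"
    r"\/(?P<field4>.*?)\/(?P<field5>.*?)\/(?P<field6>.*)"
)

# Input without field5 and field6, and the full input from the question.
short = ("pyramid:/A49E/18DA-6FAB-4921-8AEB-45A07B162DA5"
         "/{E3646FA1-4652-45E9-885A-3756FC574057}"
         "/{F1864679-1D9D-4084-B38D-231D793AA15D}")
full = short + "/9/abc.tif"

m = QUESTION_PATTERN.search(full)
for name in ("field1", "field2", "field3", "field4", "field5", "field6"):
    start, end = m.span(name)  # end offset is exclusive, as in regex101
    print(f"{name} {start}-{end} {m.group(name)}")

# The pattern needs six literal slashes but the short input has only four,
# so search() finds no match at all.
print(QUESTION_PATTERN.search(short))  # None
```

The loop prints the same spans as the regex101 listing above; the short input fails outright rather than leaving field5 and field6 blank.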

But if field5 and field6 are missing:

pyramid:/A49E/18DA-6FAB-4921-8AEB-45A07B162DA5/{E3646FA1-4652-45E9-885A-3756FC574057}/{F1864679-1D9D-4084-B38D-231D793AA15D}

I would like this to still match, with field5 and field6 left blank.

Is this possible by modifying the regex statement?

Note: field6 may also be missing on its own (with field5 present).

  • Can you put non-capturing parentheses around the optional bits, and a ? quantifier? That is, use …/(?:(?<field5>.*?)(?:\/(?<field6>.*))?)? at the end… Or some variant on this. You might need to review the greediness of field6. Commented Mar 8, 2017 at 20:16

1 Answer


Here you go:

(?x)^pyramid:
/(?P<field1>[^/]{4})
/(?P<field2>[^/]+)
/(?P<field3>[^/]+)
/(?P<field4>[^/]+)
(?:
    /(?P<field5>[^/]+)
    /(?P<field6>[^/]+)
)?

See a demo on regex101.com.

Or, in short (without the verbose flag):

^pyramid:/(?P<field1>[^/]{4})/(?P<field2>[^/]+)/(?P<field3>[^/]+)/(?P<field4>[^/]+)(?:/(?P<field5>[^/]+)/(?P<field6>[^/]+))?

Depending on the programming language or regex flavour, you can use another delimiter such as ~ so that the forward slashes no longer need escaping. The (?: ... ) construct is a non-capturing group; the trailing ? makes it optional, so the pattern accepts 4 or 6 fields (but not 5!).
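
A minimal Python sketch that checks both forms of the answer's pattern against 4-, 5- and 6-field inputs (Python is an assumption; the (?P<name>...) syntax happens to be its spelling). ANSWER_ANCHORED, the fields helper and the five-field input are illustrative additions, not part of the answer.

```python
import re

# Compact form of the answer's pattern.
ANSWER = re.compile(
    r"^pyramid:/(?P<field1>[^/]{4})/(?P<field2>[^/]+)/(?P<field3>[^/]+)"
    r"/(?P<field4>[^/]+)(?:/(?P<field5>[^/]+)/(?P<field6>[^/]+))?"
)

# Verbose form: (?x) tells the engine to ignore the layout whitespace.
ANSWER_VERBOSE = re.compile(r"""(?x)^pyramid:
/(?P<field1>[^/]{4})
/(?P<field2>[^/]+)
/(?P<field3>[^/]+)
/(?P<field4>[^/]+)
(?:
    /(?P<field5>[^/]+)
    /(?P<field6>[^/]+)
)?""")

# Illustrative addition: the same pattern with a trailing $ anchor.
ANSWER_ANCHORED = re.compile(ANSWER.pattern + "$")

short = ("pyramid:/A49E/18DA-6FAB-4921-8AEB-45A07B162DA5"
         "/{E3646FA1-4652-45E9-885A-3756FC574057}"
         "/{F1864679-1D9D-4084-B38D-231D793AA15D}")
five = short + "/9"
full = short + "/9/abc.tif"


def fields(pattern, text):
    """Named groups as a dict; groups that did not participate become ''."""
    m = pattern.search(text)
    return m.groupdict(default="") if m else None


print(fields(ANSWER, full)["field6"])         # abc.tif
print(fields(ANSWER, short)["field5"] == "")  # True: optional block skipped
print(fields(ANSWER_VERBOSE, full) == fields(ANSWER, full))  # True

# Five fields: the optional block needs both /field5 and /field6, so it is
# skipped. Unanchored, the match silently stops after field4 and drops "9";
# anchored with $, the input is rejected instead.
print(fields(ANSWER, five)["field5"] == "")  # True
print(fields(ANSWER_ANCHORED, five))         # None
```

groupdict(default="") is what turns the skipped groups into blanks; without it, Python reports them as None.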


5 Comments

Why not drop the x flag and use [^/] instead of [^/\r\n]?
@Jordan: You're absolutely right. I just copied it from regex101.com; the newline was there to illustrate the separate lines and isn't needed in the actual expression.
@Jordan: Updated to reflect your thoughts.
I updated it to this: regex101.com/r/y1kfox/2, as field6 can be missing, or both field5 and field6 can be missing... and the word pyramid can also be any string value.
OK... not sure why I was having trouble, but this works with my modifications and yours: regex101.com/r/y1kfox/3
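
For the requirements stated in these comments (any leading word; field6 alone, or field5 and field6 together, may be missing), the nested-optional idea from the comment under the question can be combined with the answer's [^/] classes. The Python sketch below is a hypothetical variant, not the contents of the regex101 links; the parse helper and the sample inputs are illustrative only, and it assumes the leading word contains no ':' or '/'.

```python
import re

# Hypothetical variant (not the pattern behind the regex101 links):
#   * the leading word may be any run of characters other than ':' and '/';
#   * field6 sits in an optional group nested inside the optional field5
#     group, so 4, 5 or 6 fields are all accepted;
#   * the trailing $ rejects extra segments instead of silently ignoring them.
PATTERN = re.compile(
    r"^[^:/]+:"
    r"/(?P<field1>[^/]{4})"
    r"/(?P<field2>[^/]+)"
    r"/(?P<field3>[^/]+)"
    r"/(?P<field4>[^/]+)"
    r"(?:/(?P<field5>[^/]+)(?:/(?P<field6>[^/]+))?)?$"
)


def parse(text):
    """Return field1..field6 as a dict ('' for missing fields), or None."""
    m = PATTERN.match(text)
    return m.groupdict(default="") if m else None


base = ("pyramid:/A49E/18DA-6FAB-4921-8AEB-45A07B162DA5"
        "/{E3646FA1-4652-45E9-885A-3756FC574057}"
        "/{F1864679-1D9D-4084-B38D-231D793AA15D}")

for text in (base, base + "/9", base + "/9/abc.tif"):
    f = parse(text)
    print(repr(f["field5"]), repr(f["field6"]))
# '' ''
# '9' ''
# '9' 'abc.tif'

# Any leading word works ("other" is just a placeholder value).
print(parse(base.replace("pyramid", "other", 1))["field1"])  # A49E
# One segment too many: rejected by the $ anchor.
print(parse(base + "/9/abc.tif/extra"))  # None
```

Nesting the field6 group inside the field5 group is what allows exactly 4, 5 or 6 fields while still ruling out a field6 without a field5.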
