I can't seem to get a regex to work with the following examples. Basically, I would like to parse four groups from strings such as these:

test.this
test[extra].this
test[extra].this{data}
test.this{data}

For the examples above, I would like to get the following results, respectively:

val1='test', val2=None, val3='this', val4=None
val1='test', val2='extra', val3='this', val4=None
val1='test', val2='extra', val3='this', val4='data'
val1='test', val2=None, val3='this', val4='data'

I tried this but it's not working:

import re

tests = ["test.this",
         "test[extra].this",
         "test[extra].this{data}",
         "test.this{data}",]

for test in tests:
    m = re.match(r'^([^\[\.]+)(?:\[([^\]]+)])(?:\.([^{]+){)([^}]+)?$', test)
    if m:
        print(test, '->', m[1], m[2], m[3], m[4])

2 Answers

2

If only the second and fourth groups are optional, you may use:

^([^\[\.]+)(?:\[([^\]]+)])?\.([^{\r\n]+)(?:{([^}\r\n]+)})?$

Note that \r and \n were added in the negated character classes of the third and fourth groups to avoid going beyond the end of the line. If you're only using single-line strings, that won't be necessary.
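
As a quick check, here is a minimal sketch that runs the pattern above against the four strings from the question. The `PATTERN` constant and the `parse` helper are names made up for this illustration; they are not part of the question or of the `re` module. The attempt in the question fails on all four inputs because the bracket group is not optional, the `{` is required, and the closing `}` is never matched before `$`.

```python
import re

# Pattern from the answer above: groups 2 and 4 are optional.
PATTERN = re.compile(r'^([^\[\.]+)(?:\[([^\]]+)])?\.([^{\r\n]+)(?:{([^}\r\n]+)})?$')


def parse(text):
    """Hypothetical helper: return (val1, val2, val3, val4), or None if there is no match."""
    m = PATTERN.match(text)
    # groups() fills optional groups that did not take part in the match with None.
    return m.groups() if m else None


tests = ["test.this",
         "test[extra].this",
         "test[extra].this{data}",
         "test.this{data}"]

for test in tests:
    print(test, '->', parse(test))
```

Because the pattern is anchored with `^` and `$`, a string that lacks the dot, such as `"test"`, makes `parse` return `None` instead of a partial result.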

1 Comment

Yes, this works, but I think you meant that the first and third groups are required, not optional. I'll review your answer to see how it works. Thanks!
1
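import re

tests = ["test.this",
         "test[extra].this",
         "test[extra].this{data}",
         "test.this{data}"]

# Groups 2 and 4 capture the literal brackets, so only groups 1, 3, 5 and 6 are printed.
pat = re.compile(r'(\w+)([\[])?(\w+)?([\]])?\.(\w+){?(\w+)?')
for test in tests:
    x = pat.search(test)
    if x:  # search() returns None when nothing matches
        print(x.group(1), x.group(3), x.group(5), x.group(6))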

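(\w+) -> captures test

([\[])? -> captures [ (if present)

(\w+)? -> captures extra (if present)

([\]])? -> captures ] (if present)

\. -> matches the literal dot (not captured)

(\w+) -> captures this

{? -> matches an optional { (not captured)

(\w+)? -> captures data (if present)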
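
To see which part of the input this pattern actually consumes, the sketch below inspects the match object for one of the test strings. It only uses the pattern from this answer; the variable names `text`, `m` and `loose` are chosen for this illustration. Since the pattern is unanchored and used with `search()`, it simply stops before the closing `}`, and it also accepts input with an unbalanced bracket.

```python
import re

pat = re.compile(r'(\w+)([\[])?(\w+)?([\]])?\.(\w+){?(\w+)?')

text = "test.this{data}"
m = pat.search(text)
print(repr(m.group(0)))    # the text consumed by the whole pattern
print(m.end(), len(text))  # the match ends one character before the end of the string

# Unbalanced input is accepted too, since each bracket is optional on its own.
loose = pat.search("test[extra.this")
print(loose.group(1), loose.group(3), loose.group(5), loose.group(6))
```

If stricter validation is needed, anchoring the pattern or pairing the brackets inside a single optional group, as in the other answer, would reject such input.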

2 Comments

Question, how is '(\w+)?' not capturing '}' with data? I mean I know your solution works, just trying to understand how '}' is being ignored there
Only the contents inside the () parentheses are captured by a group. Since \w matches only letters, digits, and underscores, '}' is never part of the match and is therefore ignored.
