python regex parse string with brackets

Question

I would like to parse a string that may or may not contain brackets. For john[doe], I want two variables: the part outside the [] and the part inside, so for this example john and doe. The string will always have this structure, but it can also be just john, in which case the second variable should be "" or None. How can I do this using the re library? Or with plain Python, if that is more efficient than regex?

This is what I tried so far:

s = sample_string.split("[")
x, y = (sample_string, None) if len(s) == 1 else (s[0], s[1][:-1])

I was originally using split with '[' and then removing ']' from the second element, if there is one. No nested brackets, just XXX[YYY] or XXXX format.
– user1179317, Commented Sep 11, 2020 at 20:55
This can be done with regular expressions, which is also efficient.
– Michael Butscher, Commented Sep 11, 2020 at 20:55

Booboo · Accepted Answer · 2020-09-11 21:56:41Z

4

A regex solution:

r'^([^[]+)(?:\[([^\]]+)])?$'

^ Matches start of string.
([^[]+) Capture group 1: matches 1 or more characters that are not '['.
(?: Start of non-capturing group.
\[ Matches '['.
([^\]]+) Capture group 2: matches 1 or more characters that are not ']'.
] Matches ']'
) End of non-capturing group.
? The non-capturing group is optional.
$ Matches end of string.

import re

tests = ['john', 'john[doe]']

for test in tests:
    m = re.match(r'^([^[]+)(?:\[([^\]]+)])?$', test)
    if m:
        print(test, '->', m[1], m[2])

Prints:

john -> john None
john[doe] -> john doe
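For repeated use, the same pattern could be compiled once and wrapped in a small helper. The sketch below is only one way to arrange it: the names BRACKET_RE and split_bracketed are made up for illustration, and re.VERBOSE is used so the step-by-step breakdown can sit next to the pattern itself.

```python
import re

# Same regex as above, spelled out with re.VERBOSE so each piece can be commented.
# BRACKET_RE and split_bracketed are illustrative names, not part of the answer.
BRACKET_RE = re.compile(r"""
    ^               # start of string
    ([^\[]+)        # group 1: one or more characters that are not '['
    (?:             # start of optional non-capturing group
        \[          #   a literal '['
        ([^\]]+)    #   group 2: one or more characters that are not ']'
        \]          #   a literal ']'
    )?              # the whole bracketed part may be absent
    $               # end of string
""", re.VERBOSE)


def split_bracketed(text):
    """Return (outside, inside); inside is None when there are no brackets.

    Returns None when the text does not fit the XXX or XXX[YYY] shape.
    """
    m = BRACKET_RE.match(text)
    return (m[1], m[2]) if m else None


print(split_bracketed("john"))       # ('john', None)
print(split_bracketed("john[doe]"))  # ('john', 'doe')
```

Note that an input with empty brackets such as john[] does not match, because group 2 requires at least one character.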

Explanations

First, anything between parentheses ( ) is a capturing group. Anything between (?: ) is a non-capturing group. Either of these types of groups can contain capturing and non-capturing groups within. [] is used to define a set of characters. For example, [aqw] matches 'a', 'q' or 'w'. [a-e] matches 'a', 'b', 'c', 'd' or 'e'. [^aqw] with a leading ^ negates the set, meaning it matches any character other than 'a', 'q' or 'w'. So [^\]] matches any character other than ']'. You have to put a \ character in front of the ] character to "escape" it, because in that context ] has special meaning: it would otherwise close the [] construct. The following + sign denotes "one or more of what preceded this". So ([^[]+) matches one or more of any character that is not a [.
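To see these pieces in isolation, here is a small sketch that runs each construct on a throwaway string. The sample strings and variable names are invented for the demonstration.

```python
import re

# A set matches any single character listed in it.
in_set = re.findall(r"[aqw]", "aqua wax")
print(in_set)         # ['a', 'q', 'a', 'w', 'a']

# A range inside a set: 'g' is outside a-e, so it is skipped.
in_range = re.findall(r"[a-e]", "badge")
print(in_range)       # ['b', 'a', 'd', 'e']

# A leading ^ negates the set.
not_in_set = re.findall(r"[^aqw]", "aqua wax")
print(not_in_set)     # ['u', ' ', 'x']

# One or more characters that are not ']' (the ] is escaped inside the set).
up_to_bracket = re.findall(r"[^\]]+", "doe]")
print(up_to_bracket)  # ['doe']

# Only capturing groups show up in the results; (?:b) is matched but not captured.
groups = re.match(r"(a)(?:b)(c)", "abc").groups()
print(groups)         # ('a', 'c')
```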

I hope the preceding explanations help.

edited Sep 11, 2020 at 21:56

answered Sep 11, 2020 at 21:25

Booboo


2 Comments

user1179317 Over a year ago

Sorry, can you break down steps 2 and 5? How those are capturing those groups

Booboo Over a year ago

I've added some explanations. I have to run now for a while. Let me know if you need more.

PApostol · Accepted Answer · 2020-09-11 21:20:08Z

0

Is it a requirement that you use regex for this? It's probably easier without:

if '[' in string:
  x, y = string.split('[')
  y = y.strip(']')
else:
  x, y = string, ''

Something to include regex might look like this:

if '[' in string:
  x, y = re.findall(r'^(.+)\[(.+)?]', string)[0]
else:
  x, y = string, ''
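The non-regex branch of this answer could also be written with str.partition, which splits on the first '[' and reports whether the separator was found at all. This is only a sketch: partition_brackets is a hypothetical helper name, and it assumes the input is well formed, meaning that if a '[' is present the string ends with ']'.

```python
def partition_brackets(text):
    """Split 'XXX[YYY]' into ('XXX', 'YYY'); plain 'XXX' gives ('XXX', '').

    Assumes well-formed input: if '[' is present, the string ends with ']'.
    """
    outside, sep, rest = text.partition("[")
    if not sep:
        # No '[' found, so there is no bracketed part.
        return outside, ""
    # Drop the trailing ']' from the bracketed part.
    return outside, rest[:-1]


print(partition_brackets("john"))       # ('john', '')
print(partition_brackets("john[doe]"))  # ('john', 'doe')
```

Returning None instead of '' for the missing part is a one-line change if that fits the calling code better.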

edited Sep 11, 2020 at 21:20

answered Sep 11, 2020 at 21:15

PApostol


1 Comment

user1179317 Over a year ago

It is not a requirement to use regex, I was just thinking that it would be more efficient, especially if it's done many, many times. I basically did something like that, but without looking for '[' and just splitting right away. I'll update my question with it.

Shop_till_ I_drop · Accepted Answer · 2020-09-11 22:07:09Z

0

As long as john[doe] is a string type, you should be able to parse the phrase using the replace function:

x = 'john[doe]'
new_x = x.replace("[", " ").replace("]", "")
print(new_x)

or, if you want to, you can use the match function:

import re

x = 'john[doe]'
m = re.match(r"(?P<first_name>\w+)\[(?P<last_name>\w+)\]", x)
name = m.group('first_name') + " " + m.group('last_name')
print(name)

Without having more phrases to parse, I am not sure which of the two is faster. Good luck! :)
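One way to find out would be to time both variants with the standard timeit module. The sketch below is only a rough harness: the function names, the single sample string and the iteration count are arbitrary choices, and the numbers will vary by machine and Python version.

```python
import re
import timeit

# Pattern compiled once so the timing does not include repeated compilation.
NAME_RE = re.compile(r"(?P<first_name>\w+)\[(?P<last_name>\w+)\]")


def with_replace(s):
    # Plain string methods: turn '[' into a space and drop ']'.
    return s.replace("[", " ").replace("]", "")


def with_match(s):
    # Regex with named groups, joined with a space.
    m = NAME_RE.match(s)
    return m.group("first_name") + " " + m.group("last_name")


sample = "john[doe]"
# Sanity check: both variants should produce the same string.
assert with_replace(sample) == with_match(sample) == "john doe"

t_replace = timeit.timeit(lambda: with_replace(sample), number=100_000)
t_match = timeit.timeit(lambda: with_match(sample), number=100_000)
print(f"replace: {t_replace:.3f}s  match: {t_match:.3f}s")
```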

answered Sep 11, 2020 at 22:07

Shop_till_ I_drop


pho · Accepted Answer · 2020-09-11 21:24:35Z

-1

There are probably better ways to do it but this worked for me.

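import re

s = "john[doe]"
arr = []
x = re.split(r"\[", s)[1]           # text after '[': "doe]"
arr.append(re.split(r"\[", s)[0])   # text before '[': "john"
arr.append(re.split(r"\]", x)[0])   # text before ']': "doe"
print(arr)                          # ['john', 'doe']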
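The same idea can probably be condensed into a single re.split call on a character class that matches either bracket, which also copes with input that has no brackets. This is a sketch only, and split_on_brackets is a made-up name.

```python
import re


def split_on_brackets(s):
    """Split on '[' or ']' and return (outside, inside); inside is None if absent."""
    # 'john[doe]' -> ['john', 'doe', ''], 'john' -> ['john']
    parts = re.split(r"[\[\]]", s)
    outside = parts[0]
    inside = parts[1] if len(parts) > 1 else None
    return outside, inside


print(split_on_brackets("john[doe]"))  # ('john', 'doe')
print(split_on_brackets("john"))       # ('john', None)
```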

edited Sep 11, 2020 at 21:24

pho


answered Sep 11, 2020 at 21:15

Jolly9642

