1

I have run into a small issue. I need a regular expression that splits a string into separate pieces: numbers, chunks of characters inside square brackets, and any remaining runs of characters.

For example, if I have a string that resembles

s = "2[abc]3[cd]ef"

I need a list like lst = ['2', 'abc', '3', 'cd', 'ef'].

This is the code I have so far:

import re
s = "2[abc]3[cd]ef"
# Two capturing groups, so findall returns one (number, bracket contents) tuple per match
res = re.findall(r"(\d+)\[([^[\]]*)\]", s)
print(res)

This outputs a list of tuples that looks like this:

[('2', 'abc'), ('3', 'cd')]

I am very new to regular expressions and still learning. Sorry if this is an easy one.

Thanks!

1 Comment

re.findall(r'\w+', s)? Or re.findall(r'\d+|[^\W\d_]+', s)? Commented Jul 4, 2021 at 17:03

3 Answers

1

The immediate fix is getting rid of the capturing groups and using alternation to match either digits or chars other than square bracket chars:

import re
s = "2[abc]3[cd]ef"
res = re.findall(r"\d+|[^][]+", s)
print(res)
# => ['2', 'abc', '3', 'cd', 'ef']

See the regex demo and the Python demo. Details:

  • \d+ - one or more digits
  • | - or
  • [^][]+ - one or more chars other than [ and ]
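To illustrate why dropping the capturing groups matters, here is a minimal sketch. It relies on the documented behavior of re.findall: with more than one group in the pattern it returns tuples of the groups, and with no groups it returns the whole matches. The variable names are only for illustration.

```python
import re

s = "2[abc]3[cd]ef"

# Two capturing groups: each result is a (digits, bracket contents) tuple,
# and the trailing "ef" is lost because it is not followed by brackets.
with_groups = re.findall(r"(\d+)\[([^][]*)\]", s)
print(with_groups)

# No groups: each result is the full text matched by either alternative.
without_groups = re.findall(r"\d+|[^][]+", s)
print(without_groups)
```

The second call also picks up the trailing ef, since the pattern no longer requires every piece to be a number followed by brackets.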

Other solutions that might help are:

re.findall(r'\w+', s)
re.findall(r'\d+|[^\W\d_]+', s)

where \w+ matches one or more word characters (in Python 3 string patterns, Unicode letters and digits plus the underscore), and [^\W\d_]+ removes digits and the underscore from that set, so it matches one or more characters that are essentially Unicode letters.

See this Python demo.
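The three patterns give the same result on the sample string, but they can differ once letters sit directly next to digits. A small sketch, assuming a hypothetical input "sm2[abc]3[cd]ef" (the variable names are illustrative only):

```python
import re

s = "sm2[abc]3[cd]ef"  # hypothetical input: letters directly before a number

# "Digits, or anything that is not a bracket": the second branch also
# accepts digits, so "sm2" stays in one piece.
not_brackets = re.findall(r"\d+|[^][]+", s)

# \w+ treats letters and digits as a single run, so "sm2" stays together too.
word_chars = re.findall(r"\w+", s)

# "Digits, or letters only": a run ends where letters meet digits.
digits_or_letters = re.findall(r"\d+|[^\W\d_]+", s)

print(not_brackets)
print(word_chars)
print(digits_or_letters)
```

Only the last pattern separates sm from 2, so it is likely the safer choice if numbers and letters can be adjacent in the input.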


1 Comment

re.findall(r'\d+|[^\W\d_]+', s) this particular case works like a charm! Thanks!
0

Don't look for one regex that matches the whole string at once; use a regex that matches each block instead. \w (roughly [a-zA-Z0-9_]; in Python 3 it also matches non-ASCII letters and digits unless re.ASCII is set) fits well:

import re
s = "2[abc]3[cd]ef"
print(re.findall(r"\w+", s))  # ['2', 'abc', '3', 'cd', 'ef']

Or split on brackets

print(re.split(r"[\[\]]", s))  # ['2', 'abc', '3', 'cd', 'ef']
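One thing to watch with the split approach: re.split returns an empty string wherever a bracket sits at the very start or end of the input, or where two brackets are adjacent. A minimal sketch, assuming a hypothetical input that ends with a bracket:

```python
import re

s = "2[abc]3[cd]"  # hypothetical input that ends with a closing bracket

parts = re.split(r"[\[\]]", s)
print(parts)   # the last element is an empty string

# Drop the empty pieces left behind by the split.
tokens = [p for p in parts if p]
print(tokens)
```

re.findall(r"\w+", s) does not have this issue, because it only returns non-empty matches.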

1 Comment

I think, as Jonathan points out, the string is irregular. If I put letters in front of the 2, these solutions break. For example, if my string is s = "sm2[abc]3[cd]ef", the output I get is ['sm2', 'abc', '3', 'cd', 'ef']. Instead it should be ['sm', '2', 'abc', '3', 'cd', 'ef'].
0

Regular expressions are meant for regular patterns, and your string is irregular. Regex is mostly used to find a specific pattern in a long text, to validate text, or to extract things from text.

For example, to find a phone number in a string I would use regex. But to build a calculator that needs to extract operators and digits, I would not; I would rather write plain Python code for that.
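As a rough sketch of what such plain Python code could look like for the string in the question: the tokenize function below is a hypothetical helper, not part of any library, and it assumes that brackets only act as separators and that a token ends wherever digits and non-digits meet.

```python
def tokenize(s):
    """Split s into runs of digits and runs of other characters, dropping brackets."""
    tokens = []
    current = ""
    for ch in s:
        if ch in "[]":
            # A bracket ends the current token and is not kept itself.
            if current:
                tokens.append(current)
            current = ""
        elif current and ch.isdigit() != current[-1].isdigit():
            # Switching between digits and non-digits starts a new token.
            tokens.append(current)
            current = ch
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


print(tokenize("2[abc]3[cd]ef"))
print(tokenize("sm2[abc]3[cd]ef"))
```

The loop is longer than a one-line regex, but each rule is explicit, which can make it easier to extend if the input format grows more complicated.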
