0

I would like to use a regex to do the following in Python:

I am given a list of strings such as: 'abc01 - [def02] - ghi03 - jkl04'

Each string has a different number of items. Some items are wrapped in brackets and some are not.

Can someone help me with a regex that matches only the items not in brackets? The dashes and spaces should be removed, so for the example above the output would be: [abc01, ghi03, jkl04]

Thanks

4 Answers

9

Is regex really the best tool for the job?

>>> S = 'abc01 - [def02] - ghi03 - jkl04'
>>> [x for x in S.split(' - ') if not (x.startswith('[') or x.endswith(']'))]
['abc01', 'ghi03', 'jkl04']
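
Since the question mentions a list of such strings, the same filter can be wrapped in a small helper. This is only a sketch: the name `unbracketed_items` is hypothetical, and it assumes items are always separated by ' - ' as in the example.

```python
def unbracketed_items(line, sep=" - "):
    """Return the items of `line` that are not wrapped in square brackets."""
    # Strip stray whitespace so '[def02] ' is still recognised as bracketed.
    items = (part.strip() for part in line.split(sep))
    return [
        item
        for item in items
        if item and not (item.startswith("[") and item.endswith("]"))
    ]


# Hypothetical input list; the first entry is the string from the question.
lines = ["abc01 - [def02] - ghi03 - jkl04", "[x1] - y2"]
results = [unbracketed_items(line) for line in lines]
```

For the two sample lines, `results` is `[['abc01', 'ghi03', 'jkl04'], ['y2']]`.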

2
>>> a = 'abc01 - [def02] - ghi03 - jkl04'
>>> [i for i in a.split(" - ") if "[" not in i]
['abc01', 'ghi03', 'jkl04']


0

The following regex will solve your problem:

\b(?<!\[)\w+

The Python code is then:

import re

for match in re.finditer(r"\b(?<!\[)\w+", input_line):
    item = match.group()  # one unbracketed item per match

Notes:

  • \b asserts that the match starts at a word boundary, not in the middle of an item
  • The negative lookbehind (?<!\[) asserts that the item is not immediately preceded by a [
  • \w+ matches one or more consecutive word characters, as many as possible
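
Putting the pattern and the notes together, here is a self-contained sketch. The function name `items_outside_brackets` is made up for illustration, and it assumes every item consists only of word characters (letters, digits, underscore).

```python
import re

# \b        : start at a word boundary
# (?<!\[)   : the character just before the match must not be '['
# \w+       : one or more word characters
ITEM_RE = re.compile(r"\b(?<!\[)\w+")


def items_outside_brackets(input_line):
    """Collect every match of ITEM_RE in input_line as a list of strings."""
    return [match.group() for match in ITEM_RE.finditer(input_line)]


result = items_outside_brackets("abc01 - [def02] - ghi03 - jkl04")
```

Here `result` is `['abc01', 'ghi03', 'jkl04']`. One caveat: the lookbehind only checks the character directly before the match, so a bracketed item containing a space, such as `[def 02]`, would still yield `02`.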


0

You can use re.findall() with \w+ to match every run of word characters (\w matches letters, digits, and the underscore). Note that this also returns the bracketed item def02, so bracketed items still have to be filtered out.

>>> import re
>>> re.findall(r'\w+', 'abc01 - [def02] - ghi03 - jkl04')
['abc01', 'def02', 'ghi03', 'jkl04']
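
The call above also returns the bracketed item def02. One possible way to leave bracketed items out while still using findall() is to delete each bracketed segment first. This is a sketch, not part of the original answer: the helper name is hypothetical, and it assumes brackets are balanced and not nested.

```python
import re


def findall_outside_brackets(text):
    """Drop every [...] segment, then return the remaining runs of word characters."""
    # \[[^\]]*\] matches '[', then any characters except ']', then ']'.
    stripped = re.sub(r"\[[^\]]*\]", "", text)
    return re.findall(r"\w+", stripped)


result = findall_outside_brackets("abc01 - [def02] - ghi03 - jkl04")
multiword = findall_outside_brackets("abc01 - [def 02] - ghi03")
```

With these inputs, `result` is `['abc01', 'ghi03', 'jkl04']` and `multiword` is `['abc01', 'ghi03']`, so bracketed items containing spaces are removed as well.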
