How can I remove the closing square bracket using regex in Python?

Question

I have a messy list of strings (list_strings). Using regex I can remove most of the unwanted characters, but I am struggling to also remove the closing bracket ]. How can I remove those as well? I guess I am very close...

import re

# the list to clean
list_strings = ['[ABC1: text1]', '[[DC: this is a text]]', '[ABC-O: potatoes]', '[[C-DF: hello]]']

# remove everything from [ up to and including the colon and the whitespace after it
for string in list_strings:
  cleaned = re.sub(r'[\[A-Z\d\-]+:\s*', '', string)
  print(cleaned)

Current output:

>>>text1]
>>>this is a text]]
>>>potatoes]
>>>hello]]

Desired output:

text1
this is a text
potatoes
hello

RavinderSingh13 · Accepted Answer · 2021-04-23 08:33:55Z

4

Write your code this way; this fixes your own attempt. Your regex already does most of the work. The only thing missing is an alternation (|) whose second branch matches one or more ] characters at the end of the string, so they are substituted too.

import re
list_strings = ['[ABC1: text1]', '[[DC: this is a text]]', '[ABC-O: potatoes]', '[[C-DF: hello]]']
for string in list_strings:
  cleaned = re.sub(r'[\[A-Z\d\-]+:\s+|\]+$', '', string)
  print(cleaned)
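As a quick check of the alternation idea above, here is a minimal sketch that compiles the pattern once and applies it to the sample list. The name BRACKET_NOISE is a hypothetical label chosen for this illustration, and \s* is used instead of \s+ on the assumption that a label without a space after the colon should be stripped as well.

```python
import re

# Hypothetical name. First branch: label prefix such as "[[DC: ".
# Second branch: one or more ] at the end of the string.
BRACKET_NOISE = re.compile(r'[\[A-Z\d\-]+:\s*|\]+$')

list_strings = ['[ABC1: text1]', '[[DC: this is a text]]', '[ABC-O: potatoes]', '[[C-DF: hello]]']

# re.sub replaces every non-overlapping match, so both branches apply to each string.
cleaned = [BRACKET_NOISE.sub('', s) for s in list_strings]
print(cleaned)  # ['text1', 'this is a text', 'potatoes', 'hello']
```

Compiling once is only a convenience when the same pattern is reused; calling re.sub directly, as in the answer, behaves the same way.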


JvdV · Accepted Answer · 2021-04-23 08:38:08Z

3

I'd skip regex here and use the string methods split() and rstrip() instead:

list_strings = ['[ABC1: text1]', '[[DC: this is a text]]', '[ABC-O: potatoes]', '[[C-DF: hello]]']

cleaned = [s.split(': ')[1].rstrip(']') for s in list_strings]
print(cleaned) # ['text1', 'this is a text', 'potatoes', 'hello']
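One caveat with indexing the split result at [1]: if the text after the label itself contains ': ', only the piece before the second separator is kept. The sketch below limits the split to the first separator with maxsplit=1. The helper name strip_label and the second sample string are hypothetical, added only to illustrate the edge case.

```python
def strip_label(s):
    # maxsplit=1 splits only at the first ': ', so later colons in the text survive.
    # [-1] takes the remainder; rstrip(']') drops trailing closing brackets.
    return s.split(': ', 1)[-1].rstrip(']')

# The second string is a made-up input with a colon inside the text.
samples = ['[ABC1: text1]', '[[DC: note: this is a text]]']
print([strip_label(s) for s in samples])  # ['text1', 'note: this is a text']
```

With s.split(': ')[1], the second sample would come back as just 'note'.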


Tim Biegeleisen · Accepted Answer · 2021-04-23 10:19:33Z

3

I would use a list comprehension here:

list_strings = ['[ABC1: text1]', '[[DC: this is a text]]', '[ABC-O: potatoes]', '[[C-DF: hello]]']
cleaned = [x.split(':')[1].strip().replace(']', '') for x in list_strings]
print(cleaned)  # ['text1', 'this is a text', 'potatoes', 'hello']
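Note that replace(']', '') removes every ] in the string, not only the trailing ones. That makes no difference for the sample data, but it matters if the text can contain brackets of its own. A small sketch of the difference, using a hypothetical input that is not part of the question:

```python
s = '[ABC1: item [a] and more]'  # hypothetical input with a bracket inside the text

text = s.split(':')[1].strip()

print(text.replace(']', ''))  # item [a and more    (every ] removed)
print(text.rstrip(']'))       # item [a] and more   (only trailing ] removed)
```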

edited Apr 23, 2021 at 10:19

answered Apr 23, 2021 at 8:35


Wiktor Stribiżew · Accepted Answer · 2021-04-23 08:41:03Z

You can use

cleaned = re.sub(r'^\[+[A-Z\d-]+:\s*|]+$', '', string)

See the Python demo and the regex demo.

Alternatively, to make sure the string starts with one or more [ followed by a label such as WORD: and ends with one or more ], you may use

cleaned = re.sub(r'^\[+[A-Z\d-]+:\s*(.*?)\s*]+$', r'\1', string)

See this regex demo and this Python demo.
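A practical consequence of anchoring the whole pattern is that strings which do not have the full [LABEL: text] shape are left untouched rather than partially cleaned. A minimal sketch, assuming two made-up non-conforming inputs alongside one sample from the question:

```python
import re

pattern = re.compile(r'^\[+[A-Z\d-]+:\s*(.*?)\s*]+$')

inputs = [
    '[[DC: this is a text]]',  # from the question: full match, replaced by group 1
    'no brackets: here',       # hypothetical: no leading [, so no match
    '[ABC1: text1',            # hypothetical: no closing ], so no match
]
results = [pattern.sub(r'\1', s) for s in inputs]
print(results)  # ['this is a text', 'no brackets: here', '[ABC1: text1']
```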

And in case you simply want to extract the text inside the brackets, you may use

# First match only
m = re.search(r'\[+[A-Z\d-]+:\s*(.*?)\s*]', string)
if m:
    print(m.group(1))

# All matches
matches = re.findall(r'\[+[A-Z\d-]+:\s*(.*?)\s*]', string)

See this regex demo and this Python demo.
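To see how the first-match and all-matches variants differ, here is a small sketch run on a single string that contains two bracketed items. The combined input string is a hypothetical example built from the question's samples.

```python
import re

string = '[ABC1: text1] and [[DC: this is a text]]'  # hypothetical combined input
pattern = r'\[+[A-Z\d-]+:\s*(.*?)\s*]'

# re.search stops at the first match; group(1) is the captured inner text.
m = re.search(pattern, string)
first = m.group(1) if m else None
print(first)  # text1

# With exactly one capturing group, re.findall returns a list of that group's values.
matches = re.findall(pattern, string)
print(matches)  # ['text1', 'this is a text']
```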

Details

^ - start of string
\[+ - one or more [ chars
[A-Z\d-]+ - one or more uppercase ASCII letters, digits or - chars
: - a colon
\s* - zero or more whitespaces
| - or
]+$ - one or more ] chars at the end of string.

Also, (.*?) is a capturing group with ID 1 that matches zero or more chars other than line break chars, as few as possible. \1 in the replacement refers to the text captured by this group.
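The breakdown above can also be kept next to the pattern itself by using the re.VERBOSE flag, which ignores whitespace and allows # comments inside the regex. This is only a sketch of the first pattern from this answer written out that way; the behavior should be the same as the one-line form.

```python
import re

pattern = re.compile(r'''
    ^ \[+          # start of string, then one or more opening brackets
    [A-Z\d-]+      # one or more uppercase ASCII letters, digits or hyphens
    :              # a colon
    \s*            # zero or more whitespace characters
    |              # or
    ]+ $           # one or more closing brackets at the end of the string
''', re.VERBOSE)

list_strings = ['[ABC1: text1]', '[[DC: this is a text]]', '[ABC-O: potatoes]', '[[C-DF: hello]]']
cleaned = [pattern.sub('', s) for s in list_strings]
print(cleaned)  # ['text1', 'this is a text', 'potatoes', 'hello']
```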
