
I have a very long text with parts enclosed in +++, which I would like to enclose in double square brackets instead.

se1 = "+++TEXT:+++ Moshe Morgenstern is on his way to the main synagogue in the center of Bnei Brak, home to a largely ultra-orthodox - or haredi - community. +++ : Bnei Brak, Tel Aviv + Jerusalem ))+++"

I would like to replace the +++ delimiters around such text with [[ and ]], so

+++TEXT+++ should become [[TEXT]]

My code:

import re


se1 = "+++TEXT:+++ Moshe Morgenstern is on his way to the main synagogue in the center of Bnei Brak, home to a largely ultra-orthodox - or haredi - community. +++ Karte Israel mit: Bnei Brak, Tel Aviv + Jerusalem ))+++"

comments = re.sub(r"\+\+\+.*?\+\+\+", r"[[.*?]]", se1)
print(comments)

But it gives the wrong output:

[[.*?]] Moshe Morgenstern is on his way to the main synagogue in the center of Bnei Brak, home to a largely ultra-orthodox - or haredi - community. [[.*?]]

2 Answers

You can use this:

re.sub(r'\+\+\+(.*?)\+\+\+',r'[[\1]]',se1)

In your call, the .*? in the second argument is treated as a literal string, not as a reference to whatever .*? matched in the pattern. Wrapping it in parentheses, (.*?), turns it into a capture group, and \1 in the replacement string inserts the text that group captured.
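A minimal sketch of the difference, using a shortened sample string (the names `sample`, `literal`, and `captured` are made up for illustration and are not from the question):

```python
import re

# Shortened, hypothetical stand-in for the long text in the question.
sample = "+++TEXT:+++ some words +++ a + b +++"

# No capture group: the replacement string is inserted as-is.
literal = re.sub(r"\+\+\+.*?\+\+\+", r"[[.*?]]", sample)

# Capture group: \1 is replaced by whatever (.*?) matched.
captured = re.sub(r"\+\+\+(.*?)\+\+\+", r"[[\1]]", sample)

print(literal)   # [[.*?]] some words [[.*?]]
print(captured)  # [[TEXT:]] some words [[ a + b ]]
```

The single + inside the second marked section is left alone because the lazy .*? only stops where three consecutive + characters follow.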


You need to capture the text between the markers in a group with () and then reference that group with \1 in the replacement.

This should work fine:

>>> comments = re.sub(r"\+\+\+(.*?)\+\+\+", r"[[\1]]", se1)
>>> comments
'[[TEXT:]] Moshe Morgenstern is on his way to the main synagogue in the center of Bnei Brak, home to a largely ultra-orthodox - or haredi - community. [[ Karte Israel mit: Bnei Brak, Tel Aviv + Jerusalem ))]]'

Note that \+\+\+ can also be written more compactly as \+{3}.
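As a quick check, the two spellings should behave identically. The sketch below uses a shortened, hypothetical string (`text`) rather than the full one from the question. It also shows `re.DOTALL`, which is an addition not mentioned above: by default `.` does not match a newline, so the flag is only needed if a marked section in the long text can span several lines.

```python
import re

# Shortened, hypothetical stand-in for the string in the question.
text = "+++TEXT:+++ body +++ Tel Aviv + Jerusalem ))+++"

long_form = re.sub(r"\+\+\+(.*?)\+\+\+", r"[[\1]]", text)
short_form = re.sub(r"\+{3}(.*?)\+{3}", r"[[\1]]", text)

print(short_form)               # [[TEXT:]] body [[ Tel Aviv + Jerusalem ))]]
print(long_form == short_form)  # True

# Assumption: a marked section may contain a line break.
multiline = "+++line one\nline two+++"
spanning = re.sub(r"\+{3}(.*?)\+{3}", r"[[\1]]", multiline, flags=re.DOTALL)
print(repr(spanning))           # '[[line one\nline two]]'
```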
