1

I have text in the following format:

|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text

I want to remove all the text between |start| and |end|.

I have tried the following regex:

regex = r'(?<=\|start\|).+(?=\|end\|)'
re.sub(regex, '', text)

It returns

"Again some free text"

But I expect it to return

this is another text. Again some free text

  • stackoverflow.com/questions/3075130/… Commented Oct 30, 2019 at 11:17
  • Note the start/end delimiters are in lookaround constructs in your pattern and thus will remain in the resulting string after re.sub. Try r'(?s)\|start\|.*?\|end\|\W*'. Do you also need to remove all newlines? Then, you need to add .replace('\n', '') Commented Oct 30, 2019 at 11:17
  • The regex specified in @WiktorStribiżew's comment should work just fine. I've just tested it on regex101.com. Commented Oct 30, 2019 at 11:19
  • @WiktorStribiżew Thank you. No, I do not need to remove newlines. Commented Oct 30, 2019 at 11:47
  • @Hima Ok, but you would probably still need to strip() the result. Commented Oct 30, 2019 at 11:51

2 Answers

2

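Note that the start/end delimiters sit inside lookaround constructs in your pattern, so they are not consumed and remain in the resulting string after re.sub. Convert the lookbehind and lookahead into consuming patterns, and use a lazy .*? with the DOTALL flag so that each match stops at the nearest |end| and can span line breaks.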
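
To see the difference concretely, here is a small sketch comparing a lookaround pattern with a consuming one. The sample string and variable names are made up for illustration and are not from the question.

```python
import re

# Illustrative input, not the asker's data
sample = "|start| drop me |end| keep me"

# Lookarounds only assert that the delimiters are present; they are not
# part of the match, so only the text between them gets replaced.
with_lookarounds = re.sub(r'(?<=\|start\|).+?(?=\|end\|)', '', sample)

# Here the delimiters are consumed by the match and removed with it.
consuming = re.sub(r'\|start\|.+?\|end\|', '', sample)

print(repr(with_lookarounds))  # '|start||end| keep me'
print(repr(consuming))         # ' keep me'
```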

Also, you seem to want to remove special chars that directly follow the right-hand delimiter, so you need to add [^\w\s]* at the end of the regex.

You may use

import re
text = """|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text"""
print( re.sub(r'(?s)\|start\|.*?\|end\|[^\w\s]*', '', text).replace('\n', '') )
# => this is another text. Again some free text

Regex details

  • (?s) - inline DOTALL modifier, which lets . match line breaks as well
  • \|start\| - the literal |start| text
  • .*? - any zero or more chars, as few as possible
  • \|end\| - the literal |end| text
  • [^\w\s]* - zero or more chars other than word and whitespace chars.
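
If the delimiters may vary, the same pattern can be built from arbitrary strings. The following is only a sketch: the helper name remove_delimited and its parameters are hypothetical, and re.escape is used so that characters such as | are treated literally.

```python
import re

def remove_delimited(text, start="|start|", end="|end|"):
    """Remove every start...end block plus any punctuation right after it.

    Hypothetical helper for illustration; not part of the original answer.
    """
    pattern = re.compile(
        re.escape(start) + r".*?" + re.escape(end) + r"[^\w\s]*",
        re.DOTALL,  # same effect as the inline (?s) modifier
    )
    return pattern.sub("", text)

text = """|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text"""

cleaned = remove_delimited(text).replace("\n", "")
print(cleaned)  # this is another text. Again some free text
```

Compiling the pattern once inside the helper keeps the call site short; if the function is called many times with the same delimiters, the compiled pattern could be cached instead.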

0

Try this:

import re

your_string = """|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text"""

# Match from |start| through "|end|." within one line (no DOTALL flag is set)
regex = r'(\|start\|).+(\|end\|\.)'

result = re.sub(regex, '', your_string).replace('\n', '')

print(result)

Outputs:

this is another text. Again some free text
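
One caveat worth checking before relying on this pattern: .+ is greedy, so if two blocks ever share a line, everything between the first |start| and the last |end|. on that line is removed. A quick sketch with a made-up one-line sample shows the effect and how a lazy .+? avoids it:

```python
import re

# Illustrative input with two blocks on the same line
one_line = "|start| a |end|. keep this |start| b |end|. tail"

# Greedy: runs from the first |start| to the last "|end|." on the line
greedy = re.sub(r'\|start\|.+\|end\|\.', '', one_line)

# Lazy: each match stops at the nearest "|end|."
lazy = re.sub(r'\|start\|.+?\|end\|\.', '', one_line)

print(repr(greedy))  # ' tail'
print(repr(lazy))    # ' keep this  tail'
```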
