4

I have a long regex with many alternations, and I want to replace each match with itself followed by a newline ('\n').

What is the most efficient way to do so with re.sub()?

Here is a simple example:

s = 'I want to be able to replace many words, especially in this sentence, since it will help me solve by problem. That makes sense right?'

pattern = re.compile(r'words[,]|sentence[,]|problem[.]')

for match in matches:
    re.sub(pattern, match + '\n', match)

I know this for loop will not work; I am just hoping to clarify what I am trying to do here. Thanks in advance for any help. I may be missing something very straightforward.

  • Maybe s = re.sub(pattern, "\\g<0>\n", s). Commented Sep 18, 2017 at 22:47
  • See ideone.com/FF0vL0 Commented Sep 18, 2017 at 22:51
  • Ok will do. Is there not a simple way to iterate through each match object and replace it with itself plus a newline character? I'm not sure if Python can translate between match objects and strings, which may be the problem. Commented Sep 18, 2017 at 22:54
  • Are you storing the matches and then replacing them? Commented Sep 18, 2017 at 22:55
  • That is what I would like to do I guess. I have five different matching options in one regex formula for identifying the ends of sentences (for my specific application) and I would just like to replace each match that it finds with itself plus a newline character. I only have one capturing group, which would be the full matches themselves. Commented Sep 18, 2017 at 22:57

3 Answers

5

To replace a whole match with itself, you can use the replacement backreference \g<0>. However, since you also want to store the matches in a variable, you need to pass a callback function as the replacement argument to re.sub and return the whole match value (match.group()) with a newline appended:

import re
matches = []                          # Variable to hold the matches
def repl(m):                          # m is a match data object
    matches.append(m.group())         # Add a whole match value
    return "{}\n".format(m.group())   # Return the match and a newline appended to it

s = 'I want to be able to replace many words, especially in this sentence, since it will help me solve by problem. That makes sense right?'
pattern = re.compile(r'words[,]|sentence[,]|problem[.]')
s = re.sub(pattern, repl, s)

print(s)
print(matches)

See the Python demo
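
If the matches do not need to be stored, the \g<0> backreference mentioned at the start of this answer should be enough on its own. A minimal sketch, assuming the same string and pattern as in the question (the name `result` is only for illustration):

```python
import re

s = ('I want to be able to replace many words, especially in this sentence, '
     'since it will help me solve by problem. That makes sense right?')
pattern = re.compile(r'words[,]|sentence[,]|problem[.]')

# \g<0> stands for the whole match. The raw string passes the backslashes
# through to re, which expands \n in the template to a newline character.
result = pattern.sub(r'\g<0>\n', s)  # "result" is an illustrative name
print(result)
```

With a non-raw string, the same template would be written "\\g<0>\n", as in the comment under the question.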


1

The second parameter of re.sub can be either a string or a callable that takes the match object and returns a string, so you can do this:

import re

def break_line(match):
    # Return the matched text followed by a newline, as the question asks
    return match.group() + "\n"

text = re.sub(pattern, break_line, text)


0

Just like this?

    text = 'I want to be able to replace many words, especially in this sentence, since it will help me solve by problem. That makes sense right?'
    # Turn '.' into ',', strip ',', '|', '.' and '?' from both ends, then split on ','
    text_list = text.replace('.', ',').strip(',|.|?').split(',')
    print(text_list)
    for i in text_list:
        ii = i.split(',')
        print(ii)

Result

    ['I want to be able to replace many words', ' especially in this sentence', ' since it will help me solve by problem', ' That makes sense right']
    ['I want to be able to replace many words']
    [' especially in this sentence']
    [' since it will help me solve by problem']
    [' That makes sense right']

1 Comment

text_list = re.sub(r'[,.?]+',',',text).strip(',').split(',')
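
For reference, the one-liner in the comment above can be run as follows. This is only a sketch that assumes the same `text` string as in the answer:

```python
import re

text = ('I want to be able to replace many words, especially in this sentence, '
        'since it will help me solve by problem. That makes sense right?')

# Collapse each run of ',', '.' or '?' into a single comma, drop the
# trailing comma, then split on the remaining commas.
text_list = re.sub(r'[,.?]+', ',', text).strip(',').split(',')
print(text_list)
```

Note that this splits the text and discards the punctuation, whereas the re.sub approaches in the other answers keep each match and append a newline to it.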
