Sorry if someone already posted the same question, but I was unable to find it.

I am trying to replace certain occurrences of a string pattern with something else. The problem is that I do not want to replace all occurrences, just all of them except one.

For example, suppose I have the string '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'. The pattern I want to find is '\w+\d+:\d' (which refers to Seq[number]).

I want to change every number that follows 'Seq[number]:', except the one following a particular prefix, for example 'Seq1:'.

Specifically, I want to add 10 to each of those numbers.

In the end I would like to have the string:

'(M:2,Seq0:12):10,Seq1:20,(Seq2:50,Seq3:50)'

Is there a way of doing this in a loop? I tried to use re.findall, but it returns all occurrences in the text at once. How could I incorporate this into a loop?

Thanks!

1 Answer

You can do this using re.sub with a function as the replacement, for example:

>>> import re
>>> s = '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'
>>> def repl(match):
...     return match.group(1) + str(int(match.group(2)) + 10)
...
>>> re.sub(r'(\w+(?!1:)\d+:)(\d+)', repl, s)
'(M:2,Seq0:12):10,Seq1:20,(Seq2:50,Seq3:50)'

The restriction to not match Seq1: is handled by the negative lookahead (?!1:). The capturing groups are only there to separate the part of the string you want to modify (group 2, the number) from the part you want to keep (group 1, the prefix). The replacement function then returns group 1 unchanged, followed by the number from group 2 increased by 10.
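To see what the two groups capture, here is a small sketch that lists the matches with re.finditer instead of replacing them. The strings 'Foo1:5' and 'Seq11:5' are made-up inputs, not from the question. They are there to show that the lookahead skips any prefix whose trailing number is exactly 1, not only Seq1:, while a prefix like Seq11: is still matched.

```python
import re

s = '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'
pattern = re.compile(r'(\w+(?!1:)\d+:)(\d+)')

# Collect (prefix, number) for every match; Seq1: is filtered out by the lookahead.
pairs = [(m.group(1), m.group(2)) for m in pattern.finditer(s)]
print(pairs)  # [('Seq0:', '2'), ('Seq2:', '40'), ('Seq3:', '40')]

# Made-up inputs: any name ending in a lone "1" is skipped as well ...
skipped = pattern.search('Foo1:5')
print(skipped)  # None

# ... but a trailing number such as 11 is still matched.
kept = pattern.search('Seq11:5').group(1)
print(kept)  # Seq11:
```

If skipping every prefix that ends in 1 is not what you want, filtering inside the replacement function is probably the safer route.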

As suggested by Cilyan in the comments, you could also move the check for Seq1: into the replacement function, which simplifies the regex. Here is how this would look:

def repl(match):
    if match.group(1) == 'Seq1:':
        return match.group(0)
    return match.group(1) + str(int(match.group(2)) + 10)

result = re.sub(r'(\w+\d+:)(\d+)', repl, s)
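If more than one prefix has to be left alone, the same idea extends to a membership test. This is only a sketch: the set name IGNORED and the choice of 'Seq3:' as a second skipped prefix are assumptions for illustration and do not come from the question.

```python
import re

s = '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'

# Hypothetical set of prefixes to leave untouched (illustrative only).
IGNORED = {'Seq1:', 'Seq3:'}

def repl(match):
    # Return the whole match unchanged when its prefix is in the ignore set.
    if match.group(1) in IGNORED:
        return match.group(0)
    # Otherwise keep the prefix and add 10 to the number.
    return match.group(1) + str(int(match.group(2)) + 10)

result = re.sub(r'(\w+\d+:)(\d+)', repl, s)
print(result)  # (M:2,Seq0:12):10,Seq1:20,(Seq2:50,Seq3:40)
```

The regex stays the same however many prefixes are skipped; only the set changes.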

edit: To address the questions in your comment, here is how you could write this so that both the number to add and the prefix to ignore (like Seq1:) are parameters:

def make_repl(n, ignore):
    def repl(match):
        if match.group(1) == ignore:
            return match.group(0)
        return match.group(1) + str(int(match.group(2)) + n)
    return repl

result = re.sub(r'(\w+\d+:)(\d+)', make_repl(10, 'Seq1:'), s)
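Since the question asks about doing this in a loop, here is a rough sketch of calling make_repl with different values on each pass. The list named settings and the values in it are invented for illustration. The last lines show the lambda form mentioned in the comments. A lambda and a def both create ordinary function objects, so the choice is mostly about readability, but note that this particular lambda has no check for an ignored prefix.

```python
import re

def make_repl(n, ignore):
    def repl(match):
        if match.group(1) == ignore:
            return match.group(0)
        return match.group(1) + str(int(match.group(2)) + n)
    return repl

s = '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'
pattern = re.compile(r'(\w+\d+:)(\d+)')

# Hypothetical (number to add, prefix to skip) pairs, made up for this example.
settings = [(10, 'Seq1:'), (5, 'Seq0:')]

results = []
for n, ignore in settings:
    # Build a fresh replacement function for each combination.
    results.append(pattern.sub(make_repl(n, ignore), s))

print(results[0])  # (M:2,Seq0:12):10,Seq1:20,(Seq2:50,Seq3:50)
print(results[1])  # (M:2,Seq0:2):10,Seq1:25,(Seq2:45,Seq3:45)

# Lambda variant: adds the number to every match, skipping nothing.
number = 10
with_lambda = pattern.sub(lambda m: m.group(1) + str(int(m.group(2)) + number), s)
print(with_lambda)  # (M:2,Seq0:12):10,Seq1:30,(Seq2:50,Seq3:50)
```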

6 Comments

Maybe it will be easier to decide which Seq to substitute inside the function rather than using a lookahead, as the regex can get quite complicated if OP wants to filter out more than 1~2 sequences.
@Cilyan Good suggestion, added that to my answer.
I'm new to Python. Why don't you pass an argument to 'repl' in 'result'? And what if I want to pass arguments, for example the number to add? Both my number and Seq1 will change over time...
repl is a kind of callback function: for each match the regex finds, the function you provide (repl in this case) is called with the match object as its only argument. See my edit for how to change the number and which prefix to ignore.
Thanks!! Reading on the internet, I also discovered lambda. So could I write result = re.sub(pattern, lambda m: m.group(1) + str(int(m.group(2)) + number), s), where number is a value I pass in from my loop? Would that be efficient, or is it better to always write a function?