Substitution of certain occurrences of a string with another string in Python

Question

Sorry if someone already posted the same question, but I was unable to find it.

I am trying to replace certain occurrences of a string pattern with something else. The problem is that I do not want to replace all occurrences, just all but one.

For example, imagine I have the string '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'. The pattern I want to find is '\w+\d+:\d' (which refers to Seq[number]).

Suppose I want to change all the numbers after 'Seq[number]:' except, for example, the one following 'Seq1:'.

Suppose also that I want to add 10 to each of these numbers.

In the end I would like to have the string:

'(M:2,Seq0:12):10,Seq1:20,(Seq2:50,Seq3:50)'

Is there a way of doing this in a loop? I tried re.findall, but it returns all occurrences in the text. How could I incorporate this into a loop?

Thanks!

Andrew Clark · Accepted Answer · 2014-02-04 20:06:03Z


You can do this using re.sub with a function as the replacement, for example:

>>> import re
>>> s = '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'
>>> def repl(match):
...     return match.group(1) + str(int(match.group(2)) + 10)
...
>>> re.sub(r'(\w+(?!1:)\d+:)(\d+)', repl, s)
'(M:2,Seq0:12):10,Seq1:20,(Seq2:50,Seq3:50)'

The restriction against matching Seq1: is handled by the negative lookahead (?!1:). The capturing groups are used only to separate the portion of the string that you want to modify from the rest of it. The replacement function then returns group 1 unchanged, followed by the value from group 2 plus 10.
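To see which prefixes the negative lookahead actually lets through, one quick check (a sketch reusing the answer's pattern and string) is to run re.findall with the same regex. Because the pattern has two capturing groups, findall returns one (prefix, number) tuple per match, and Seq1: is absent:

```python
import re

s = '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'
pattern = r'(\w+(?!1:)\d+:)(\d+)'

# With two capturing groups, findall returns a list of (group1, group2) tuples
matches = re.findall(pattern, s)
print(matches)
# [('Seq0:', '2'), ('Seq2:', '40'), ('Seq3:', '40')]
```

This also shows why findall alone was not enough for the question: it reports the matches but does not rebuild the string, which is the job re.sub does.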

As Cilyan suggested in the comments, you could instead put the check for Seq1: in the replacement function, which simplifies the regex. Here is how that would look:

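import re

s = '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'

def repl(match):
    # Leave the excluded prefix untouched by returning the whole match
    if match.group(1) == 'Seq1:':
        return match.group(0)
    # Otherwise keep the prefix and add 10 to the number after it
    return match.group(1) + str(int(match.group(2)) + 10)

result = re.sub(r'(\w+\d+:)(\d+)', repl, s)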

Edit: To address the questions in your comment, here is how you could parameterize both the number that you add and the prefix (like Seq1:) that should be ignored:

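def make_repl(n, ignore):
    # Returns a repl function that remembers n and ignore (a closure)
    def repl(match):
        # Skip the ignored prefix by returning the whole match unchanged
        if match.group(1) == ignore:
            return match.group(0)
        return match.group(1) + str(int(match.group(2)) + n)
    return repl

result = re.sub(r'(\w+\d+:)(\d+)', make_repl(10, 'Seq1:'), s)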
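Cilyan points out that a lookahead gets unwieldy when more than one or two sequences must be skipped. Following that idea, the factory could take a collection of prefixes instead of a single one. This is a sketch: make_repl_multi is a hypothetical name, and it assumes the prefixes to skip are given exactly as group 1 captures them (e.g. 'Seq1:'):

```python
import re

def make_repl_multi(n, ignore):
    # Hypothetical variant of make_repl: `ignore` is any collection of prefixes
    ignore = set(ignore)  # set membership checks are fast and order-independent

    def repl(match):
        prefix, number = match.group(1), match.group(2)
        if prefix in ignore:
            return match.group(0)  # leave skipped entries unchanged
        return prefix + str(int(number) + n)

    return repl

s = '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'
result = re.sub(r'(\w+\d+:)(\d+)', make_repl_multi(10, ['Seq1:', 'Seq3:']), s)
print(result)
# (M:2,Seq0:12):10,Seq1:20,(Seq2:50,Seq3:40)
```

The regex stays the same no matter how many prefixes are skipped. Only the set passed to the factory changes.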

edited Feb 4, 2014 at 20:06

answered Feb 4, 2014 at 18:17

Andrew Clark



6 Comments

Cilyan Over a year ago

Maybe it would be easier to decide which Seq to substitute inside the function rather than using a lookahead, as the regex can get quite complicated if the OP wants to filter out more than one or two sequences.

Andrew Clark Over a year ago

@Cilyan Good suggestion, added that to my answer.

Fabs Over a year ago

I'm new to Python. Why don't you pass an argument to 'repl' in 'result'? And what if I want to pass arguments, for example the number to add? Both my number and Seq1 will change over time...

Andrew Clark Over a year ago

repl acts like a callback function: for each match that the regex encounters, the function you provide (repl in this case) is called with the match object as its only argument. See my edit for how to change the number and which prefix to ignore.
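To make the callback idea concrete, and to address the original question about doing this in a loop, here is a rough hand-written equivalent using re.finditer. It walks the matches in order, calls repl on each match object, and stitches the pieces back together. This is only a conceptual sketch of what re.sub does with a function replacement, not its actual implementation, and sub_with_loop is a hypothetical name:

```python
import re

def repl(match):
    # Same replacement function as in the answer
    if match.group(1) == 'Seq1:':
        return match.group(0)
    return match.group(1) + str(int(match.group(2)) + 10)

def sub_with_loop(pattern, func, text):
    # Hypothetical helper: a manual loop over matches, calling func on each one
    pieces = []
    last_end = 0
    for match in re.finditer(pattern, text):
        pieces.append(text[last_end:match.start()])  # text before this match
        pieces.append(func(match))                   # callback gets the match object
        last_end = match.end()
    pieces.append(text[last_end:])  # text after the last match
    return ''.join(pieces)

s = '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'
pattern = r'(\w+\d+:)(\d+)'
print(sub_with_loop(pattern, repl, s))
# (M:2,Seq0:12):10,Seq1:20,(Seq2:50,Seq3:50)
print(sub_with_loop(pattern, repl, s) == re.sub(pattern, repl, s))
# True
```

In practice re.sub is the simpler choice; the loop only shows where the match object passed to repl comes from.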

Fabs Over a year ago

Thanks!! Reading around on the internet, I also discovered lambda. So what if I write result = re.sub(pattern, lambda m: m.group(1) + str(int(m.group(2)) + number), s), where number is a value I pass in my loop? Would that be efficient, or is it better to always write a function?
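On the lambda question: a lambda and a def both produce an ordinary function object, so re.sub treats them the same way, and there should be no meaningful speed difference. The choice is mostly about readability. Below is a sketch of the lambda form inside a loop, assuming the values to add and the prefixes to skip come from a hypothetical list of (number, ignore) pairs called runs:

```python
import re

s = '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'
pattern = r'(\w+\d+:)(\d+)'

# Hypothetical inputs: (amount to add, prefix to leave unchanged) for each run
runs = [(10, 'Seq1:'), (5, 'Seq2:')]

results = []
for number, ignore in runs:
    # re.sub calls the lambda right away, inside this iteration,
    # so it sees the current values of number and ignore
    result = re.sub(
        pattern,
        lambda m: m.group(0) if m.group(1) == ignore
        else m.group(1) + str(int(m.group(2)) + number),
        s,
    )
    results.append(result)

print(results)
# ['(M:2,Seq0:12):10,Seq1:20,(Seq2:50,Seq3:50)', '(M:2,Seq0:7):10,Seq1:25,(Seq2:40,Seq3:45)']
```

Once the lambda grows a condition like this one, a named function (or the make_repl factory from the answer) is usually easier to read.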
