Update string replace to ignore values in list

Question

I'm attempting to replace substrings in a string with the condition that the values being replaced are not in the ignore list. For example, as id_1 is in the ignore_list, 'id_1' within test_str should not be replaced:

ignore_list = ['id_1']
test_str = "id_1Testid"
test_str = test_str.replace('id' , 'test2')

test_str should contain 'id_1Testtest2' instead of 'test2_1Testtest2'

How can I update this so that items of ignore_list that appear in test_str are not replaced?

The only way I can think of to get a one-liner solution to this is to use the regular expression form of replace and involve the list of strings in that expression. That seems totally doable if you expand the list into the expression string. — CryptoFool, Mar 21, 2021 at 17:54
Can you clarify what you mean by "values being replaced are not in the ignore list"? Should the items of the ignore list be checked at the position where id is found? If not, possibly before and after it? What result do you expect with test_str = "abababidab_ababidab_abidababidab_idididid" and ignore_list = ["ababidab", "2id", "idt"]? Having one unique expected result for this input would let us easily tell correct answers from wrong ones. — Jérôme Richard, Mar 21, 2021 at 19:13
@blue-sky The accepted answer doesn't seem to work for all scenarios, e.g. print(complex_replace("id_id_1Testid", ['id_1'], 'id', 'test2')). — Abhi_J, Mar 22, 2021 at 18:25

Abhi_J · Answer · 2021-03-21 20:54:51Z

Not a one-liner, but this works:

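import re

def my_replacer(ignore_list, input_str, to_replace, replace_with):
    # Spans (start, end) of every occurrence of an ignored word; re.escape makes
    # the words match literally instead of being read as regex patterns.
    ignore_indices = [(m.start(), m.end())
                      for w in ignore_list
                      for m in re.finditer(re.escape(w), input_str)]
    # Spans of every occurrence of the text to replace.
    temp = [(m.start(), m.end()) for m in re.finditer(re.escape(to_replace), input_str)]
    replace_indices = []
    for i in temp:
        rep_i = True
        for j in ignore_indices:
            # Skip a match that starts inside an ignored span (the end index is exclusive).
            if j[0] <= i[0] < j[1]:
                rep_i = False
                break
        if rep_i:
            replace_indices.append(i)

    if replace_indices:
        # Replace the first eligible match, then recurse on the new string.
        # Note: this never terminates if replace_with itself contains to_replace.
        start, end = replace_indices[0]
        new_str = input_str[:start] + replace_with + input_str[end:]
        return my_replacer(ignore_list, new_str, to_replace, replace_with)
    else:
        return input_str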


print(my_replacer(['id_1'], "id_1Testid", 'id', 'test2'))

print(my_replacer(['id_1', 'id_2'], "id_1id_2Testid_id_id", 'id', 'test2'))

print(my_replacer(['aaidaa'], "aaidaaTestid", 'id', 'test2'))

print(my_replacer(['aaidaa', 'aaidbb'], "aaidaaaaidbbTestidbyidby", 'id', 'test2'))

Output:

id_1Testtest2

id_1id_2Testtest2_test2_test2

aaidaaTesttest2

aaidaaaaidbbTesttest2bytest2by
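A single-pass sketch of the same rule (skip any match of to_replace that starts inside an occurrence of an ignored word) is shown below. It is not part of the answer above: the name replace_outside_spans is hypothetical, and all spans are computed once on the original string instead of being recomputed after every replacement. As a result it also terminates when replace_with contains to_replace (e.g. replacing 'id' with 'idx'), where the recursive version keeps replacing its own output; the two could only differ when a replacement creates a new occurrence of an ignored word.

```python
import re


def replace_outside_spans(ignore_list, input_str, to_replace, replace_with):
    # Spans of every occurrence of each ignored word, found once in the original string.
    ignore_spans = [(m.start(), m.end())
                    for w in ignore_list if w
                    for m in re.finditer(re.escape(w), input_str)]
    pieces, last = [], 0
    for m in re.finditer(re.escape(to_replace), input_str):
        # Same rule as my_replacer: skip a match that starts inside an ignored span.
        if any(start <= m.start() < end for start, end in ignore_spans):
            continue
        pieces.append(input_str[last:m.start()])
        pieces.append(replace_with)
        last = m.end()
    pieces.append(input_str[last:])
    return ''.join(pieces)


print(replace_outside_spans(['id_1'], "id_1Testid", 'id', 'idx'))  # id_1Testidx
```

On the four test cases above it gives the same output as my_replacer.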

Philippe · Accepted Answer · 2021-03-21 21:37:58Z

Not very nice, but this seems to work:

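import re


def complex_replace(subject, ignore_lst, txt_to_replace, replacement_txt):
    # Regex alternation of the ignored words, escaped so they match literally.
    ignore_pattern = '|'.join([re.escape(word) for word in ignore_lst])
    # Start and end index of every match of an ignored word.
    str_idxs = []
    if ignore_lst:
        str_idxs = [idx for tu in re.finditer(ignore_pattern, subject) for idx in tu.span()]
    if not str_idxs:
        # No ignored word in the subject: a plain replace is enough.
        return subject.replace(txt_to_replace, replacement_txt)
    # Segments between consecutive indexes alternate between an ignored word
    # ('U', not mutable) and the text between two ignored words ('M', mutable).
    split_str = [
        (subject[str_idxs[i]:str_idxs[i+1]], 'U' if i % 2 == 0 else 'M')
        for i in range(len(str_idxs) - 1)
    ]
    # The text before the first ignored word and after the last one is mutable too.
    split_str.insert(0, (subject[:str_idxs[0]], 'M'))
    split_str.append((subject[str_idxs[-1]:], 'M'))
    res = ''.join(
        [
            substr[0].replace(txt_to_replace, replacement_txt)
            if substr[1] == 'M' else substr[0] for substr in split_str
        ]
    )
    return res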

What this does is the following:

1. Build a regex pattern from the ignore list (the items of the ignore list separated by |).
2. Build a list of indexes marking the start and the end of each substring of the subject string that matches an item of the ignore list.
3. Build a list of substrings, where each item is a tuple of the substring and a flag marking it as not mutable ('U') or mutable ('M'). Add the end of the subject string (from the last index found in step 2 to the end of the subject string) to that list.
4. Do the replacement using join and a list comprehension over the tuples built in step 3: replace only in substrings flagged as mutable ('M'); take substrings flagged 'U' unchanged.

The following tests:

ignore_list = ['id_1']
test_str = "id_1Testid"
to_replace = 'id'
replacement = 'test2'
print(complex_replace(test_str, ignore_list, to_replace, replacement))

ignore_list = ['test', 'blah']
test_str = 'test  blah testbidtest bitest   testblue'
to_replace = 'bi'
replacement = 'tooTooT'
print(complex_replace(test_str, ignore_list, to_replace, replacement))

ignore_list = ['id_1', 'id_2']
test_str = "id_1id_2Testid_id_id"
to_replace = 'id'
replacement = 'test2'
print(complex_replace(test_str, ignore_list, to_replace, replacement))

ignore_list = ['aaidaa']
test_str = "aaidaaTestid"
to_replace = 'id'
replacement = 'test2'
print(complex_replace(test_str, ignore_list, to_replace, replacement))

ignore_list = ['aaidaa', 'aaidbb']
test_str = "aaidaaaaidbbTestidbyidby"
to_replace = 'id'
replacement = 'test2'
print(complex_replace(test_str, ignore_list, to_replace, replacement))

give the output below:

id_1Testtest2
test  blah testtooTooTdtest tooTooTtest   testblue
id_1id_2Testtest2_test2_test2
aaidaaTesttest2
aaidaaaaidbbTesttest2bytest2by

bad_coder · Answer · 2021-03-21 17:59:56Z


You could first replace "id_1" with a placeholder such as "#", do the replacement, and then put "id_1" back afterwards:

ignore_list = ['id_1']
test_str = "id_1Testid"
replace_str = "id"
test_str = test_str.replace(ignore_list[0], "#")
test_str = test_str.replace(replace_str, "test2")
test_str = test_str.replace("#", ignore_list[0])
print(test_str)


3 Comments

CryptoFool:

But what if # appears somewhere in test_str?

bad_coder:

yeah, fair point, maybe you have to use something more special, like ~ or so... but that's the logic of how it works.

CryptoFool:

What you need to do is first escape whatever you use as a replacement, and then have your logic be smart enough to not restore the escaped versions of the replacement, but rather just unescape them. - I guess your solution could work as is once one knew the domain of the incoming string data. If one could know that the replacement wouldn't ever appear in the incoming data, then this would be fine. One thing you could do is make the replacement more obscure, like ~~##$#@! :)
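Following CryptoFool's point about collisions, one way to make the placeholder idea safe without knowing the input domain is to pick placeholder characters that provably do not occur in any of the inputs. The sketch below is an illustration, not bad_coder's code: placeholder_replace is a hypothetical name, it assumes Python 3 (Unicode) strings, and it draws placeholders from the Basic Multilingual Plane's Private Use Area (U+E000 to U+F8FF). Because a placeholder character never occurs in txt_to_replace, no match can overlap a protected word, and every ignored word in the list is handled, not just the first.

```python
import re


def placeholder_replace(subject, ignore_lst, txt_to_replace, replacement_txt):
    # Distinct, non-empty ignored words, keeping their original order.
    words = [w for w in dict.fromkeys(ignore_lst) if w]
    if not words:
        return subject.replace(txt_to_replace, replacement_txt)
    # Characters already present in any input must not be used as placeholders.
    used = set(subject) | set(txt_to_replace) | set(replacement_txt) | set(''.join(words))
    # Candidate placeholders: Private Use Area characters U+E000..U+F8FF.
    free = (chr(c) for c in range(0xE000, 0xF900) if chr(c) not in used)
    placeholders = {w: next(free) for w in words}
    # One regex pass hides every ignored word; longer words are tried first.
    pattern = '|'.join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    protected = re.sub(pattern, lambda m: placeholders[m.group()], subject)
    replaced = protected.replace(txt_to_replace, replacement_txt)
    # Put the ignored words back in place of their placeholders.
    for word, ph in placeholders.items():
        replaced = replaced.replace(ph, word)
    return replaced


print(placeholder_replace("a#id_1#id", ['id_1'], 'id', 'test2'))  # a#id_1#test2
```

Unlike a fixed "#" or "~~##$#@!", the input may contain any character, including the ones used by the examples in this thread.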

Jérôme Richard · Answer · 2021-03-21 18:14:03Z


You can build a regexp for this search using a negative lookahead:

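import re

def specialReplace(string, searchStr, replaceStr, ignoreList):
    # With an empty ignore list the lookahead below would reject every match.
    if not ignoreList:
        return string.replace(searchStr, replaceStr)
    # Regexp matching every occurrence of searchStr that is not the start of an
    # item of ignoreList, thanks to a negative lookahead.
    searchedRegexp = re.escape(searchStr)
    ignoredPatternRegexp = '|'.join([re.escape(k) for k in sorted(ignoreList, key=len, reverse=True)])
    pattern = re.compile(f'(?!(?:{ignoredPatternRegexp})){searchedRegexp}', flags=re.DOTALL)

    # Do the actual replacement; the lambda keeps backslashes in replaceStr literal.
    return pattern.sub(lambda m: replaceStr, string)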

print(specialReplace("id_1Testid", "id", "test2", ['id_1']))

Output:

id_1Testtest2

edited Mar 21, 2021 at 18:14

answered Mar 21, 2021 at 18:08


6 Comments

CryptoFool:

This doesn't seem to work generally. Try changing id_1 to aaidaa in both places in the test call. I'm curious as to why this is...why it works for one ignore word and not another.

CryptoFool:

Been playing with this. Your solution only works if the string to be replaced appears at the beginning of each word in ignore_list.

Jérôme Richard:

@CryptoFool Do you mean specialReplace("aaidaaTestid","id","test2",['aaidaa'])? It gives 'aatest2aaTesttest2', which seems correct to me: the first id is replaced because aaidaa starts before it. The second is not in the ignore list. This seems compliant with what the OP asks, although this point is not very clear in the question...

CryptoFool:

Yes, that's what I mean. What do you mean "starts before"? The first one is in the ignore list, so why is it ok that it is replaced? Shouldn't the answer be aaidaaTesttest2? I don't see how the OPs question in any way excludes this case.

Jérôme Richard:

I understand the question as: search for id in the string and do not replace it if any item of the ignore list is found at this position (thus not before and not after).
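The thread shows two readings of the question: Jérôme Richard's (an ignored word must start exactly where the match starts) and CryptoFool's (a match is protected whenever it lies inside an occurrence of an ignored word, e.g. id inside aaidaa). A minimal sketch of the second reading follows; replace_unless_inside_ignored is a hypothetical name, and for each match it checks every alignment of each ignored word that would contain that match.

```python
import re


def replace_unless_inside_ignored(subject, ignore_lst, txt_to_replace, replacement_txt):
    """Replace txt_to_replace except where an occurrence of an ignored word contains it."""
    size = len(txt_to_replace)

    def covered(start):
        for word in ignore_lst:
            # Each offset is a possible position of the match inside `word`;
            # the ignored word would then begin at start - offset in the subject.
            for offset in range(len(word) - size + 1):
                w_start = start - offset
                if w_start >= 0 and subject.startswith(word, w_start):
                    return True
        return False

    return re.sub(re.escape(txt_to_replace),
                  lambda m: m.group() if covered(m.start()) else replacement_txt,
                  subject)


print(replace_unless_inside_ignored("aaidaaTestid", ['aaidaa'], 'id', 'test2'))  # aaidaaTesttest2
```

With this reading the aaidaa example gives 'aaidaaTesttest2', the result CryptoFool expected, whereas specialReplace gives 'aatest2aaTesttest2'.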


Manuel · Answer · 2021-04-04 10:57:21Z


Using The Greatest Regex Trick Ever: match both the desired and the undesired strings, and replace only the desired ones:

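import re

def complex_replace(subject, ignore_lst, txt_to_replace, replacement_txt):
    # One alternation matches the ignored words and txt_to_replace; the ignored
    # words come first, so at a given position they win and are returned unchanged.
    return re.sub('|'.join(map(re.escape, ignore_lst + [txt_to_replace])),
                  lambda match: match.group()
                                if match.group() in ignore_lst else
                                replacement_txt,
                  subject)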

Same results as the accepted answer's solution on all its test cases.
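Because a regex alternation tries its alternatives left to right and keeps the first one that matches at a given position, the order of ignore_lst matters when one ignored word is a prefix of another: with ignore_lst = ['ab', 'abid'] and the subject "abid", the pattern ab|abid|id matches "ab" first and then replaces the trailing id. A small variant is sketched below; the name complex_replace_longest_first is hypothetical, and sorting the ignored words longest first and dropping empty words are additions not present in the answer.

```python
import re


def complex_replace_longest_first(subject, ignore_lst, txt_to_replace, replacement_txt):
    # Non-empty ignored words, longest first, so the longest ignored word wins
    # at any position; txt_to_replace is tried last.
    words = sorted((w for w in ignore_lst if w), key=len, reverse=True)
    pattern = '|'.join(map(re.escape, words + [txt_to_replace]))
    ignored = set(words)
    return re.sub(pattern,
                  lambda m: m.group() if m.group() in ignored else replacement_txt,
                  subject)


print(complex_replace_longest_first("abid", ['ab', 'abid'], 'id', 'test2'))     # abid
print(complex_replace_longest_first("id_id_1Testid", ['id_1'], 'id', 'test2'))  # test2_id_1Testtest2
```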

