I'm trying to replace substrings in a string, on the condition that text matching an entry in an ignore list is left untouched. For example, since 'id_1' is in ignore_list, the 'id_1' within test_str should not be replaced:

ignore_list = ['id_1']
test_str = "id_1Testid"
test_str = test_str.replace('id' , 'test2')

test_str should contain 'id_1Testtest2' instead of 'test2_1Testtest2'

How can I update this so that items in ignore_list that occur in test_str are not replaced?

  • That's a very nice challenge :D Commented Mar 21, 2021 at 17:52
  • The only way I can think of that you're going to get a one-liner solution to this is if you use the regular expression form of replace, and you can involve a list of strings in that expression. It would seem that's totally doable if you expand the list into the expression string. Commented Mar 21, 2021 at 17:54
  • Can you clarify what you mean by "values being replaced are not in the ignore list"? Should the items of the ignore list be inspected at the position where id is found? If not, possibly before and after that position? What result do you expect with test_str = "abababidab_ababidab_abidababidab_idididid" and ignore_list = ["ababidab", "2id", "idt"]? Having one unique solution to this input would enable us to easily discriminate correct answers from wrong ones. Commented Mar 21, 2021 at 19:13
  • @blue-sky The accepted answer doesn't seem to work for all scenarios, eg print(complex_replace("id_id_1Testid", ['id_1'], 'id', 'test2')). Commented Mar 22, 2021 at 18:25

5 Answers

2

Not a one-liner, but this works:

import re

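def my_replacer(ignore_list, input_str, to_replace, replace_with):
    # Spans (start, end) of every ignore-list item; re.escape treats each item as literal text.
    ignore_indices = [(m.start(), m.end()) for w in ignore_list for m in re.finditer(re.escape(w), input_str)]
    # Spans of every candidate occurrence of to_replace.
    temp = [(m.start(), m.end()) for m in re.finditer(re.escape(to_replace), input_str)]
    replace_indices = []
    for i in temp:
        rep_i = True
        for j in ignore_indices:
            # m.end() is exclusive, so the upper bound must be strict.
            if j[0] <= i[0] < j[1]:
                rep_i = False
                break
        if rep_i:
            replace_indices.append(i)

    # Replace the first eligible occurrence, then recurse on the new string.
    # Caveat: this can recurse without end if replace_with itself contains to_replace.
    if replace_indices:
        start, end = replace_indices[0]
        return my_replacer(ignore_list, input_str[:start] + replace_with + input_str[end:], to_replace, replace_with)
    else:
        return input_str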


print(my_replacer(['id_1'], "id_1Testid", 'id', 'test2'))

print(my_replacer(['id_1', 'id_2'], "id_1id_2Testid_id_id", 'id', 'test2'))

print(my_replacer(['aaidaa'], "aaidaaTestid", 'id', 'test2'))

print(my_replacer(['aaidaa', 'aaidbb'], "aaidaaaaidbbTestidbyidby", 'id', 'test2'))

Output:

id_1Testtest2

id_1id_2Testtest2_test2_test2

aaidaaTesttest2

aaidaaaaidbbTesttest2bytest2by
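
For comparison, here is a minimal non-recursive sketch of the same span-based idea. It is not the answer's code: the name replace_outside_spans is made up for illustration, and it skips an occurrence whenever it overlaps an ignored span at all (a slightly stricter test than checking only the start index). Because it scans the original string once, it also terminates when replace_with contains to_replace.

```python
import re

def replace_outside_spans(ignore_list, input_str, to_replace, replace_with):
    # Half-open (start, end) spans of every literal ignore-list match.
    spans = [m.span() for w in ignore_list if w
             for m in re.finditer(re.escape(w), input_str)]
    out, pos = [], 0
    for m in re.finditer(re.escape(to_replace), input_str):
        # Skip the occurrence if it overlaps any ignored span.
        if any(s < m.end() and m.start() < e for s, e in spans):
            continue
        out.append(input_str[pos:m.start()])
        out.append(replace_with)
        pos = m.end()
    out.append(input_str[pos:])
    return ''.join(out)

print(replace_outside_spans(['id_1'], "id_1Testid", 'id', 'test2'))  # id_1Testtest2
print(replace_outside_spans(['id_1'], "id_1Testid", 'id', 'idx'))    # id_1Testidx
```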

1

Not very nice, but this seems to work:

import re


def complex_replace(subject, ignore_lst, txt_to_replace, replacement_txt):
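    ignore_pattern = '|'.join(re.escape(item) for item in ignore_lst)
    # Segment boundaries. Starting at 0 keeps any text before the first ignored match.
    str_idxs = [0]
    if ignore_lst:  # an empty pattern would match at every position
        str_idxs += [idx for tu in re.finditer(ignore_pattern, subject) for idx in tu.span()]
    # Segments alternate: mutable text ('M'), ignored match ('U'), mutable text, ...
    split_str = [
        (subject[str_idxs[i]:str_idxs[i+1]], 'M' if i % 2 == 0 else 'U')
        for i in range(len(str_idxs) - 1)
    ]
    split_str.append((subject[str_idxs[-1]:len(subject)], 'M'))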
    res = ''.join(
        [
            substr[0].replace(txt_to_replace, replacement_txt) 
            if substr[1] == 'M' else substr[0] for substr in split_str
        ]
    )
    return res

What this does is the following:

  1. Build a regex pattern from the ignore list (its items, escaped and joined with |).
  2. Build a list of indexes marking the start and the end of each substring of the subject string that matches an item in the ignore list.
  3. Build a list of substrings, where each item is a tuple of the substring and a flag marking it as not mutable ('U') or mutable ('M'). Add the end of the subject string (from the last index found in step 2 to the end of the subject string) to that list.
  4. Do the replacement using join and a list comprehension over the tuples built in step 3: replace only in substrings flagged as mutable ('M'), and take substrings flagged 'U' unchanged.
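
To make the intermediate data concrete, the steps above can be sketched roughly as follows. This is an illustration only: split_segments is a hypothetical helper, not part of the answer, and it always emits the text before the first ignored match as a mutable segment.

```python
import re

def split_segments(subject, ignore_lst):
    """Return (substring, flag) tuples: 'U' = ignored match, 'M' = mutable text."""
    ignore_pattern = '|'.join(re.escape(item) for item in ignore_lst)
    segments, pos = [], 0
    if ignore_lst:  # an empty pattern would match at every position
        for m in re.finditer(ignore_pattern, subject):
            segments.append((subject[pos:m.start()], 'M'))
            segments.append((m.group(), 'U'))
            pos = m.end()
    segments.append((subject[pos:], 'M'))
    return segments

segments = split_segments("id_1Testid", ['id_1'])
print(segments)  # [('', 'M'), ('id_1', 'U'), ('Testid', 'M')]

# Step 4: replace only inside the mutable segments, then join everything back.
result = ''.join(s.replace('id', 'test2') if flag == 'M' else s for s, flag in segments)
print(result)  # id_1Testtest2
```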

The following tests:

ignore_list = ['id_1']
test_str = "id_1Testid"
to_replace = 'id'
replacement = 'test2'
print(complex_replace(test_str, ignore_list, to_replace, replacement))

ignore_list = ['test', 'blah']
test_str = 'test  blah testbidtest bitest   testblue'
to_replace = 'bi'
replacement = 'tooTooT'
print(complex_replace(test_str, ignore_list, to_replace, replacement))

ignore_list = ['id_1', 'id_2']
test_str = "id_1id_2Testid_id_id"
to_replace = 'id'
replacement = 'test2'
print(complex_replace(test_str, ignore_list, to_replace, replacement))

ignore_list = ['aaidaa']
test_str = "aaidaaTestid"
to_replace = 'id'
replacement = 'test2'
print(complex_replace(test_str, ignore_list, to_replace, replacement))

ignore_list = ['aaidaa', 'aaidbb']
test_str = "aaidaaaaidbbTestidbyidby"
to_replace = 'id'
replacement = 'test2'
print(complex_replace(test_str, ignore_list, to_replace, replacement))

give the output below:

id_1Testtest2
test  blah testtooTooTdtest tooTooTtest   testblue
id_1id_2Testtest2_test2_test2
aaidaaTesttest2
aaidaaaaidbbTesttest2bytest2by


0

You could first replace "id_1" with a placeholder such as "#" and then restore it afterwards:

ignore_list = ['id_1']
test_str = "id_1Testid"
replace_str = "id"
test_str = test_str.replace(ignore_list[0], "#")
test_str = test_str.replace(replace_str, "test2")
test_str = test_str.replace("#", ignore_list[0])
print(test_str)

3 Comments

But what if # appears somewhere in test_str?
yeah, fair point, maybe you have to use something more special, like ~ or so... but that should be the logic how it works.
What you need to do is first escape whatever you use as a replacement, and then have your logic be smart enough to not restore the escaped versions of the replacement, but rather just unescape them. - I guess your solution could work as is once one knew the domain of the incoming string data. If one could know that the replacement wouldn't ever appear in the incoming data, then this would be fine. One thing you could do is make the replacement more obscure, like ~~##$#@! :)
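
One way to sidestep the "what if the placeholder appears in the string" concern raised in these comments is to pick placeholders that are verified not to occur in any of the inputs. The sketch below is an assumption-laden illustration, not the answer's code: replace_with_placeholders is a made-up name, and it draws single characters from the Unicode private-use area (U+E000 onwards), one per ignore-list item.

```python
def replace_with_placeholders(text, ignore_list, to_replace, replace_with):
    # Characters that must not be used as placeholders.
    used = set(text) | set(to_replace) | set(replace_with) | set(''.join(ignore_list))
    # Candidate placeholders: private-use code points that occur nowhere in the inputs.
    free = (chr(cp) for cp in range(0xE000, 0xF900) if chr(cp) not in used)

    # Longest items first, so a longer item is masked before any shorter one it contains.
    placeholders = {}
    for item in sorted(ignore_list, key=len, reverse=True):
        if item and item not in placeholders:
            placeholders[item] = next(free)

    for item, mark in placeholders.items():
        text = text.replace(item, mark)        # mask ignored items
    text = text.replace(to_replace, replace_with)
    for item, mark in placeholders.items():
        text = text.replace(mark, item)        # restore ignored items
    return text

print(replace_with_placeholders("id_1#Testid", ['id_1'], 'id', 'test2'))  # id_1#Testtest2
```

Since each placeholder is a single character that does not occur in to_replace, a match of to_replace can never include or straddle a placeholder.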
0

You can build a regexp that searches for it using a negative lookahead:

import re

def specialReplace(string, searchStr, replaceStr, ignoreList):
    # Build a regexp that matches searchStr only where no ignoreList item
    # starts at the same position (negative lookahead).
    searchedRegexp = re.escape(searchStr)
    # Longest items first, so longer alternatives are tried before their prefixes.
    ignoredPatternRegexp = '|'.join([re.escape(k) for k in sorted(ignoreList,key=len,reverse=True)])
    pattern = re.compile(f'(?!({ignoredPatternRegexp})){searchedRegexp}', flags=re.DOTALL)

    # Do the actual replacement job
    return pattern.sub(replaceStr, string)

print(specialReplace("id_1Testid", "id", "test2", ['id_1']))

Output:

id_1Testtest2

6 Comments

This doesn't seem to work generally. Try changing id_1 to aaidaa in both places in the test call. I'm curious as to why this is...why it works for one ignore word and not another.
Been playing with this. Your solution only works if the string to be replaced appears at the beginning of each word in ignore_list.
@CryptoFool Do you mean specialReplace("aaidaaTestid","id","test2",['aaidaa'])? It gives 'aatest2aaTesttest2'. Which seems correct to me: the first one is replaced because aaidaa starts before. The second is not in the ignore list. This seems compliant with what the OP ask although this point is not very clear in the question...
Yes, that's what I mean. What do you mean "starts before"? The first one is in the ignore list, so why is it ok that it is replaced? Shouldn't the answer be aaidaaTesttest2? I don't see how the OPs question in any way excludes this case.
I understand the question as: search for id in the string and do not replace it if any ignore-list item is found at this position (thus not before and not after).
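
The behaviour debated in these comments can be reproduced with a small self-contained sketch. The function name replace_unless_prefixed is invented here; the pattern is built in the same spirit as the answer's. The lookahead is evaluated at the position where the search string starts, so it only protects ignore-list items that begin with the search string.

```python
import re

def replace_unless_prefixed(text, search, repl, ignore_list):
    # Longest alternatives first; (?!...) rejects positions where an ignored item starts.
    alternatives = '|'.join(re.escape(k) for k in sorted(ignore_list, key=len, reverse=True))
    pattern = re.compile(f'(?!(?:{alternatives})){re.escape(search)}')
    return pattern.sub(repl, text)

# 'id_1' starts with 'id', so the lookahead at that position blocks the match.
print(replace_unless_prefixed("id_1Testid", "id", "test2", ["id_1"]))      # id_1Testtest2
# 'aaidaa' starts two characters before 'id', so the lookahead never sees it.
print(replace_unless_prefixed("aaidaaTestid", "id", "test2", ["aaidaa"]))  # aatest2aaTesttest2
```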
0

Using The Greatest Regex Trick Ever, finding both desired and undesired matches and replacing only the desired ones:

import re

def complex_replace(subject, ignore_lst, txt_to_replace, replacement_txt):
    return re.sub('|'.join(map(re.escape, ignore_lst + [txt_to_replace])),
                  lambda match: match.group()
                                if match.group() in ignore_lst else
                                replacement_txt,
                  subject)

Same results as the accepted answer's solution on all its test cases.
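
As a quick check of the idea, here is a hedged, self-contained variant run on the input from the question comments ("id_id_1Testid"). The name replace_outside_ignored is hypothetical, and sorting the ignore list longest-first is an addition of this sketch, intended to keep a longer ignore item from being shadowed by a shorter one that is its prefix.

```python
import re

def replace_outside_ignored(subject, ignore_lst, txt_to_replace, replacement_txt):
    # Ignore items come first in the alternation, so at any position they win
    # over txt_to_replace; matched ignore items are then put back unchanged.
    alternatives = sorted(ignore_lst, key=len, reverse=True) + [txt_to_replace]
    pattern = '|'.join(map(re.escape, alternatives))
    return re.sub(pattern,
                  lambda m: m.group() if m.group() in ignore_lst else replacement_txt,
                  subject)

print(replace_outside_ignored("id_id_1Testid", ['id_1'], 'id', 'test2'))   # test2_id_1Testtest2
print(replace_outside_ignored("aaidaaTestid", ['aaidaa'], 'id', 'test2'))  # aaidaaTesttest2
```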
