Python - Replace Duplicate Occurrences in a String
In this article, we will discuss how to replace only the duplicate occurrences of certain words in a string; that is, replace a word from its second occurrence onwards while keeping its first occurrence unchanged.
Example:
Input: Gfg is best. Gfg also has Classes now. Classes help understand better.
Output: Gfg is best. It also has Classes now. They help understand better.
Using list comprehension + set
This is a compact approach that makes a single pass over the words. We use a set to keep track of the words already seen and a list comprehension to perform the replacements in a single expression.
Steps:
- Split the string into individual words.
- Iterate over each word.
- If the word is a key in the replacement dictionary and has already been seen, replace it with the mapped value.
- Otherwise, record the word as seen and keep it as-is.
- Join the modified words back into a single string.
s1 = 'Gfg is best . Gfg also has Classes now. Classes help understand better .'
rep = {'Gfg': 'It', 'Classes': 'They'}
seen = set()

# Replace a word only if it is a key in rep and has been seen before;
# otherwise record it in seen and keep it unchanged.
res = [
    rep[word] if word in rep and word in seen
    else (seen.add(word) or word)
    for word in s1.split()
]
s2 = ' '.join(res)
print(s2)
Output
Gfg is best . It also has Classes now. They help understand better .
Explanation:
- split(): breaks the string into words.
- set(): tracks the words that have already appeared.
- (seen.add(word) or word): seen.add() returns None, so the expression records the word and then evaluates to the word itself, leaving the first appearance unchanged.
- ' '.join(res): combines words back into a string.
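The (seen.add(word) or word) idiom can look opaque at first. The minimal sketch below isolates it from the comprehension to show what the expression evaluates to; the variable name result is introduced here only for illustration.

```python
seen = set()

# set.add() mutates the set in place and returns None, which is falsy,
# so the `or` expression falls through to its right-hand operand.
result = seen.add('Gfg') or 'Gfg'

print(result)         # Gfg
print('Gfg' in seen)  # True
```

The side effect (recording the word) and the value (the word itself) are therefore produced by one expression, which is what lets the logic fit inside a list comprehension.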
Using Regular Expressions
This approach uses regex patterns to detect repeated occurrences of target words and replaces them using re.sub().
Steps:
- Import the re module.
- Define a regex pattern to match the target words from the replacement dictionary.
- Pass a callback function to re.sub() so that each match is replaced only if the same word has already been matched earlier.
import re
s1 = 'Gfg is best . Gfg also has Classes now. Classes help understand better .'
d = {'Gfg': 'It', 'Classes': 'They'}
pattern = r'\b(' + '|'.join(re.escape(k) for k in d.keys()) + r')\b'
seen = set()
def fun(m):
    word = m.group(1)
    if word in seen:
        return d[word]
    seen.add(word)
    return word
res = re.sub(pattern, fun, s1)
print(res)
Output
Gfg is best . It also has Classes now. They help understand better .
Explanation:
- \b(...)\b: matches complete words only.
- re.escape(): safely handles special regex characters in keys.
- re.sub(pattern, fun, s1): calls fun() for each match and replaces only later duplicates using a seen set.
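Because the pattern matches on word boundaries rather than on whitespace-separated tokens, the regex approach also works when punctuation is attached directly to a word. The sketch below wraps the same logic in a helper and runs it on the input from the example at the top of the article; the names replace_repeats and swap are hypothetical and chosen only for this illustration.

```python
import re

def replace_repeats(text, mapping):
    """Replace each key of `mapping` from its second occurrence onward."""
    # Same pattern as in the code above: whole-word match on any key.
    pattern = r'\b(' + '|'.join(re.escape(k) for k in mapping) + r')\b'
    seen = set()

    def swap(match):
        word = match.group(1)
        if word in seen:
            return mapping[word]  # later occurrence: substitute
        seen.add(word)            # first occurrence: remember and keep
        return word

    return re.sub(pattern, swap, text)

text = 'Gfg is best. Gfg also has Classes now. Classes help understand better.'
result = replace_repeats(text, {'Gfg': 'It', 'Classes': 'They'})
print(result)
# Gfg is best. It also has Classes now. They help understand better.
```

Keeping seen inside the helper means each call starts with a fresh set, so the function can be reused on several strings without carrying state between them.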
Using split() + enumerate() + Loop
This is a more explicit and beginner-friendly approach using loops and sets. It manually iterates through the words, tracks first occurrences, and replaces duplicates.
s1 = 'Gfg is best . Gfg also has Classes now. Classes help understand better .'
rep = {'Gfg': 'It', 'Classes': 'They'}
words = s1.split()
seen = set()
for i, word in enumerate(words):
    if word in rep:
        if word in seen:
            words[i] = rep[word]
        else:
            seen.add(word)
res = ' '.join(words)
print(res)
Output
Gfg is best . It also has Classes now. They help understand better .
Explanation:
- s1.split(): splits text into tokens.
- if word in rep, followed by if word in seen: a word that is a key in rep and has already been seen is replaced with its mapped value.
- seen.add(word): records first encounter.
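To see how the loop treats words outside the mapping and words that repeat more than once, the sketch below wraps it in a helper and runs it on a different sentence. The function name replace_after_first and the sample sentence are assumptions made for this illustration.

```python
def replace_after_first(text, rep):
    """Keep the first occurrence of each key in `rep`; replace later ones."""
    words = text.split()
    seen = set()
    for i, word in enumerate(words):
        if word in rep:
            if word in seen:
                words[i] = rep[word]  # repeat: overwrite in place
            else:
                seen.add(word)        # first occurrence: remember it
    return ' '.join(words)

# 'is' repeats but is not a key in rep, so it is never replaced.
# Both the second and the third 'Gfg' are replaced.
result = replace_after_first('Gfg is good . Gfg is fun . Gfg rocks', {'Gfg': 'It'})
print(result)
# Gfg is good . It is fun . It rocks
```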
Using keys() + index() + List Comprehension
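This method uses list.index() inside a list comprehension to check whether a word has appeared before. The membership test word in d checks the dictionary's keys, so only repeated key words are replaced and the first occurrence is kept as it is. Because index() scans the list from the start on every call, this approach is slower than the set-based ones on long inputs.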
s1 = 'Gfg is best . Gfg also has Classes now. Classes help understand better .'
d = {'Gfg': 'It', 'Classes': 'They'}
words = s1.split()
res = ' '.join([
    d.get(word) if word in d and words.index(word) != i else word
    for i, word in enumerate(words)
])
print(res)
Output
Gfg is best . It also has Classes now. They help understand better .
Explanation:
- d.get(word): retrieves the replacement from the dictionary.
- words.index(word) != i: index() returns the position of the first occurrence, so the comparison is true only for later occurrences, and only those are replaced.
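The check words.index(word) != i relies on index() always reporting the first match. The short sketch below prints the current position next to the position index() returns for a shortened version of the sample sentence; the positions list is introduced here only for illustration.

```python
words = 'Gfg is best . Gfg also'.split()

# list.index() returns the position of the first match, so it equals i
# only at a word's first occurrence.
positions = [(i, word, words.index(word)) for i, word in enumerate(words)]

for i, word, first in positions:
    print(i, word, first, 'repeat' if first != i else 'first')
# 0 Gfg 0 first
# 1 is 1 first
# 2 best 2 first
# 3 . 3 first
# 4 Gfg 0 repeat
# 5 also 5 first
```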