How to use regex to replace a specific group in a string using Python?

Question

I want to read in a string and delete the captured group (in this case the "&" captured by "[^ ]+(&)[^ ]").

x = "apple&bob & john & smith" # original string
x = "applebob & john & smith" # desired string after replacing

This is the code I am using now.

import re

and_regex = re.compile(r'([^ ]+(&)[^ ])')
x = "apple&bob & john & smith"
x = re.sub(and_regex, " ",x)
print(x)

I cannot use str.replace because it would replace every "&" in the string.

Thanks for the help!

I wonder if lookarounds would be helpful here: re.compile(r'(?<=\S)&(?=\S)').
– Mark, commented Apr 13, 2021 at 7:19

antdul · Accepted Answer · 2021-04-13 07:29:18Z

3

You can do this:

import re
x = "apple&bob & john & smith"
# raw string so that \S reaches the regex engine unchanged
x = re.sub(r"(?<=\S)&(?=\S)", "", x)
print(x)

Output:

applebob & john & smith
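
A minimal sketch of how the two lookarounds behave on a few more inputs. The helper name remove_inner_ampersands and the extra sample strings are made up for illustration; only the pattern comes from the answer above.

```python
import re

# Hypothetical helper; the pattern is the one used in the answer above.
INNER_AMP = re.compile(r"(?<=\S)&(?=\S)")

def remove_inner_ampersands(text):
    """Remove every '&' that has a non-whitespace character on both sides."""
    return INNER_AMP.sub("", text)

samples = [
    "apple&bob & john & smith",  # only the first '&' is glued to words
    "apple&bob&john & smith",    # consecutive glued '&' are all removed
    "&apple & bob&",             # '&' at the very start or end is kept
]
for s in samples:
    print(repr(s), "->", repr(remove_inner_ampersands(s)))
```

Because both assertions must hold, an "&" at the very start or end of the string is left alone: there is no character for the missing side to match.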

answered Apr 13, 2021 at 7:29

antdul


2 Comments

Gwang-Jin Kim Over a year ago

Exactly the complement to my method ;) . I always have to look up lookarounds anew, so I prefer a method without them (lazier ;) ).

Gwang-Jin Kim Over a year ago

As one can see, without lookarounds one has to apply re.sub() repeatedly to replace all occurrences, whereas with lookarounds a single call is enough, so this solution is more elegant in hindsight!

The fourth bird · Accepted Answer · 2021-04-13 07:53:14Z

2

As an alternative, if you also want to remove the & char at the start and end, for example in &apple&bob & john & smith&, you can either assert a non-whitespace char to the left OR assert a non-whitespace char to the right.

(?<=\S)&|&(?=\S)

import re

strings = [
    "apple&bob & john & smith",
    "&apple&bob & john & smith&",
    "&apple&bob & john & smith&&"
]

for s in strings:
    print(re.sub(r"(?<=\S)&|&(?=\S)", "", s))

Output

applebob & john & smith
applebob & john & smith
applebob & john & smith
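
To make the difference from the two-sided pattern concrete, here is a small side-by-side sketch. The names STRICT and LOOSE are only labels chosen for this example.

```python
import re

STRICT = re.compile(r"(?<=\S)&(?=\S)")   # non-whitespace required on both sides
LOOSE = re.compile(r"(?<=\S)&|&(?=\S)")  # non-whitespace on at least one side

s = "&apple&bob & john & smith&"
print(STRICT.sub("", s))  # &applebob & john & smith&
print(LOOSE.sub("", s))   # applebob & john & smith
```

In both cases an "&" surrounded by whitespace on both sides is kept.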

answered Apr 13, 2021 at 7:53

The fourth bird


2 Comments

Gwang-Jin Kim Over a year ago

That is the most sophisticated pattern!

The fourth bird Over a year ago

@Gwang-JinKim Thanks, it is an alternative in case a non-whitespace char is not required on both the left and the right.

Gwang-Jin Kim · Accepted Answer · 2021-04-13 07:58:53Z

1

You can capture the parts you want to keep. Then, when replacing with the .sub() method, insert the captured parts using \\1 and \\2 in the replacement string.

import re
pattern = re.compile(r'(\S+)&(\S+)')
# `\S` means: any non-whitespace character.
# see: https://docs.python.org/3/library/re.html

x = "apple&bob & john & smith"
x = pattern.sub("\\1\\2", x) # or also: re.sub(pattern, "\\1\\2", x)

x
## 'applebob & john & smith'

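However, this does not remove every occurrence when several & appear in the same run of non-whitespace characters: \S+ is greedy and re.sub() only replaces non-overlapping matches, so a single pass removes just the last & of each run. We need a function that repeats the substitution until nothing is left to replace. One can solve it using recursion: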

def replace_all_pattern(pattern, s):
    # re.search scans the whole string; re.match would only test the start
    if re.search(pattern, s):
        res = re.sub(pattern, "\\1\\2", s)
        return replace_all_pattern(pattern, res)
    else:
        return s


replace_all_pattern(r"(\S+)&(\S+)", "abble&bob&john&smith")
## 'abblebobjohnsmith'

But this will be less efficient, performance-wise, than using look-arounds, because the substitution has to be applied repeatedly. So use this only if exactly one occurrence is to be replaced. In that case, performance-wise, it is better than the look-arounds, but as soon as more than one occurrence is possible and has to be checked, use the look-around pattern, because it will be more efficient.
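
If the capture-group approach is kept anyway, the recursion above can also be written as a loop. This is only a sketch: the function name replace_until_stable is made up here, and it relies on re.subn, which returns the new string together with the number of substitutions made.

```python
import re

def replace_until_stable(pattern, s):
    """Apply the substitution repeatedly until no replacement is made."""
    count = 1
    while count:
        # re.subn returns (new_string, number_of_substitutions_made)
        s, count = re.subn(pattern, r"\1\2", s)
    return s

print(replace_until_stable(r"(\S+)&(\S+)", "x & apple&bob&john & smith"))
## 'x & applebobjohn & smith'
```

Each successful substitution removes one character, so the loop always terminates, and it avoids hitting the recursion limit on strings with very many occurrences.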

edited Apr 13, 2021 at 7:58

answered Apr 13, 2021 at 7:28

Gwang-Jin Kim


8 Comments

Jérôme Richard Over a year ago

This solution does not work with "apple&bob&john&smith": it gives "apple&bob&johnsmith".

Gwang-Jin Kim Over a year ago

ah I see ... it does it only once and not repeatedly.

Gwang-Jin Kim Over a year ago

but it is a problem of re.sub(): "Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl."

Jérôme Richard Over a year ago

Yeah, the problem is in "non-overlapping". I think you need at least one look-around. A single look-around on the right should be better than two look-arounds, because they are costly (possibly done with non-linear parsing), especially look-aheads.

Jérôme Richard Over a year ago

Yes, with them, it gives "applebobjohnsmith" because look-arounds do not consume characters (this is what makes them inefficient but also avoids the overlapping issue).
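
The behaviour discussed in these comments can be checked directly with re.finditer. The sketch below only inspects what each pattern matches on the string from the first comment; the variable names are made up for illustration.

```python
import re

s = "apple&bob&john&smith"

# Consuming pattern: the greedy first group swallows the earlier '&' characters,
# so the whole string is one match and only the last '&' gets dropped.
consuming = [m.group(0) for m in re.finditer(r"(\S+)&(\S+)", s)]
print(consuming)   # ['apple&bob&john&smith']

# Lookaround pattern: each match is just the '&' itself, because the
# assertions are zero-width, so all three are found in a single pass.
zero_width = [m.span() for m in re.finditer(r"(?<=\S)&(?=\S)", s)]
print(zero_width)  # [(5, 6), (9, 10), (14, 15)]
```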
