
I want to read in a string and delete only the captured group (in this case the "&" captured by the pattern "[^ ]+(&)[^ ]").

x = "apple&bob & john & smith" # original string
x = "applebob & john & smith" #after replacing string

This is the code I am using now.

import re

and_regex = re.compile(r'([^ ]+(&)[^ ])')
x = "apple&bob & john & smith"
x = re.sub(and_regex, " ",x)
print(x)

I cannot use str.replace() because it would replace every "&" in the entire string.

Thanks for the help!

  • I wonder if lookarounds would be helpful here: re.compile(r'(?<=\S)&(?=\S)'). Commented Apr 13, 2021 at 7:19
  • What is the expected output for "&apple&bob & john & smith&"? Commented Apr 13, 2021 at 7:26

3 Answers


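You can do this with a lookbehind and a lookahead, so that only an "&" with a non-whitespace character on both sides is matched: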

import re
x = "apple&bob & john & smith"
x = re.sub(r"(?<=\S)&(?=\S)", "", x)  # raw string, so \S is not treated as an invalid string escape
print(x)

output:

applebob & john & smith
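
As a quick check that this also works on chained tokens, here is a small sketch. The helper name remove_inner_amp and the extra sample strings are only illustrative, not from the question. Lookarounds are zero-width, so the characters around each "&" are not consumed, and every "&" squeezed between two non-whitespace characters is removed in a single pass.

```python
import re

# Hypothetical helper, for illustration only.
INNER_AMP = re.compile(r"(?<=\S)&(?=\S)")

def remove_inner_amp(text):
    """Remove every "&" that has a non-whitespace character on both sides."""
    return INNER_AMP.sub("", text)

print(remove_inner_amp("apple&bob & john & smith"))  # applebob & john & smith
print(remove_inner_amp("apple&bob&john&smith"))      # applebobjohnsmith
print(remove_inner_amp("&apple & bob&"))             # &apple & bob&
```

Note that a leading or trailing "&" is left in place, because one of the two lookarounds fails there.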


2 Comments

Exactly the complement to my method ;). I always have to look up lookarounds anew, so I prefer a method without them (lazier ;)).
As one can see, without lookarounds one has to apply re.sub() repeatedly to replace all matches, whereas with lookarounds a single call is enough, so this solution is more elegant in hindsight!

As an alternative, if you also want to remove the & char at the start and end, as in for example "&apple&bob & john & smith&", you can either assert a non-whitespace char to the left OR assert a non-whitespace char to the right.

(?<=\S)&|&(?=\S)


import re

strings = [
    "apple&bob & john & smith",
    "&apple&bob & john & smith&",
    "&apple&bob & john & smith&&"
]

for s in strings:
    print(re.sub(r"(?<=\S)&|&(?=\S)", "", s))

Output

applebob & john & smith
applebob & john & smith
applebob & john & smith
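
To see which "&" characters the alternation picks up, the following sketch lists the match positions with re.finditer. The sample string and variable names are only illustrative.

```python
import re

pattern = re.compile(r"(?<=\S)&|&(?=\S)")
s = "&a&b & c&"

# Index of every "&" the pattern matches.
matched = [m.start() for m in pattern.finditer(s)]
print(matched)   # [0, 2, 8]

# Index of every "&" in the string, for comparison.
all_amps = [i for i, ch in enumerate(s) if ch == "&"]
print(all_amps)  # [0, 2, 5, 8]

# Only the free-standing "&" (whitespace on both sides) is left alone.
kept = [i for i in all_amps if i not in matched]
print(kept)      # [5]

result = pattern.sub("", s)
print(result)    # ab & c
```

An "&" survives only when neither branch can match, that is, when it has whitespace (or the string boundary) on both sides.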

2 Comments

That is the most sophisticated pattern!
@Gwang-JinKim Thanks, it is an alternative in case having a non-whitespace char on both the left and the right is not required.

You can capture the parts you want to keep and, when replacing with the .sub() method, refer to the captured parts with \\1 and \\2 in the replacement string.

import re
pattern = re.compile(r'(\S+)&(\S+)')
# `\S` means: any non-whitespace character.
# see: https://docs.python.org/3/library/re.html

x = "apple&bob & john & smith"
x = pattern.sub("\\1\\2", x) # or also: re.sub(pattern, "\\1\\2", x)

x
## 'applebob & john & smith'

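However, within each whitespace-delimited token this removes only one "&": the greedy \S+ consumes the rest of the token, and re.sub() replaces only non-overlapping matches. To remove all occurrences, the substitution has to be applied repeatedly. One can solve it using recursion: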

def replace_all_pattern(pattern, s):
    # re.search, not re.match: re.match only looks at the start of the string
    if re.search(pattern, s):
        res = re.sub(pattern, "\\1\\2", s)
        return replace_all_pattern(pattern, res)
    else:
        return s


replace_all_pattern(r"(\S+)&(\S+)", "abble&bob&john&smith")
## 'abblebobjohnsmith'

But this will be less efficient than using lookarounds. So use this only if exactly one occurrence is to be replaced. In that case it is, performance-wise, better than the lookarounds, but as soon as more than one occurrence is possible and has to be checked, use the lookaround pattern, because it will be more efficient.
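
If recursion is a concern for long inputs, the same repeated substitution can be written as a loop. This is only a sketch of an alternative, not the code from the answer: the function name replace_all_iter is made up here, and it relies on re.subn(), which returns the new string together with the number of substitutions made.

```python
import re

def replace_all_iter(pattern, s):
    """Apply the capture-group substitution until nothing changes."""
    compiled = re.compile(pattern)
    while True:
        # subn returns (new_string, number_of_substitutions)
        s, count = compiled.subn(r"\1\2", s)
        if count == 0:
            return s

print(replace_all_iter(r"(\S+)&(\S+)", "apple&bob&john&smith"))
# applebobjohnsmith
print(replace_all_iter(r"(\S+)&(\S+)", "apple&bob & john & smith"))
# applebob & john & smith
```

Each pass removes one "&" per whitespace-delimited token, so the loop runs once more than the largest number of "&" characters inside any single token.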

8 Comments

This solution does not work with "apple&bob&john&smith": it gives "apple&bob&johnsmith".
Ah, I see ... it does it only once and not repeatedly.
But that is a property of re.sub(): "Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl."
Yeah, the problem is in "non-overlapping". I think you need at least one look-around. Putting a single look-around on the right should be better than using two, because they are costly (possibly done with non-linear parsing), especially look-ahead.
Yes, with them it gives "applebobjohnsmith", because look-arounds do not consume characters (this is what makes them less efficient but also avoids the overlapping issue).