I am working on a problem from Automate the Boring Stuff: imitating the strip() method using regex. I have mostly figured it out; it works with whitespace and with a specific word I want removed. But when it removes a keyword from the end of a string, it always cuts off the last letter of what is left. Can anyone help me figure out why?

import re

def strip_func(string, *args):
    strip_regex = re.compile(r'^(\s+)(.*?)(\s+)$')
    mo = strip_regex.findall(string)
    if not mo:
        rem = args[0]
        remove_regex = re.compile(rf'({rem})+(.*)[^{rem}]')
        remove_mo = remove_regex.findall(string)
        print(remove_mo[0][1])

    else:
        print(mo[0][1])

If no second argument is passed, the function strips whitespace from both sides of the string. I used this string to test that:

s = '        This is a string with whitespace on either side        '

Otherwise it removes the keyword, much like the strip function. For example:

spam = 'SpamSpamBaconSpamEggsSpamSpam'
strip_func(spam, 'Spam')

Output:

BaconSpamEgg

So the 's' at the end of Eggs is missing, and the same thing happens with every string I try. Thanks in advance for the help.

  • rf'({rem})+(.*)[^{rem}]' is just wrong, you cannot negate a sequence of chars with a negated character class. Use rf'({rem})+(.*?)(?={rem}|$)' Commented May 7, 2020 at 12:21
  • I suspect all you need is def strip_func(string, *args): return re.sub(rf'^(?:{re.escape(args[0])})+(.*?)(?:{re.escape(args[0])})+$', r'\1', string, flags=re.S) . See ideone.com/6jHH68 Commented May 7, 2020 at 12:27
  • Am I right that all you need is to remove consecutive multi-character sequences at the start and end of the string? Commented May 7, 2020 at 12:52
  • Thanks! I thought the negated character class sequence was weird, but it was the closest I got, so I just went with it. Yes, I want to remove the consecutive sequences from the start and end. Your update works well, so thanks again. Commented May 7, 2020 at 13:47
  • Done, thank you for the step by step explanation as well :) Commented May 7, 2020 at 14:18
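
The first comment's point about the negated character class can be checked directly. The snippet below is only an illustrative sketch using the question's own string and pattern; the variable names `consumed_by_class` and `captured` are made up for this example. It shows that `[^Spam]` does not mean "not the word Spam": it matches exactly one character that is none of `S`, `p`, `a`, `m`, so it consumes the final `s` of `Eggs` and leaves it out of group 2.

```python
import re

rem = 'Spam'
spam = 'SpamSpamBaconSpamEggsSpamSpam'

# Same pattern as in the question. [^Spam] is a character class: it matches
# ONE character that is not 'S', 'p', 'a' or 'm', not "anything but Spam".
pattern = re.compile(rf'({rem})+(.*)[^{rem}]')
match = pattern.search(spam)

# The greedy (.*) backtracks over the trailing 'SpamSpam' until the class can
# match a single character; the first candidate is the 's' of 'Eggs'.
captured = match.group(2)
consumed_by_class = match.group(0)[-1]

print(captured)           # BaconSpamEgg
print(consumed_by_class)  # s
```

Because the class always has to consume one character, the same loss occurs for any input, which matches the behaviour described in the question.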

1 Answer

You may use

import re

def strip_func(string, *args):
  return re.sub(rf'^(?:{re.escape(args[0])})+(.*?)(?:{re.escape(args[0])})+$', r'\1', string, flags=re.S)

spam = 'SpamSpamBaconSpamEggsSpamSpam'
print(strip_func(spam, 'Spam'))

See the Python demo. With 'Spam' as the argument, the rf'^(?:{re.escape(args[0])})+(.*?)(?:{re.escape(args[0])})+$' f-string builds the pattern ^(?:Spam)+(.*?)(?:Spam)+$, which matches

  • ^ - start of string
  • (?:Spam)+ - one or more occurrences of Spam at the start of the string
  • (.*?) - Group 1: any 0 or more chars as few as possible
  • (?:Spam)+ - one or more occurrences of Spam at the end of the string
  • $ - end of string.
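
The breakdown above can be reproduced step by step. This is a small sketch rather than part of the original answer; the names `keyword`, `escaped` and `pattern` are chosen here for illustration. It prints the pattern that the f-string builds and the text captured by Group 1, and it also shows one consequence of using `+` on both sides: if the keyword is missing from either end, the pattern does not match at all.

```python
import re

keyword = 'Spam'               # example keyword from the question
escaped = re.escape(keyword)   # escapes regex metacharacters; 'Spam' is unchanged
pattern = rf'^(?:{escaped})+(.*?)(?:{escaped})+$'
print(pattern)                 # ^(?:Spam)+(.*?)(?:Spam)+$

m = re.search(pattern, 'SpamSpamBaconSpamEggsSpamSpam')
print(m.group(1))              # BaconSpamEggs

# With + on both sides, the keyword is required at the start AND at the end.
no_match = re.search(pattern, 'BaconSpam')
print(no_match)                # None
```

When there is no match, `re.sub` returns the input unchanged. If the keyword should be optional on each side, replacing each `+` with `*` is one possible adjustment.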

The flags=re.S will make . match line break chars, too.
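
To see what the flag changes, here is a minimal sketch comparing the same substitution with and without `re.S` on a string that contains a line break. The helper `strip_keyword` and its `flags` parameter are hypothetical and exist only for this comparison; they are not the answer's function.

```python
import re

def strip_keyword(string, keyword, flags=0):
    # Hypothetical variant of the answer's function with a configurable flag.
    escaped = re.escape(keyword)
    return re.sub(rf'^(?:{escaped})+(.*?)(?:{escaped})+$', r'\1', string, flags=flags)

multiline = 'SpamBacon\nEggsSpam'

# Without re.S, '.' cannot cross the newline, so the pattern never reaches
# the trailing 'Spam' and the string comes back unchanged.
without_flag = strip_keyword(multiline, 'Spam')

# With re.S, '.' also matches '\n', so both ends are stripped.
with_flag = strip_keyword(multiline, 'Spam', flags=re.S)

print(repr(without_flag))  # 'SpamBacon\nEggsSpam'
print(repr(with_flag))     # 'Bacon\nEggs'
```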
