3

Given a string of text, in Python:

s = "(((((hi abc )))))))"
s = "***(((((hi abc ***&&&&"

How do I replace every run of non-alphabetic symbols that occur more than 3 times with a blank string?

For all the above, the result should be:

hi abc
1
  • What should the output be if the input is "(&*hello!@#"? Commented Jul 23, 2010 at 0:38

3 Answers

8

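This should work: \W{3,} matches any run of non-word characters (anything other than letters, digits, and underscore, so whitespace counts too) that is 3 or more characters long:

>>> import re
>>> s = "***(((((hi abc ***&&&&"
>>> re.sub(r"\W{3,}", "", s)
'hi abc'
>>> s = "(((((hi abc )))))))"
>>> re.sub(r"\W{3,}", "", s)
'hi abc'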
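One thing worth checking before relying on this: because \W also matches whitespace, a space next to a run of symbols is counted as part of the run and removed with it. Here is a minimal sketch of the difference, assuming you might prefer a character class that leaves whitespace alone. The function names and the second pattern are illustrative assumptions, not part of the answer above.

```python
import re

# Hypothetical names, for illustration only.
ANY_NONWORD_RUN = re.compile(r"\W{3,}")       # whitespace counts toward the run
SYMBOL_ONLY_RUN = re.compile(r"[^\w\s]{3,}")  # whitespace never matches


def strip_nonword_runs(text):
    """Remove runs of 3+ non-word characters, whitespace included."""
    return ANY_NONWORD_RUN.sub("", text)


def strip_symbol_runs(text):
    """Remove runs of 3+ symbols, leaving whitespace untouched."""
    return SYMBOL_ONLY_RUN.sub("", text)


print(strip_nonword_runs("(&*hello!@#"))       # hello
print(repr(strip_nonword_runs("hi *** abc")))  # 'hiabc'   (both spaces are gone)
print(repr(strip_symbol_runs("hi *** abc")))   # 'hi  abc' (both spaces survive)
```

Which behaviour is right depends on whether the spaces around the symbols matter in your data.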

1 Comment

@John : Correct. The examples included '***', so I shot a guess that he wanted 3+... I was confident that, given this solution, he could figure out how to add one. (That's why I italicized 3 or more)
4

If you want to replace any sequence of non-space non-alphamerics (e.g. '!?&' as well as your examples), @Stephen's answer is fine. But if you only want to replace sequences of three or more identical non-alphamerics, a backreference will help:

>>> import re
>>> r3 = re.compile(r'(([^\s\w])\2{2,})')
>>> r3.findall('&&&xxx!&?yyy*****')
[('&&&', '&'), ('*****', '*')]

So, for example:

>>> r3.sub('', '&&&xxx!&?yyy*****')
'xxx!&?yyy'
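Applied to the strings from the question, the identical-symbol approach leaves the space before the trailing run in place, since the character class excludes whitespace. A rough sketch, assuming a final .strip() is acceptable; the single-group pattern and the function name are my own simplification for sub(), where the outer capture group is not needed:

```python
import re

# Single capture group: \1 refers to the one symbol being repeated.
same_symbol_run = re.compile(r"([^\s\w])\1{2,}")


def strip_identical_runs(text):
    """Remove runs of 3+ identical symbols, then trim the ends."""
    # .strip() is an added assumption: it drops whitespace stranded at the ends.
    return same_symbol_run.sub("", text).strip()


for s in ("(((((hi abc )))))))", "***(((((hi abc ***&&&&"):
    print(repr(strip_identical_runs(s)))  # 'hi abc' both times
```

Without the .strip(), both inputs come back as 'hi abc ' with a trailing space.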

2 Comments

+1, I came back to add backreferences to my answer, but I'll let you have it... :)
@John, yep, but as @Stephen already explained, it's more believable that the OP made a slight mistake in English than a total blooper in his example of desired behavior ;-).
0

You can't (easily, using regexes) replace that by a "blank string" that's the same length as the replaced text. You can replace it with an empty string "" or a single space " " or any other constant string of your choice; I've used "*" in the example so that it is easier to see what is happening.

>>> import re
>>> re.sub(r"(\W)\1{3,}", "*", "12345<><>aaaaa%%%11111<<<<..>>>>")
'12345<><>aaaaa%%%11111*..*'

Note carefully: it doesn't change "<><>". I'm assuming that "non-alphabetic symbols that occur more than 3 times" means the same symbol has to occur more than 3 times in a row. I'm also assuming that you did mean "more than 3" and not "3 or more".
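Since the threshold is the part people disagree on, here is a small sketch that makes it a parameter. It also shows one way to get a same-length blank replacement after all, by passing a function as the replacement, which re.sub accepts. The names replace_repeats and blanks are hypothetical, chosen for this example only.

```python
import re


def replace_repeats(text, min_run=4, repl="*"):
    """Replace each run of min_run or more identical non-word characters.

    repl may be a string or a callable taking the match object,
    exactly as re.sub accepts.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    # One captured character plus (min_run - 1) or more repeats of it.
    pattern = re.compile(r"(\W)\1{%d,}" % (min_run - 1))
    return pattern.sub(repl, text)


def blanks(match):
    """Return as many spaces as the matched run had characters."""
    return " " * len(match.group(0))


sample = "12345<><>aaaaa%%%11111<<<<..>>>>"
print(replace_repeats(sample))                     # 12345<><>aaaaa%%%11111*..*
print(replace_repeats(sample, min_run=3))          # 12345<><>aaaaa*11111*..*
print(repr(replace_repeats(sample, repl=blanks)))  # '12345<><>aaaaa%%%11111    ..    '
```

With min_run=4 this reproduces the "more than 3" reading; min_run=3 gives the "3 or more" reading.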
