0

I am looking for a more elegant way to replace multiple patterns in a string using re. The problem is a small one: in a given string, clean up every run of more than two spaces, and also every place where a letter follows a period without any space. So the sentence

'This is a strange sentence.    There are too many spaces.And.Some periods are not.  placed      properly.'

should be corrected to:

'This is a strange sentence.  There are too many spaces.  And.  Some periods are not.  placed properly.'

My solution, below, seems a bit messy. Is there a nicer way of doing this, such as a one-liner regex?

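import re

def correct(astring):
    # Collapse every run of two or more spaces into a single space.
    bstring = re.sub(r'  +', ' ', astring)
    # Collect each word character that directly follows a period.
    letters = [frag.strip('.') for frag in re.findall(r'\.\w', bstring)]
    # Put two spaces between the period and that character.
    for letter in letters:
        bstring = re.sub(r'\.{}'.format(letter), '.  {}'.format(letter), bstring)
    return bstring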
3
  • I know, but I am coding a tutorial problem which is meant to restrict solutions to use regex. Commented Apr 2, 2015 at 10:47
  • Hopefully you are not against any solution using no regexes! :) Commented Apr 2, 2015 at 10:47
  • Of course any correct solution is useful, and a starting point. :) Commented Apr 2, 2015 at 10:49

4 Answers

3
import re

s = 'This is a strange sentence.    There are too many spaces.And.Some periods are not.  placed      properly.'

print(re.sub(r"\s+", " ", s).replace(".", ". ").rstrip())

This is a strange sentence.  There are too many spaces. And. Some periods are not.  placed properly.

4 Comments

It does work, but I was looking for pure regex solution.
@ramius, swap the str.replace call for a re.sub, but I don't know why you would want to.
@AvinashRaj, why do you think the OP wants two spaces?
@AvinashRaj, I cannot see two spaces and I very much doubt that is required, mixing the amount of spaces would be quite strange grammar.
0

You could use nested re.sub calls, as below. The inner call puts exactly two spaces after every dot except the last one; the outer call then replaces every run of whitespace that does not start right after a dot with a single space.

>>> import re
>>> s = 'This is a strange sentence.    There are too many spaces.And.Some periods are not.  placed      properly.'
>>> re.sub(r'(?<!\.)\s+', ' ', re.sub(r'\.\s*(?!$)', r'.  ', s))
'This is a strange sentence.  There are too many spaces.  And.  Some periods are not.  placed properly.'

OR

>>> re.sub(r'\.\s*(?!$)', r'.  ', re.sub(r'\s+', ' ', s))
'This is a strange sentence.  There are too many spaces.  And.  Some periods are not.  placed properly.'
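
The two substitutions above can also be merged into a single pass. The following is only a sketch, not a definitive answer: the function name normalize_spacing is made up for illustration, and it assumes that "two spaces after every period except the last, one space everywhere else" is the intended rule. It uses one alternation and a replacement function that picks the right replacement for whichever branch matched.

```python
import re

def normalize_spacing(text):
    # Hypothetical helper name, chosen for this sketch only.
    # Branch 1: a period, any whitespace after it, then a non-space character
    #           (so a period at the very end of the string is left alone).
    # Branch 2: any other run of two or more whitespace characters.
    pattern = re.compile(r'\.\s*(?=\S)|\s{2,}')

    def fix(match):
        # Period branch -> period plus two spaces; whitespace branch -> one space.
        return '.  ' if match.group().startswith('.') else ' '

    return pattern.sub(fix, text)

s = 'This is a strange sentence.    There are too many spaces.And.Some periods are not.  placed      properly.'
print(normalize_spacing(s))
```

Note that \s also matches tabs and newlines, and that a period inside a number such as 3.14 would be spaced out as well, so the pattern would need tightening for text like that.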

0

An approach without using any regex:

>>> ' '.join(s.split()).replace('.','. ')[:-1]
'This is a strange sentence.  There are too many spaces. And. Some periods are not.  placed properly.'
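
The [:-1] slice above relies on the string ending with a period, because that is what leaves a trailing space to cut off. As a rough sketch of the same idea with each step spelled out (the name tidy_without_regex is hypothetical), rstrip() can be used in place of the slice so that nothing is lost when the input ends differently:

```python
def tidy_without_regex(text):
    # Hypothetical helper name, used only for this sketch.
    # split() with no arguments splits on any run of whitespace,
    # so joining with ' ' collapses every run to a single space.
    collapsed = ' '.join(text.split())
    # Append one space after every period.
    spaced = collapsed.replace('.', '. ')
    # Drop the trailing space left behind by a final period, if any.
    return spaced.rstrip()

s = 'This is a strange sentence.    There are too many spaces.And.Some periods are not.  placed      properly.'
print(tidy_without_regex(s))
```

For the example sentence this gives the same result as the one-liner above.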

4 Comments

This would be the obvious way to do it.
Yeah! Why spend too much brains on a simple issue :) But as you wanted a regex only answer, I waited till someone posted a regex answer! Thanks
Yes, sure, but as I said this is for a tutorial module on regex using Python, so my preferred solutions will use regex.
Fair enough! All the best
0

What pure regex? Like this?

>>> import re
>>> s = 'This is a strange sentence.    There are too many spaces.And.Some periods are not.  placed      properly.'
>>> re.sub(r'\s+$', '', re.sub(r'\s+', ' ', re.sub(r'\.', '. ', s)))
'This is a strange sentence. There are too many spaces. And. Some periods are not. placed properly.'

1 Comment

We have several solutions, of various types, and all seem OK.
