3

I want to convert a string like

'''1  2  3  4  5  6 abcde fghij klmno pqrst 7 8 9 10 uvwxyz abcdef 11 12 13'''

to

'''1  2  3  4  5  6
abcde fghij klmno pqrst
7 8 9 10
uvwxyz abcdef
11 12 13'''

This is my current method:

s = re.sub(r'(\d) ([a-z])', r'\1\n\2', s)
s = re.sub(r'([a-z]) (\d)', r'\1\n\2', s)

How can I do this with a single regular expression? I know I can do it with re.findall and groups, but is there an easier way?

5 Answers

2

I really think the easiest way would be to match using findall instead of splitting or sub-ing:

result = re.findall(r"\d+(?:\s+\d+)*|[a-z]+(?:\s+[a-z]+)*", text)
print('\n'.join(result))

or in one line:

result = '\n'.join(re.findall(r"\d+(?:\s+\d+)*|[a-z]+(?:\s+[a-z]+)*", text))

Gives:

1  2  3  4  5  6
abcde fghij klmno pqrst
7 8 9 10
uvwxyz abcdef
11 12 13

\d+(?:\s+\d+)* matches the parts with digits and spaces.

[a-z]+(?:\s+[a-z]+)* matches the parts with letters and spaces.
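
As a self-contained sketch of the two alternatives working together (the input string is the sample from the question; the names RUN and split_runs are only illustrative), each match is one whole run of numbers or one whole run of lowercase words:

```python
import re

# Sample input taken from the question.
text = "1  2  3  4  5  6 abcde fghij klmno pqrst 7 8 9 10 uvwxyz abcdef 11 12 13"

# First alternative: numbers separated by whitespace.
# Second alternative: lowercase words separated by whitespace.
RUN = re.compile(r"\d+(?:\s+\d+)*|[a-z]+(?:\s+[a-z]+)*")

def split_runs(s):
    """Return the digit runs and letter runs of s, in order of appearance."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return RUN.findall(s)

runs = split_runs(text)
result = "\n".join(runs)
print(result)
```

Because the whitespace between a digit run and a letter run belongs to neither alternative, it is simply skipped, which is why no explicit replacement is needed.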


3 Comments

This just gives me another way to solve the problem, awesome and concise.
You might want to use [a-z]+(?:\s+[a-z]+)* to match letters and spaces
@XWen Right, that was an omission when moving the code here, good catch :)
1

Here are two ways to do it with a single regex:

  • Use a conditional pattern. Capture \1 is straightforward. Capture \4 checks whether we grabbed \2 or \3, and then defines the rest of the pattern accordingly.

    re.sub(r'((\d)|([a-z])) ((?(2)[a-z]|\d))', r'\1\n\4', s)
    
  • Replace only the space, and surround it with look-behind and look-ahead assertions.

    re.sub(r'(?<=\d) (?=[a-z])|(?<=[a-z]) (?=\d)', '\n', s)
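
A quick way to check both patterns is to compare them against the two-pass version from the question. This is only a sketch on the question's sample string; the variable names are illustrative:

```python
import re

s = "1  2  3  4  5  6 abcde fghij klmno pqrst 7 8 9 10 uvwxyz abcdef 11 12 13"

# Baseline: the two substitutions from the question.
two_pass = re.sub(r'(\d) ([a-z])', r'\1\n\2', s)
two_pass = re.sub(r'([a-z]) (\d)', r'\1\n\2', two_pass)

# Conditional pattern: (?(2)[a-z]|\d) expects a letter if group 2
# (the digit) took part in the match, otherwise it expects a digit.
conditional = re.sub(r'((\d)|([a-z])) ((?(2)[a-z]|\d))', r'\1\n\4', s)

# Lookarounds: only the space itself is matched and replaced.
lookaround = re.sub(r'(?<=\d) (?=[a-z])|(?<=[a-z]) (?=\d)', '\n', s)

print(conditional == two_pass, lookaround == two_pass)
```

One difference worth keeping in mind: the conditional pattern consumes the characters on both sides of the space, while the lookaround pattern consumes only the space, so neighbouring boundaries cannot interfere with each other.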
    

But your two simple regexes are better than all of this nonsense.

2 Comments

@GuidoBouman Maybe you should be aware that using conditionals and/or lookarounds takes slightly more time and resources than not using them! It's certainly negligible on a small scale though.
@Jerry Thanks for noting. I find the lookahead & behind approach better as you only replace that part you actually need to replace. Which makes your code less error-prone.
1

You can use the regular expression alternation ("or") operator, |:

s = re.sub(r'((\d) ([a-z])|([a-z]) (\d))', r'\2\4\n\3\5', s)

It will match either groups 2 & 3 or groups 4 & 5. =]
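
The replacement template refers to groups from both alternatives, and only one alternative takes part in any given match. The Python documentation for re.sub notes that unmatched groups are replaced with an empty string only from version 3.5 onward. A possible sketch that avoids relying on this is to pass a function as the replacement. Here the outer group is dropped, so the groups are numbered 1 to 4 instead of 2 to 5, and PAIR and break_pair are hypothetical names:

```python
import re

s = "1  2  3  4  5  6 abcde fghij klmno pqrst 7 8 9 10 uvwxyz abcdef 11 12 13"

# Same alternation as above, without the outer group.
PAIR = re.compile(r'(\d) ([a-z])|([a-z]) (\d)')

def break_pair(m):
    # Only one alternative matched; the groups of the other one are None.
    left = m.group(1) or m.group(3)
    right = m.group(2) or m.group(4)
    return left + "\n" + right

result = PAIR.sub(break_pair, s)
print(result)
```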

4 Comments

I get this when I run your code: error: unmatched group.
Ouch, which version of Python are you using? This has been fixed recently: hg.python.org/cpython/rev/bd2f1ea04025
@GuidoBouman It does not work on Python 2.7, but thank you all the same.
No problem, the other solutions are clearly better for your case. =]
1

You could use re.split

>>> import re
>>> s = '''1  2  3  4  5  6 abcde fghij klmno pqrst 7 8 9 10 uvwxyz abcdef 11 12 13'''
>>> for i in re.split(r'(?<=\d)\s+(?=[A-Za-z])|(?<=[A-Za-z])\s+(?=\d)', s):
...     print(i)


1  2  3  4  5  6
abcde fghij klmno pqrst
7 8 9 10
uvwxyz abcdef
11 12 13
>>> print('\n'.join(re.split(r'(?<=\d)\s+(?=[A-Za-z])|(?<=[A-Za-z])\s+(?=\d)', s)))

Or use re.sub:

>>> print(re.sub(r'(?<=\d)\s+(?=[A-Za-z])|(?<=[A-Za-z])\s+(?=\d)', r'\n', s))
1  2  3  4  5  6
abcde fghij klmno pqrst
7 8 9 10
uvwxyz abcdef
11 12 13

The re.sub call above replaces one or more whitespace characters between a digit and a letter, or between a letter and a digit, with a newline character.
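
Since the pattern uses \s+ and [A-Za-z], it should also cope with boundaries that contain several spaces or a tab, and with uppercase letters. A small sketch with a made-up input (the string and the helper name break_runs are assumptions for illustration, not from the question):

```python
import re

BOUNDARY = re.compile(r'(?<=\d)\s+(?=[A-Za-z])|(?<=[A-Za-z])\s+(?=\d)')

def break_runs(s):
    """Replace whitespace between a digit and a letter (either order) with a newline."""
    return BOUNDARY.sub('\n', s)

# Two spaces after "2", a tab between the words, three spaces before "3".
sample = "1 2  abc\tDEF   3"
broken = break_runs(sample)
print(repr(broken))
```

The tab between abc and DEF is left alone because it sits between two letters, not at a digit/letter boundary.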

0

You can use a replacement:

re.sub(r'(\d[\d\s]*|[a-z][a-z\s]*)', r'\1\n', s)

To handle trailing whitespace more rigorously, you can use this:

re.sub(r'(\d(?:[\d\s]*\d)?|[a-z](?:[a-z\s]*[a-z])?)\s*', r'\1\n', s).rstrip()
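
To see what the second version changes, here is a sketch that runs both patterns on the sample string from the question (loose and strict are just illustrative names):

```python
import re

s = "1  2  3  4  5  6 abcde fghij klmno pqrst 7 8 9 10 uvwxyz abcdef 11 12 13"

# First version: each run greedily includes the whitespace that follows it,
# so lines keep a trailing space and the result ends with a newline.
loose = re.sub(r'(\d[\d\s]*|[a-z][a-z\s]*)', r'\1\n', s)

# Second version: each captured run must end on a digit or a letter; the
# whitespace after it is matched outside the group and therefore dropped.
strict = re.sub(
    r'(\d(?:[\d\s]*\d)?|[a-z](?:[a-z\s]*[a-z])?)\s*', r'\1\n', s
).rstrip()

print(repr(loose.splitlines()[0]))
print(repr(strict.splitlines()[0]))
```

The final rstrip() is still needed in the second version because the replacement adds a newline after the last run as well.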
