
I want to split a sentence such as "It was amazing in 2016" into the text and the trailing number.

I use:

import re
re.split(r'\s+(?=\d+)', "It was amazing in 2016")
out: ['It was amazing in', '2016']

Now I would like to do the opposite: if a sentence starts with a number followed by text, such as '2016 was amazing', I would like the result to be ['2016', 'was amazing'].

  • You'll benefit from this tutorial on regular expressions. Show us what you've tried and where you're stuck. Commented Apr 10, 2017 at 18:10
  • Is using regex a requirement? Commented Apr 10, 2017 at 18:11
  • Use a lookbehind: re.split(r'(?<=\d)\s+', s) Commented Apr 10, 2017 at 18:12
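A side note on lookbehind splitting: since Python 3.7, re.split also splits on empty matches, so a pattern whose whitespace part is \s* (which can match zero characters) may cut between the digits of the number itself. The quick check below, assuming Python 3.7 or later, compares \s* with \s+ on the second example sentence:

```python
import re

s = "2016 was amazing"

# \s* may match nothing, so the lookbehind succeeds after every digit
# and each of those empty matches becomes a split point (Python 3.7+).
star = re.split(r'(?<=\d)\s*', s)

# \s+ needs at least one whitespace character, so only the real gap splits.
plus = re.split(r'(?<=\d)\s+', s)

print(star)  # ['2', '0', '1', '6', 'was amazing']
print(plus)  # ['2016', 'was amazing']
```

On Python 3.6 and earlier, empty matches were ignored by re.split, which is why \s* patterns used to appear to work.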

3 Answers

Using lookarounds you can use a single regex for both cases:

\s+(?=\d)|(?<=\d)\s+

Code:

>>> import re
>>> s = "It was amazing in 2016"
>>> re.split(r'\s+(?=\d)|(?<=\d)\s+', s)
['It was amazing in', '2016']

>>> s = "2016 was amazing"
>>> re.split(r'\s+(?=\d)|(?<=\d)\s+', s)
['2016', 'was amazing']

Regex breakdown:

  • \s+ - Match one or more whitespace characters
  • (?=\d) - Lookahead that asserts the next character is a digit
  • | - OR
  • (?<=\d) - Lookbehind that asserts the previous character is a digit
  • \s+ - Match one or more whitespace characters
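The same pattern can be written with re.VERBOSE so that each alternative carries the comment from the breakdown. This is only a restatement of the answer's regex; the name SPLIT_AT_NUMBER is illustrative. A sentence with the number in the middle shows both alternatives firing in a single call:

```python
import re

# The answer's pattern, laid out with re.VERBOSE (unescaped whitespace in the
# pattern is ignored, and # starts a comment).
SPLIT_AT_NUMBER = re.compile(r"""
    \s+(?=\d)     # whitespace followed by a digit: text -> number boundary
    |             # or
    (?<=\d)\s+    # whitespace preceded by a digit: number -> text boundary
""", re.VERBOSE)

for sentence in ["It was amazing in 2016", "2016 was amazing", "In 2016 it was amazing"]:
    print(SPLIT_AT_NUMBER.split(sentence))
# ['It was amazing in', '2016']
# ['2016', 'was amazing']
# ['In', '2016', 'it was amazing']
```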

2 Comments

This regex doesn't work for something like str = 'Surface Pro5' where I hoped it would split at the 5. Would be extremely grateful if you added this scenario too.
@sachinruk: You can use list(filter(None, re.split(r'(\D+)(?=\d)|(?<=\d)(\D+)', str))) or re.findall(r'\d+|\D+', str)
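Both suggestions from that reply, run on 'Surface Pro5', where there is no whitespace before the digit. In Python 3, filter() returns an iterator, so it is wrapped in list() here. Because the pattern has capture groups, re.split also returns the captured text, None for the group that did not take part in the match, and an empty string for the (empty) text before a match at position 0; filter(None, ...) drops those None and '' entries:

```python
import re

text = "Surface Pro5"

# Option 1: split with capture groups, then drop the None/'' entries.
raw = re.split(r'(\D+)(?=\d)|(?<=\d)(\D+)', text)
print(raw)                       # ['', 'Surface Pro', None, '5']
parts = list(filter(None, raw))
print(parts)                     # ['Surface Pro', '5']

# Option 2: collect alternating runs of digits and non-digits directly.
chunks = re.findall(r'\d+|\D+', text)
print(chunks)                    # ['Surface Pro', '5']
```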

In my opinion regex is overkill for this task, so unless you are already using regex in your program or it's required (for an assignment or otherwise), I recommend plain string methods such as str.split() and str.rsplit():

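def ends_in_digit(my_string):
    # Split off the last whitespace-separated word:
    # "It was amazing in 2016" -> ['It was amazing in', '2016']
    separated = my_string.rsplit(maxsplit=1)
    # Guard against an empty/blank string, where rsplit() returns []
    return separated if separated and separated[-1].isdigit() else False

def starts_with_digit(my_string):
    # Split off the first whitespace-separated word:
    # "2016 was amazing" -> ['2016', 'was amazing']
    separated = my_string.split(maxsplit=1)
    # Guard against an empty/blank string, where split() returns []
    return separated if separated and separated[0].isdigit() else False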
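If one call should handle both cases, the two checks can be folded into a single function. split_number_edge is a hypothetical name, not from the answer, and unlike the functions above it returns a list in every case instead of False. Like the original functions it relies on str.isdigit(), so a number with attached punctuation such as '2016.' is not recognized:

```python
def split_number_edge(sentence):
    """Hypothetical helper: split off a leading or trailing number using str methods only.

    Returns the sentence unchanged (in a one-element list) when neither end is a number.
    """
    words = sentence.split()
    if words and words[0].isdigit():
        # "2016 was amazing" -> ['2016', 'was amazing']
        return sentence.split(maxsplit=1)
    if words and words[-1].isdigit():
        # "It was amazing in 2016" -> ['It was amazing in', '2016']
        return sentence.rsplit(maxsplit=1)
    return [sentence]


print(split_number_edge("It was amazing in 2016"))  # ['It was amazing in', '2016']
print(split_number_edge("2016 was amazing"))        # ['2016', 'was amazing']
print(split_number_edge("No numbers here"))         # ['No numbers here']
```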


Another easy way to split a string into digits and non-digits is to match it with the \d+|\D+ regex. The resulting chunks keep their leading/trailing whitespace, but that is easy to strip (or keep, if it does not matter):

import re
r = re.compile(r'\d+|\D+')
ss = ['It was amazing in 2016', '2016 was amazing']
for s in ss:
    print(r.findall(s))  # chunks keep their leading/trailing whitespace
    print([x.strip() for x in r.findall(s)])  # chunks with whitespace stripped

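One caveat with the strip() step: when two numbers are separated only by whitespace, the gap becomes its own chunk, e.g. re.findall(r'\d+|\D+', '2016 2017') gives ['2016', ' ', '2017'], and strip() turns the middle chunk into an empty string. A variant that also drops such blank chunks might look like this (digit_chunks is a hypothetical name):

```python
import re

CHUNK = re.compile(r'\d+|\D+')

def digit_chunks(s):
    """Hypothetical helper: digit / non-digit runs, stripped, with blank runs dropped."""
    return [c.strip() for c in CHUNK.findall(s) if c.strip()]

for s in ['It was amazing in 2016', '2016 was amazing', '2016 2017']:
    print(digit_chunks(s))
# ['It was amazing in', '2016']
# ['2016', 'was amazing']
# ['2016', '2017']
```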
