5

Trying to split the string at number 7 and I want 7 to be included in the second part of the split string.

Code:

a = 'cats can jump up to 7 times their tail length'

words = a.split("7")

print(words)

Output:

['cats can jump up to ', ' times their tail length']

String got split but second part doesn't include 7.

I want to know how I can include 7.

note: not a duplicate of Python split() without removing the delimiter because the separator has to be part of the second string.

6 Answers 6

8

A simple and naive way to do this is just to find the index of what you want to split on and slice it:

>>> a = 'cats can jump up to 7 times their tail length'
>>> ind = a.index('7')
>>> a[:ind], a[ind:]
('cats can jump up to ', '7 times their tail length')
Sign up to request clarification or add additional context in comments.

2 Comments

Nice trick, but if the delimiter repeats in the main string, only the first occurrence will be considered
@SreeramTP If the delimiter repeats then that is a requirement the OP should mention in the question.
5

in one line, using re.split with the rest of the string, and filter the last, empty string that re.split leaves:

import re
a = 'cats can jump up to 7 times their tail length'
print([x for x in re.split("(7.*)",a) if x])

result:

['cats can jump up to ', '7 times their tail length']

using () in split regex tells re.split not to discard the separator. A (7) regex would have worked but would have created a 3-item list like str.partition does, and would have required some post processing, so no one-liner.

now if the number isn't known, regex is (again) the best way to do it. Just change the code to:

[x for x in re.split("(\d.*)",a) if x]

2 Comments

What if I have a list of strings with a number in them and I have to split each string from the number. Thanks in advance :)
edited, nothing simpler with regex. Other solutions will need more rework!
5

Another way is to use str.partition:

a = 'cats can jump up to 7 times their tail length'
print(a.partition('7'))
# ('cats can jump up to ', '7', ' times their tail length')

To join the number again with the latter part you can use str.join:

x, *y = a.partition('7')
y = ''.join(y)
print((x, y))
# ('cats can jump up to ', '7 times their tail length')

Or do it manually:

sep = '7'
x = a.split(sep)
x[1] = sep + x[1]
print(tuple(x))
# ('cats can jump up to ', '7 times their tail length')

Comments

1

re can be used to capture globally as well:

>>> s = 'The 7 quick brown foxes jumped 7 times over 7 lazy dogs'
>>> sep = '7'
>>> 
>>> [i for i in re.split(f'({sep}[^{sep}]*)', s) if i]
['The ', '7 quick brown foxes jumped ', '7 times over ', '7 lazy dogs']

If the f-string is hard to read, note that it just evaluates to (7[^7]*).
(To the same end as the listcomp one can use list(filter(bool, ...)), but it's comparatively quite ugly)


In Python 3.7 and onward, re.split() allows splitting on zero-width patterns. This means a lookahead regex, namely f'(?={sep})', can be used instead of the group shown above.

What's strange about this is the timings: if using re.split() (i.e. without a compiled pattern object), the group solution consistently runs about 1.5x faster than the lookahead. However, when compiled, the lookahead beats the other hands-down:

In [4]: r_lookahead = re.compile('f(?={sep})')

In [5]: r_group = re.compile(f'({sep}[^{sep}]*)')

In [6]: %timeit [i for i in r_lookahead.split(s) if i]
2.76 µs ± 207 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: %timeit [i for i in r_group.split(s) if i]
5.74 µs ± 65.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [8]: %timeit [i for i in r_lookahead.split(s * 512) if i]
137 µs ± 1.93 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit [i for i in r_group.split(s * 512) if i]
1.88 ms ± 18.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

A recursive solution also works fine, although more slowly than splitting on a compiled regex (but faster than a straight re.split(...)):

def splitkeep(s, sep, prefix=''):
    start, delim, end = s.partition(sep)
    return [prefix + start, *(end and splitkeep(end, sep, delim))]
>>> s = 'The 7 quick brown foxes jumped 7 times over 7 lazy dogs'
>>> 
>>> splitkeep(s, '7')
['The ', '7 quick brown foxes jumped ', '7 times over ', '7 lazy dogs']

Comments

0

There's also the possibility to do all of it using split and list comprehension, without the need to import any library. This will, however, make your code slightly "less pretty":

a = 'cats can jump up to 7 times their tail length'
sep = '7'
splitString = a.split(sep)
splitString = list(splitString[0]) + [sep+x for x in splitString[1:]]

And with that, splitString will carry the value:

['cats can jump up to ', '7 times their tail length']

Comments

0

Using enumerate, This only works if the string doesnt start with the seperator

s = 'The quick 7 the brown foxes jumped 7 times over 7 lazy dogs'

separator = '7'
splitted = s.split(separator)

res = [((separator if i > 0 else '') + item).strip() for i, item in enumerate(splitted)]

print(res)
['The quick', '7 the brown foxes jumped', '7 times over', '7 lazy dogs']

[Program finished]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.