
I have a string I want to split into 2-digit pieces. I tried using regex like so:

import re
s = "123456789"
t = re.sub('..', ".. ", s)
print(t)

I expected to get 12 34 56 78 9, but instead I got '.. .. .. .. 9'. The 9 does not bother me, because I know I will have an even number of digits, but how can I tell re.sub to keep the matched digits instead of replacing them with literal dots?

Using the Python 3.5.1 shell.

EDIT

I checked all 3 answers and they all work, but findall seems to be the fastest (and the most elegant, IMO ;p):

import time
import re

s = "43256711233214432"

i = 10000
start = time.time()
while i:
    i -= 1
    re.sub('(..)', r"\1 ", s)    
end = time.time()

elapsed = end - start
print("using r\"\\1 \"    : ", elapsed)

i = 10000
start = time.time()
while i:
    re.sub('..', r"\g<0> ", s)
    i -= 1
end = time.time()

elapsed = end - start
print("using r\"\\g<0> \" : ", elapsed)  # backslash doubled: "\g" is not a valid string escape

i = 10000
start = time.time()
while i:
    ' '.join(re.findall(r'..|.', s))
    i -= 1
end = time.time()

elapsed = end - start
print("using findall   : ", elapsed)

using r"\1 "    :  0.25461769104003906
using r"\g<0> " :  0.09374403953552246
using findall   :  0.015610456466674805

2nd EDIT: Is there a better way (or any way...) to do this without regex?

  • Your timing analysis is weird. \1 and \g<0> do pretty much the same thing and should be equally fast. Using IPython's %timeit function gives me 8.5µs for both sub approaches, 2.0µs for findall and 1.5µs for range(0, len, 2). Commented Jul 28, 2016 at 8:47
  • @tobias_k I don't know... I ran this script about 10 times, and every time findall was 1st, \g<0> 2nd and \1 3rd. Commented Jul 28, 2016 at 8:52
  • And what happens if you test them in a different order, say first \g, then \1? findall is fastest (except range), no arguing about that. BTW, when I run your script, the results are closer to %timeit, with \g and \1 coming out nearly identical. Commented Jul 28, 2016 at 8:56
  • I see minor differences, but I'm not sure about them... I ran it 10 times in a row and took the mean. \g came out faster by 0.2 both times. Commented Jul 28, 2016 at 9:03
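The comments point out that hand-rolled time.time() loops are sensitive to run order and background noise. As a rough sketch (not the poster's script), the same approaches can be compared with the standard-library timeit.repeat, keeping the best of several runs. The helper name best_time, the candidate labels and the loop counts are arbitrary choices for illustration; the slicing candidate is the range-based idea from the comments. Absolute numbers depend on the machine and Python version, so no output is shown.

```python
import re
import timeit

s = "43256711233214432"

# Each candidate is a zero-argument callable that builds the space-separated string.
candidates = {
    r'sub with \1': lambda: re.sub('(..)', r"\1 ", s),
    r'sub with \g<0>': lambda: re.sub('..', r"\g<0> ", s),
    'findall + join': lambda: ' '.join(re.findall(r'..|.', s)),
    'slicing + join': lambda: ' '.join(s[i:i + 2] for i in range(0, len(s), 2)),
}


def best_time(func, number=10000, repeat=3):
    """Return the fastest of `repeat` runs, in microseconds per call."""
    # timeit.repeat accepts a callable and returns one total time per run
    runs = timeit.repeat(func, number=number, repeat=repeat)
    return min(runs) / number * 1e6


if __name__ == "__main__":
    for name, func in candidates.items():
        print("{:16}: {:.2f} µs".format(name, best_time(func)))
```

Taking the minimum rather than the mean is the usual timeit convention: slower runs mostly reflect interference from other processes, not the code under test.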

4 Answers


You can also use re.findall:

>>> import re
>>> s = "123456789"
>>> ' '.join(re.findall(r'..|.', s))
'12 34 56 78 9'

The regex r'..|.' matches two characters, or a single character when only one is left (alternatives are tried left to right, so .. is preferred over .).
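A minimal sketch wrapping this in a function, assuming the input contains no newlines (. does not match \n by default). The second variant swaps the alternation for a greedy .{1,2} quantifier, which produces the same chunks; it is included only because the comments below discuss the cost of alternations, and no claim is made here about which is faster. Both function names are hypothetical.

```python
import re


def pairs_findall(s):
    """Split s into 2-character chunks (the last may be shorter), joined with spaces."""
    # '..|.' tries the two-character branch first, then falls back to one character
    return ' '.join(re.findall(r'..|.', s))


def pairs_quantifier(s):
    """Same result using a greedy {1,2} quantifier instead of an alternation."""
    # '.{1,2}' takes two characters when available, otherwise the single leftover one
    return ' '.join(re.findall(r'.{1,2}', s))


print(pairs_findall("123456789"))    # 12 34 56 78 9
print(pairs_quantifier("12345678"))  # 12 34 56 78
```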


Comments

... but impractical: why achieve the result in 2 steps (re.findall + join) when you can do it in 1 step (re.sub)? Moreover, a regex with an alternation is slower than one without.
@WiktorStribiżew what do you mean by 'a regex with an alternation ... is slower'? By my tests, this method is faster by far.
Faster than re.sub('..', r"\g<0> ", s)?

You can refer to the whole match with the \g<0> backreference in the replacement string (where regular expression patterns are not interpreted):

re.sub('..', r"\g<0> ", s)

Python demo:

import re
s = "12345678"
print(re.sub('..', r"\g<0> ", s))

From the re.sub documentation:

The backreference \g<0> substitutes in the entire substring matched by the RE.
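One detail worth knowing: with an even number of digits (the case the question cares about), the demo above leaves a trailing space, because the last pair is also followed by " ". A minimal sketch of two workarounds, assuming that trailing space is unwanted: call .rstrip() on the result, or add a (?=.) lookahead so a space is only inserted when another character follows. The lookahead is a variant introduced here, not part of the answer, and both function names are hypothetical.

```python
import re


def pairs_sub(s):
    """Insert a space after every 2-character match via the whole-match backreference."""
    return re.sub('..', r'\g<0> ', s)


def pairs_sub_no_trailing(s):
    """Variant that only adds a space when at least one more character follows."""
    # (?=.) checks for a following character without consuming it
    return re.sub(r'..(?=.)', r'\g<0> ', s)


print(repr(pairs_sub("12345678")))               # '12 34 56 78 '
print(repr(pairs_sub("12345678").rstrip()))      # '12 34 56 78'
print(repr(pairs_sub_no_trailing("12345678")))   # '12 34 56 78'
print(repr(pairs_sub_no_trailing("123456789")))  # '12 34 56 78 9'
```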


In a regex, . means any character, but in the replacement text it is just a literal period. To capture characters as a group, put parentheses around them in the pattern; you can then reference the first such group in the replacement text with \1:

>>> re.sub('(..)', r"\1 ", s)
'12 34 56 78 9'
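A short sketch contrasting the question's attempt with the group-based fix. It also shows why the replacement is written as a raw string: in a normal string literal, "\1" is the control character chr(1), not a backreference, so without the r prefix the backslash has to be doubled.

```python
import re

s = "123456789"

# In the pattern '.' matches any character; in the replacement it is a literal dot.
print(re.sub('..', '.. ', s))     # .. .. .. .. 9

# Parentheses create group 1; r'\1 ' writes that group back followed by a space.
print(re.sub('(..)', r'\1 ', s))  # 12 34 56 78 9

# Equivalent replacement in a normal string literal: the backslash must be escaped.
print(re.sub('(..)', '\\1 ', s))  # 12 34 56 78 9

# Without r or escaping, '\1' is an octal escape for the control character chr(1).
print('\1' == chr(1))             # True
```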


You can use a list comprehension too:

>>> s = '123456789'
>>> res = [s[index:index+2] for index, x in enumerate(s) if index % 2 == 0]
>>> res
['12', '34', '56', '78', '9']

Comments

But this gives a list, not a string, and there are no spaces.
@noob It gives the same result as findall; you just have to ' '.join it. However, compared to the other solutions, it's kind of clunky.
Instead of enumerate and that modulo check, I suggest you use ' '.join(s[i:i+2] for i in range(0, len(s), 2))
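This also answers the question's 2nd EDIT (doing it without regex). A minimal sketch wrapping the suggested range-based slicing in a function; the name pairs_slicing and the size parameter are hypothetical generalizations added for illustration.

```python
def pairs_slicing(s, size=2):
    """Join fixed-size chunks of s with spaces, without using regex."""
    # range(0, len(s), size) yields the start index of each chunk; slicing past
    # the end of a string is safe, so a short final chunk is kept as-is
    return ' '.join(s[i:i + size] for i in range(0, len(s), size))


print(pairs_slicing("123456789"))     # 12 34 56 78 9
print(pairs_slicing("12345678"))      # 12 34 56 78
print(pairs_slicing("123456789", 3))  # 123 456 789
```

Stepping the range by the chunk size visits only the chunk starts, which avoids the enumerate-and-modulo filter over every index.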
