adding spaces inside string using regex sub

Question

I have a string I want to split into 2-digit pieces. I tried using regex like so:

import re
s = "123456789"
t = re.sub('..', ".. ", s)
print(t)

I expected to get 12 34 56 78 9 but instead I got '.. .. .. .. 9'. The 9 does not bother me, because I know I will have an even number of digits, but how can I tell the re.sub to not replace the actual digit with a dot?

using python shell 3.5.1

EDIT

checked all 3 answers, and they all work, but the findall seems to be faster (and more elegant IMO ;p ):

import time
import re

s = "43256711233214432"

i = 10000
start = time.time()
while i:
    i -= 1
    re.sub('(..)', r"\1 ", s)    
end = time.time()

elapsed = end - start
print("using r\"\\1 \"    : ", elapsed)

i = 10000
start = time.time()
while i:
    re.sub('..', r"\g<0> ", s)
    i -= 1
end = time.time()

elapsed = end - start
print("using r\"\g<0> \" : ", elapsed)

i = 10000
start = time.time()
while i:
    ' '.join(re.findall(r'..|.', s))
    i -= 1
end = time.time()

elapsed = end - start
print("using findall   : ", elapsed)

using r"\1 " : 0.25461769104003906

using r"\g<0> " : 0.09374403953552246

using findall : 0.015610456466674805

2nd EDIT: is there a better way (or any way...) doing this without regex?

You timing analysis is weird. \1 and \g<0> do pretty much the same and should be equally fast. Using IPython's %timeit function gives me 8.5µs for both sub approaches, 2.0µs for findall and 1.5µs for range(0, len, 2) — tobias_k
– tobias_k, Commented Jul 28, 2016 at 8:47
@tobias_k don't know... I run this script about 10 times, every time "findall" was 1st, "g" 2nd and "\1" 3rd — CIsForCookies
– CIsForCookies, Commented Jul 28, 2016 at 8:52
And what happens if you test them in a different order, say first \g, then \1? findall is fastest (except range), no arguing about that. BTW, when I run your script it's more closely to %timeit, with \g and \1 coming out nearly identically. — tobias_k
– tobias_k, Commented Jul 28, 2016 at 8:56
I see minor changes, but not sure about them... I had it run 10 times in a row, and did a mean. g came faster by 0.2 both times — CIsForCookies
– CIsForCookies, Commented Jul 28, 2016 at 9:03

Avinash Raj · Accepted Answer · 2016-07-28 08:04:11Z

4

You may use re.findall also,

>>> s = "123456789"
>>> ' '.join(re.findall(r'..|.', s))
'12 34 56 78 9'
>>>

r'..|.' regex matches two chars or a single character (first preference goes to .. and then .)

answered Jul 28, 2016 at 8:04

Avinash Raj

175k32 gold badges247 silver badges289 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Wiktor Stribiżew Over a year ago

... but impractical - why achieve the result with 2 steps (re.findall + join), when you can do that using 1 step (re.sub). Moreover, a regex with an alternation performance is slower than without.

CIsForCookies Over a year ago

@WiktorStribiżew what do yo mean by 'a regex with an alternation performance is slower...'? By my tests, this method is faster by far

Wiktor Stribiżew Over a year ago

Faster than re.sub('..', r"\g<0> ", s)?

Wiktor Stribiżew · Accepted Answer · 2016-07-28 07:56:46Z

3

You can just refer to the whole match with \g<0> backreference in the replacement string pattern (where you cannot use regular expression patterns):

re.sub('..', r"\g<0> ", s)

Python demo:

import re
s = "12345678"
print(re.sub('..', r"\g<0> ", s))

See re.sub reference:

The backreference \g<0> substitutes in the entire substring matched by the RE.

answered Jul 28, 2016 at 7:56

Wiktor Stribiżew

631k41 gold badges502 silver badges632 bronze badges

Comments

John1024 · Accepted Answer · 2016-07-28 08:06:37Z

2

In a regex, . means any character. In replacement text, it means a period. If you want to capture characters as a group in your regex, you need to put parens around them. You can reference the first such group in the replacement text by using \1:

>>> re.sub('(..)', r"\1 ", s)
'12 34 56 78 9'

edited Jul 28, 2016 at 8:06

answered Jul 28, 2016 at 7:54

John1024

115k15 gold badges152 silver badges183 bronze badges

Comments

user5081775 · Accepted Answer · 2016-07-28 08:20:40Z

0

You can use List Comprehensions too,

>>> s='123456789'
>>> res=[s[index:index+2] for index,x in enumerate(s) if index % 2==0]
['12', '34', '56', '78', '9']

answered Jul 28, 2016 at 8:20

user5081775

111 bronze badge

3 Comments

CIsForCookies Over a year ago

but this gives list, not a string, and there are no spaces

tobias_k Over a year ago

@noob It gives the same result as findall, you just have to ' '.join. However, compared to the other solutions it's kind of clunky.

tobias_k Over a year ago

Instead of enumerate and that modulo check, I suggest you use ' '.join(s[i:i+2] for i in range(0, len(s), 2))

Collectives™ on Stack Overflow

adding spaces inside string using regex sub

4 Answers 4

3 Comments

Comments

Comments

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

3 Comments

Comments

Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related