Python replace all occurrences found using regex

Question

In Python, I'm trying to replace all occurrences of a string found using regex, such as:

'10am 11pm 13am 14pm 4am'

becomes

'10 am 11 pm 13 am 14 pm 4 am'

I tried

re.sub('([0-9].*)am(.*)', r'\1 am \2', ddata)

But this only replaces the last occurrence.
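A minimal sketch of why this happens, assuming the sample string above stands in for ddata (which the question never defines). The greedy .* inside the first group runs to the end of the string and backtracks only as far as the last "am", so the pattern matches the whole string exactly once:

```python
import re

ddata = '10am 11pm 13am 14pm 4am'  # assumed sample input, standing in for ddata

# The greedy ([0-9].*) swallows everything up to the LAST "am",
# so there is only a single match covering the whole string.
matches = re.findall('([0-9].*)am(.*)', ddata)
print(matches)  # [('10am 11pm 13am 14pm 4', '')]

# re.sub therefore performs only one substitution, at the last "am".
once = re.sub('([0-9].*)am(.*)', r'\1 am \2', ddata)
print(repr(once))

# Restricting the group to digits gives one match per time instead.
per_time = re.sub('([0-9]+)am', r'\1 am', ddata)
print(per_time)  # 10 am 11pm 13 am 14pm 4 am
```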

I also tried:

import re
regex = re.compile('([0-9].*)am+', re.S)
myfile =  '10am 11pm 13am 14pm 4am'
myfile2 = regex.sub(lambda m: m.group().replace(r'am',r" am ",1), myfile)
print(myfile2)

but this only replaces the first occurrence of 'am'.

The expected result is '10 am 11pm 13 am 14pm 4 am'.
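For illustration, a small sketch of what goes wrong in the second attempt: with re.S and a greedy .*, the compiled pattern again matches the entire string once, so the lambda is called a single time and replace(..., 1) only changes the first "am" inside that one match. The fixed version below keeps the lambda idea but assumes that "digits immediately followed by am" is an acceptable definition of a time:

```python
import re

myfile = '10am 11pm 13am 14pm 4am'

# The original pattern matches the entire string once, so the lambda runs once.
regex = re.compile('([0-9].*)am+', re.S)
calls = [m.group() for m in regex.finditer(myfile)]
print(calls)  # ['10am 11pm 13am 14pm 4am']

# Matching only "digits + am" makes the callback run once per time.
fixed = re.sub(r'\d+am', lambda m: m.group().replace('am', ' am'), myfile)
print(fixed)  # 10 am 11pm 13 am 14pm 4 am
```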

Match (\d{1,2})(?=[ap]m) and replace with '\1 ', or match (\d{1,2})([ap]m) and replace with '\1 \2'.
– ctwheels, Commented Apr 12, 2019 at 19:47
I think I wasn't clear about why I was using regex in this case. Imagine the sentence: "the amphitheater opens at 10am-11am and 3pm-7pm". We want to make sure NOT to replace the 'am' in amphitheater.
– jvence, Commented Apr 13, 2019 at 5:50
The real question is: do you really want to change that sentence? Given the conditions you set, you CAN use this, but it's going to be ugly: >>> re.sub(r'(?<=\d)([ap]m)', r' \1', 'the amphitheater opens at 10am-11am and 3pm-7pm') #OUTPUT: 'the amphitheater opens at 10 am-11 am and 3 pm-7 pm'
– FailSafe, Commented Apr 13, 2019 at 6:06
@FailSafe I came to the same conclusion. A positive lookbehind works, but the sentence looks ugly. Does the OP want something like 10 am - 11 am and 3 pm - 7 pm? Now that is another question altogether from the original post. :)
– SanV, Commented Apr 13, 2019 at 6:17
@FailSafe This sentence transformation is NOT meant for human consumption, so yes, I really do want to do this.
– jvence, Commented Apr 13, 2019 at 6:19
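The patterns suggested in these comments can be checked side by side. This is a minimal sketch; the variable names are illustrative only, and the two input strings are the ones from the question and from jvence's comment:

```python
import re

times = '10am 11pm 13am 14pm 4am'
sentence = 'the amphitheater opens at 10am-11am and 3pm-7pm'

# ctwheels, first variant: a lookahead inserts a space after 1-2 digits
# that are followed by am/pm, without consuming the am/pm itself.
lookahead = re.sub(r'(\d{1,2})(?=[ap]m)', r'\1 ', times)

# ctwheels, second variant: two capture groups re-joined with a space.
groups = re.sub(r'(\d{1,2})([ap]m)', r'\1 \2', times)

# FailSafe: a lookbehind only touches am/pm that directly follows a digit,
# so the "am" in "amphitheater" is left alone.
lookbehind = re.sub(r'(?<=\d)([ap]m)', r' \1', sentence)

print(lookahead)   # 10 am 11 pm 13 am 14 pm 4 am
print(groups)      # 10 am 11 pm 13 am 14 pm 4 am
print(lookbehind)  # the amphitheater opens at 10 am-11 am and 3 pm-7 pm
```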

benvc · Accepted Answer · 2019-04-12 19:57:05Z

1

Use capture groups for both the digits and the "am" or "pm" string and then just substitute with a space between the groups.

import re

s = '10am 11pm 13am 14pm 4am'

subbed = re.sub(r'(\d+)([ap]m)', r'\1 \2', s)
print(subbed)
# 10 am 11 pm 13 am 14 pm 4 am
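re.sub replaces every non-overlapping match by default (its count argument defaults to 0, meaning "no limit"). One way to confirm that all occurrences were handled, sketched here with re.subn, which returns the new string together with the number of substitutions made:

```python
import re

s = '10am 11pm 13am 14pm 4am'

# re.subn works like re.sub but also reports how many substitutions it made,
# which confirms that every occurrence (not just the first or last) was replaced.
subbed, count = re.subn(r'(\d+)([ap]m)', r'\1 \2', s)
print(subbed)  # 10 am 11 pm 13 am 14 pm 4 am
print(count)   # 5
```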

answered Apr 12, 2019 at 19:57

benvc


accdias · Accepted Answer · 2019-04-13 13:58:31Z

0

This will do the work:

import re
myfile =  '10am 11pm 13am 14pm 4am'
print(re.sub(r'(\d+)(am|pm)', r'\1 \2', myfile))

Here is the test output:

>>> import re
>>> myfile =  '10am 11pm 13am 14pm 4am'
>>> re.sub(r'(\d+)(am|pm)', r'\1 \2', myfile)
'10 am 11 pm 13 am 14 pm 4 am'
>>>

EDIT: Here is the output of the same solution dealing with the string you posted in the comments:

>>> import re
>>> myfile = 'The amphitheater opens at 10am-11am and 3pm-7pm'
>>> re.sub(r'(\d+)(am|pm)', r'\1 \2', myfile)
'The amphitheater opens at 10 am-11 am and 3 pm-7 pm'
>>>

edited Apr 13, 2019 at 13:58

answered Apr 12, 2019 at 20:06

accdias


2 Comments

jvence Over a year ago

I think I wasn't clear about why I was using regex in this case. Imagine the sentence: "the amphitheater opens at 10am-11am and 3pm-7pm". We want to make sure NOT to replace the 'am' in amphitheater.

accdias Over a year ago

@jvence, did you check my answer? It addresses that without any problem, since I'm only matching numbers immediately followed by am or pm, with no space in between.

binish · Accepted Answer · 2019-04-12 19:57:38Z

0

If you really want a regex solution instead of the plain str.replace approach suggested in another answer, you could use the snippet below:

import re
myfile = '10am 11pm 13am 14pm 4am'
# Each match object is passed to the lambda; m.groups() is (digits, 'am').
myfile2 = re.sub(r'(\d+)(am)', lambda m: '{} {}'.format(*m.groups()), myfile)
print(myfile2)

answered Apr 12, 2019 at 19:57

binish


3 Comments

accdias Over a year ago

Why introduce lambda and str.format when you are already using re.sub?

binish Over a year ago

@accdias That is needed since you need to know both the digits and the am/pm part. This solution is flexible enough to handle both am and pm. My initial snippet had the second part of the regex as (am|pm), which I later edited to include only am, since that's what the OP asked for. Hope that answers your question.

binish Over a year ago

I see what you are referring to: instead of a lambda, you can directly use backreferences like r'\1 \2'.
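To make the lambda-versus-backreference point concrete, here is a small sketch using the (am|pm) version of the pattern mentioned above. A callable replacement is useful when the replacement text has to be computed from the match; for a fixed layout like this one, both forms should give the same result:

```python
import re

myfile = '10am 11pm 13am 14pm 4am'
pattern = r'(\d+)(am|pm)'

# Callable replacement: re.sub passes each match object to the function.
with_lambda = re.sub(pattern, lambda m: '{} {}'.format(m.group(1), m.group(2)), myfile)

# Template replacement: \1 and \2 refer to the same two capture groups.
with_backrefs = re.sub(pattern, r'\1 \2', myfile)

print(with_lambda == with_backrefs)  # True
print(with_backrefs)  # 10 am 11 pm 13 am 14 pm 4 am
```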

SanV · Accepted Answer · 2019-04-12 20:00:34Z

0

You could do this without using re:

'10am 11pm 13am 14pm 4am'.replace('a',' a').replace('p',' p')  

## Output: '10 am 11 pm 13 am 14 pm 4 am'

answered Apr 12, 2019 at 20:00

SanV


4 Comments

FailSafe Over a year ago

Thank you for not using a solution that's more complicated than needed. I hate to say it, but I wonder if this question will get downvoted. Anyway, in case the OP wants regex, I'll post this under your answer, since a full answer isn't needed here: >>> re.sub(r'(a|p)', r' \1', '10am 11pm 13am 14pm 4am') #OUTPUT: '10 am 11 pm 13 am 14 pm 4 am'

SanV Over a year ago

@FailSafe Thanks, and your regex pattern is the most concise and apt one on this page. Hopefully the OP takes notice of it.

jvence Over a year ago

@FailSafe See comment added above about the sentence: "the amphitheater opens at 10am-11am and 3pm-7pm"

accdias Over a year ago

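This will have side effects on strings that don't follow that pattern: every 'a' and 'p' gets a space in front of it, including the ones in ordinary words like 'amphitheater'.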
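To see those side effects concretely, here is a small sketch that runs the chained str.replace approach on the sentence jvence posted, next to a regex anchored on the preceding digits (the pattern from the other answers):

```python
import re

sentence = 'the amphitheater opens at 10am-11am and 3pm-7pm'

# Plain str.replace touches every "a" and "p", not just those after a digit,
# so words like "amphitheater" and "opens" get split up.
plain = sentence.replace('a', ' a').replace('p', ' p')
print(plain)

# Anchoring the match on the preceding digits leaves ordinary words alone.
anchored = re.sub(r'(\d+)([ap]m)', r'\1 \2', sentence)
print(anchored)  # the amphitheater opens at 10 am-11 am and 3 pm-7 pm
```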
