regex to only replace certain patterns in Python

Question

In the following example, using Python 3.8, I am looking to replace only the comma in NUMBER(38,0) with a pipe.

MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38,0), MISC3 VARCHAR

The expected outcome would be

MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38|0), MISC3 VARCHAR

The NUMBER(38,0) could appear anywhere in the list, so I cannot, for example, target the 3rd comma, and NUMBER(38,0) could appear several times. In addition, this needs to work even if the numeric values inside the parentheses change, such as NUMBER(24,2).

I have not been able to come up with a working solution that does not also replace all the other commas, so I am reaching out to the experts in the hive to see if someone more knowledgeable than me can figure this out.

The RE string I have been using is

([A-Z])\w+\([0-9]+,[0-9]+\)
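For reference, here is a small sketch of what the pattern above actually matches. It does find each NUMBER(x,y) token. However, its only capture group holds a single letter, so nothing isolates the comma. A plain re.sub with this pattern would therefore replace the whole token rather than just the comma. The sample string is made up for illustration.

```python
import re

pattern = r'([A-Z])\w+\([0-9]+,[0-9]+\)'
s = 'MISC1 VARCHAR, NUMBERS NUMBER(38,0), AMOUNT NUMBER(24,2)'

# findall returns the captured group, which here is only the first letter
letters = re.findall(pattern, s)
# finditer + group(0) returns the whole matched token instead
tokens = [m.group(0) for m in re.finditer(pattern, s)]

print(letters)  # ['N', 'N']
print(tokens)   # ['NUMBER(38,0)', 'NUMBER(24,2)']
```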

Thank you for taking a look.

If you need to change wherever it says NUMBER(xx,x), it seems to me you need to search for NUMBER(, then find the next ) after that, and replace the comma between those indexes. In the long run, that's probably an easier-to-maintain solution than a regex here.
– Rashid 'Lee' Ibrahim, Commented Apr 23, 2021 at 15:42
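A minimal sketch of the non-regex approach described in that comment, using only str.find and slicing. The function name replace_number_commas is hypothetical. The code assumes the literal prefix "NUMBER(" and does not check word boundaries, so something like "XNUMBER(" would also be treated as a match.

```python
def replace_number_commas(text: str, prefix: str = "NUMBER(") -> str:
    """Hypothetical helper: replace ',' with '|' between each prefix and the next ')'."""
    parts = []
    pos = 0
    while True:
        start = text.find(prefix, pos)
        if start == -1:
            break  # no more NUMBER( tokens
        end = text.find(")", start + len(prefix))
        if end == -1:
            break  # unclosed token: leave the rest of the string untouched
        parts.append(text[pos:start])                          # text before the token, unchanged
        parts.append(text[start:end + 1].replace(",", "|"))    # commas inside the token only
        pos = end + 1
    parts.append(text[pos:])
    return "".join(parts)


sample = 'MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38,0), MISC3 VARCHAR'
print(replace_number_commas(sample))
# MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38|0), MISC3 VARCHAR
```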

chitown88 · Accepted Answer · 2021-04-23 15:42:32Z

2

import re

sampleStr = 'MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38,0), MISC3 VARCHAR NUMBER(24,1)'

# group 1 = "(" plus digits, group 2 = the comma, group 3 = digits plus ")"
sub_sampleStr = re.sub(r'(\(\d+)(,)(\d+\))', r'\1|\3', sampleStr)
print(sub_sampleStr)

Output:

MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38|0), MISC3 VARCHAR NUMBER(24|1)
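A variation without capture groups uses lookarounds, so that only the comma itself is matched. This is a swapped-in technique, not the answerer's code. Python's re module only allows fixed-width look-behind, so (?<=\(\d+) is rejected. The sketch below checks only the single digit before the comma. It also does not require an opening parenthesis or the word NUMBER, so treat it as a looser match than the grouped version.

```python
import re

sample = 'MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38,0), MISC3 VARCHAR NUMBER(24,1)'

# A comma preceded by one digit and followed by digits and ')'
pattern = re.compile(r'(?<=\d),(?=\d+\))')
result = pattern.sub('|', sample)
print(result)

# A variable-width look-behind such as (?<=\(\d+) is refused by the re module
try:
    re.compile(r'(?<=\(\d+),')
    variable_lookbehind_ok = True
except re.error as exc:
    variable_lookbehind_ok = False
    print('rejected:', exc)
```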

answered Apr 23, 2021 at 15:42

chitown88



3 Comments

user2297683 Over a year ago

This works and I love the simplicity. Can you please explain the r'\1|\3' part of the code to me so I can learn to be better?

chitown88 Over a year ago

Ya so that’s the grouping. Similar to the other solution, I have group 2 as the “,” part.

chitown88 Over a year ago

Group 1 is (38, group 2 is the comma, and group 3 is 0). So it's basically saying: keep group 1 and group 3 and put the pipe between them (group 2 is left out of the substitution).
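To see the groups from that explanation directly, here is a small sketch using re.search and Match.expand. The short input string is made up for illustration. It also shows why the replacement is written as a raw string: without the r prefix, '\1' is an octal escape for the control character chr(1), not a backreference.

```python
import re

m = re.search(r'(\(\d+)(,)(\d+\))', 'NUMBERS NUMBER(38,0)')
print(m.group(1))          # (38
print(m.group(2))          # ,
print(m.group(3))          # 0)

# expand() applies the same template re.sub would use for this match
print(m.expand(r'\1|\3'))  # (38|0)

# Raw vs. normal string literal
print('\1' == chr(1))      # True: a control character, not a backreference
print(r'\1' == '\\1')      # True: backslash followed by the digit 1
```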

Green Cloak Guy · Accepted Answer · 2021-04-23 15:42:59Z

1

This should work, for the specific example you've shown:

import re

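yourstr = 'MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38,0), MISC3 VARCHAR'
# group 1 = type name, "(" and first number; group 2 = second number and ")"
newstr = re.sub(r'([A-Z]+\([0-9]+),([0-9]+\))', r'\1|\2', yourstr)
print(newstr)  # MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38|0), MISC3 VARCHAR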

It's similar to the expression you showed in your question. Essentially, we split the entire match NUMBER(38,0) into two capture groups, NUMBER(38 and 0), around the comma. The replacement \1|\2 then puts a | where the comma was.
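Note that [A-Z]+ matches any run of capital letters before the parenthesis. Another type such as DECIMAL(10,2) would therefore be changed too. If only NUMBER should be affected, one possible variation pins the name and adds a word boundary. This is a sketch, and the DECIMAL and QTY columns are hypothetical:

```python
import re

text = 'MISC1 VARCHAR, NUMBERS NUMBER(38,0), PRICE DECIMAL(10,2), QTY NUMBER(24,2)'

# Same two-group idea, but the type name is fixed to NUMBER;
# \b keeps it from matching inside a longer word such as XNUMBER(
number_only = re.sub(r'(\bNUMBER\(\d+),(\d+\))', r'\1|\2', text)

# The answer's pattern: any run of capitals followed by (digits,digits)
any_type = re.sub(r'([A-Z]+\([0-9]+),([0-9]+\))', r'\1|\2', text)

print(number_only)  # ... NUMBER(38|0), PRICE DECIMAL(10,2), QTY NUMBER(24|2)
print(any_type)     # ... NUMBER(38|0), PRICE DECIMAL(10|2), QTY NUMBER(24|2)
```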

If you wanted to replace an arbitrary number of commas within the parentheses, you'd probably want to pass a function (here a lambda) as the replacement instead:

newstr = re.sub(r'[A-Z]+\((?:[0-9]+,)*[0-9]+\)', 
                lambda m: m.group(0).replace(',', '|'), 
                yourstr)

which just looks for an entire token like NUMBER(38,5,6,0) and replaces , with | only inside that token.
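A short usage run of the lambda version on a made-up string shows its behavior. The token with several commas is fully converted. VARCHAR is untouched because no parenthesis follows it. NUMBER(10) still matches, because the (?:[0-9]+,)* part can repeat zero times, but it has no comma to change.

```python
import re

yourstr = 'A NUMBER(38,5,6,0), B VARCHAR, C NUMBER(24,2), D NUMBER(10)'

# re.sub calls the lambda once per match; m.group(0) is the whole token
newstr = re.sub(r'[A-Z]+\((?:[0-9]+,)*[0-9]+\)',
                lambda m: m.group(0).replace(',', '|'),
                yourstr)
print(newstr)  # A NUMBER(38|5|6|0), B VARCHAR, C NUMBER(24|2), D NUMBER(10)
```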

answered Apr 23, 2021 at 15:42

Green Cloak Guy


1 Comment

user2297683 Over a year ago

PERFECT! Thanks!
