In the following example, using Python 3.8, I am looking to replace only the comma in NUMBER(38,0) with a pipe.

MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38,0), MISC3 VARCHAR

The expected outcome would be

MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38|0), MISC3 VARCHAR

NUMBER(38,0) could appear anywhere in the list, so I cannot target the 3rd comma, for example, and it could appear several times. I also need this to work when the numeric values inside the parentheses change, such as NUMBER(24,2).

I have not been able to come up with a working solution that does not replace all the other commas as well, so I am reaching out to the experts here to see if someone more knowledgeable than me can figure this out.

The RE string I have been using is

([A-Z])\w+\([0-9]+,[0-9]+\)

Thank you for taking a look.

  • If you need to change it wherever it says NUMBER(xx,x), it seems to me you need to search for NUMBER(, then find the next ) after that, and replace the comma between those two indexes. In the long run, that's probably an easier-to-maintain solution than a regex here. Commented Apr 23, 2021 at 15:42
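The index-based approach described in the comment above could be sketched roughly like this. The function name pipe_number_commas and the keyword parameter are made up for illustration, and the sketch assumes the token is always written exactly as NUMBER( with no space before the parenthesis and no nested parentheses.

```python
def pipe_number_commas(text, keyword="NUMBER("):
    """Hypothetical helper: replace commas with pipes only between
    each 'NUMBER(' and the next ')' that follows it."""
    parts = []
    pos = 0
    while True:
        start = text.find(keyword, pos)
        if start == -1:
            break  # no more NUMBER( tokens
        end = text.find(")", start)
        if end == -1:
            break  # unbalanced parenthesis: leave the rest untouched
        parts.append(text[pos:start])  # text before the token, unchanged
        parts.append(text[start:end + 1].replace(",", "|"))  # the token itself
        pos = end + 1
    parts.append(text[pos:])  # whatever is left after the last token
    return "".join(parts)


sample = "MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38,0), MISC3 VARCHAR"
result = pipe_number_commas(sample)
print(result)
# MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38|0), MISC3 VARCHAR
```

Note that the column name NUMBERS is not touched, because the search string includes the opening parenthesis.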

2 Answers

import re

sampleStr = 'MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38,0), MISC3 VARCHAR NUMBER(24,1)'

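# Group 1 is "(" plus digits, group 2 is the comma, group 3 is digits plus ")".
# The replacement keeps groups 1 and 3 and puts a pipe between them.
sub_sampleStr = re.sub(r'(\(\d+)(,)(\d+\))', r'\1|\3', sampleStr)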
print(sub_sampleStr)

Output:

MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38|0), MISC3 VARCHAR NUMBER(24|1)

3 Comments

This works and I love the simplicity. Can you please explain the r'\1|\3' part of the code to me so I can learn to be better?
Yeah, so that’s the grouping. Similar to the other solution, except I have the “,” as its own group 2.
Group 1 is "(38", group 2 is ",", and group 3 is "0)". So it’s basically saying: keep group 1 and group 3 and put the pipe between them (I don’t put group 2 in the substitution).
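To make the grouping explained above visible, here is a small sketch that prints what each group captures before doing the substitution. The variable names (pattern, match, rebuilt, replaced) and the shortened sample string are only for illustration.

```python
import re

# Same pattern as in the answer: three capture groups.
pattern = re.compile(r'(\(\d+)(,)(\d+\))')
sample = 'NUMBERS NUMBER(38,0)'

match = pattern.search(sample)
groups = match.groups()
print(groups)   # ('(38', ',', '0)')

# r'\1|\3' means: group 1, a literal pipe, group 3 (group 2 is dropped).
rebuilt = match.group(1) + '|' + match.group(3)
print(rebuilt)  # (38|0)

replaced = pattern.sub(r'\1|\3', sample)
print(replaced)  # NUMBERS NUMBER(38|0)
```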

This should work, for the specific example you've shown:

import re

yourstr = 'MISC1 VARCHAR, MISC2 VARCHAR, NUMBERS NUMBER(38,0), MISC3 VARCHAR'
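# Group 1 is "NUMBER(38", group 2 is "0)"; the comma between them is replaced by a pipe.
newstr = re.sub(r'([A-Z]+\([0-9]+),([0-9]+\))', r'\1|\2', yourstr)
print(newstr)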

It's similar to the expression you showed in your question. Essentially, we split the entire match NUMBER(38,0) into two capture groups, NUMBER(38 and 0), separated by a comma, which we replace with a |.


If you wanted to replace an arbitrary number of commas within parentheses, then you'd probably want to pass a function (here a lambda) as the replacement instead:

newstr = re.sub(r'[A-Z]+\((?:[0-9]+,)*[0-9]+\)', 
                lambda m: m.group(0).replace(',', '|'), 
                yourstr)

which just looks for an entire token like NUMBER(38,5,6,0) and replaces , with | only inside that token.
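As a quick check of the lambda version above, here is a self-contained sketch run against a token with several commas. The wrapper function pipe_commas_in_tokens and the sample string are made up for illustration; the pattern is the one from the answer.

```python
import re

def pipe_commas_in_tokens(text):
    """Hypothetical wrapper: replace commas with pipes only inside
    tokens that look like UPPERCASE(digits,digits,...)."""
    return re.sub(
        r'[A-Z]+\((?:[0-9]+,)*[0-9]+\)',
        lambda m: m.group(0).replace(',', '|'),  # m.group(0) is the whole token
        text,
    )

sample = 'MISC1 VARCHAR, NUMBERS NUMBER(38,5,6,0), MISC3 VARCHAR'
result = pipe_commas_in_tokens(sample)
print(result)
# MISC1 VARCHAR, NUMBERS NUMBER(38|5|6|0), MISC3 VARCHAR
```

The commas between columns are left alone because they are never part of a matched token.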

1 Comment

PERFECT! Thanks!
