Adding missing commas to strings matching a given pattern

Question

I am trying to add a comma and a space to my strings. Here are some examples:

{ELEPHANT:1, FENNEC_FOX:1NAKED_MOLE:1URCHIN:2}

{DUNG_BEETLE:12URCHIN:1}

{DUNG_BEETLE:1FENNEC_FOX:1URCHIN:2}

Notice that the ", " separator is missing in some places but not others. I would like the outcome to be

{ELEPHANT:1, FENNEC_FOX:1, NAKED_MOLE:1, URCHIN:2}

{DUNG_BEETLE:12, URCHIN:1}

{DUNG_BEETLE:1, FENNEC_FOX:1, URCHIN:2}

I think I need to use REGEXP_REPLACE(my_string, ':[0-9]+[a-z_A-Z]', replacement), but I'm not sure how to write the replacement so that it keeps the colon and the number, adds a comma and a space, and then keeps the matched letter.

Erwin Brandstetter · Accepted Answer · 2024-04-16 00:10:31Z

This expression gets your desired result (fastest in a quick test with 100k rows):

regexp_replace(my_string, '(:\d+)(?=[a-zA-Z])', '\1, ', 'g')

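The key features are the positive lookahead (?=[a-zA-Z]), which checks that a letter follows without consuming it, and the 4th parameter 'g', which replaces every match instead of only the first.
A positive lookbehind also works but is slower here, as Nick points out in the comments: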

regexp_replace(my_string, '(?<=:\d+)([a-zA-Z])', ', \1', 'g')
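
To try the lookahead pattern outside the database, here is a minimal sketch in Python. It assumes Python's re module as a stand-in for the SQL function, and add_commas is a hypothetical helper name. re.sub replaces every match by default, which corresponds to the 'g' parameter. Python's re rejects variable-length lookbehinds, so only the lookahead form is shown.

```python
import re

# Same pattern as the lookahead expression: capture ":<digits>" only when
# a letter follows, without consuming that letter.
PATTERN = re.compile(r"(:\d+)(?=[a-zA-Z])")

def add_commas(s: str) -> str:
    """Insert ', ' after each ':<digits>' that is directly followed by a letter."""
    # re.sub replaces all matches, like the 'g' flag in regexp_replace.
    return PATTERN.sub(r"\1, ", s)

samples = [
    "{ELEPHANT:1, FENNEC_FOX:1NAKED_MOLE:1URCHIN:2}",
    "{DUNG_BEETLE:12URCHIN:1}",
    "{DUNG_BEETLE:1FENNEC_FOX:1URCHIN:2}",
]
for s in samples:
    print(add_commas(s))
```

Because the lookahead does not consume the letter, the replacement only needs \1 followed by the separator.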

fiddle

8 Comments

Nick Over a year ago

Using a (variable length) lookbehind is far less efficient than just matching the : and digits (and including them in the replacement) as it requires checking every letter in the string to be matched to see if it is preceded by that pattern.

Nick Over a year ago

See for example regex101.com/r/0HQMke/1 which takes almost 3 times as many steps as regex101.com/r/LQJVON/1

Erwin Brandstetter Over a year ago

@Nick: Thanks for pointing that out. A quick test confirmed that your variant is about twice as fast. I added a version with a positive lookahead that's even faster. BTW, [a-zA-Z] is faster than \w (and correct for the task) because the latter includes digits (and more).

Nick Over a year ago

\w is fine here because OP's regex includes _, and \d+ being greedy will consume all digits before attempting to match \w.

Erwin Brandstetter Over a year ago

@Nick: I am not saying it's wrong. It's a bit slower. My note " (and correct for the task)" is meant to point out that [a-zA-Z] is not wrong.


Nick · Accepted Answer · 2024-04-16 01:34:22Z

Your regex is fine, although you can simplify it by using \d in place of [0-9] and \w in place of [a-z_A-Z]. You then need to use capturing groups to save the matched text and insert it into the replacement string:

SELECT REGEXP_REPLACE(my_string, '(:\d+)(\w)', '\1, \2', 'g')
FROM my_table

Output:

{ELEPHANT:1, FENNEC_FOX:1, NAKED_MOLE:1, URCHIN:2}
{DUNG_BEETLE:12, URCHIN:1}
{DUNG_BEETLE:1, FENNEC_FOX:1, URCHIN:2}

If you're trying to convert this into valid JSON, you'll need a second step to enclose the keys in double quotes:

SELECT REGEXP_REPLACE(REGEXP_REPLACE(my_string, '(:\d+)(\w)', '\1, \2', 'g'), '(\w+):', '"\1":', 'g')
FROM my_table

Output:

{"ELEPHANT":1, "FENNEC_FOX":1, "NAKED_MOLE":1, "URCHIN":2}
{"DUNG_BEETLE":12, "URCHIN":1}
{"DUNG_BEETLE":1, "FENNEC_FOX":1, "URCHIN":2}

Demo on dbfiddle.uk

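Note that \w includes digits as well as letters and _. With the sample data this is harmless, because the greedy \d+ consumes all the digits before \w is tried and a letter always follows. If a multi-digit number can be followed by a non-word character, though (for example :12, or :12}), the engine backtracks so that \d+ gives up its last digit to \w, and the result becomes :1, 2. If your data can contain such values, use [a-zA-Z_] instead of \w.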
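
The interaction between the greedy \d+ and \w can be checked on a small scale. The sketch below assumes Python's re module as a stand-in for the SQL function. The second input, with a multi-digit number that is already followed by a comma or a brace, is a made-up test case and not part of the question's data. The same backtracking would be expected from other backtracking regex engines, but that is worth verifying in the database itself.

```python
import re

with_w = re.compile(r"(:\d+)(\w)")                # second group as in the query above
with_letters = re.compile(r"(:\d+)([a-zA-Z_])")   # second group limited to letters and _

def fix(pattern: re.Pattern, s: str) -> str:
    """Insert ', ' between the two captured groups for every match."""
    return pattern.sub(r"\1, \2", s)

from_question = "{DUNG_BEETLE:12URCHIN:1}"
already_separated = "{DUNG_BEETLE:12, URCHIN:10}"  # hypothetical input

# On the question's data both patterns agree.
print(fix(with_w, from_question))
print(fix(with_letters, from_question))

# When no letter follows a multi-digit number, \d+ backtracks by one digit
# and \w matches that digit, so the number gets split.
print(fix(with_w, already_separated))
print(fix(with_letters, already_separated))
```

On the question's three sample strings the two patterns behave identically. The difference only shows up on inputs like the second one.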
