
I am trying to add a comma and a space to my strings. Here are some examples:

{ELEPHANT:1, FENNEC_FOX:1NAKED_MOLE:1URCHIN:2}

{DUNG_BEETLE:12URCHIN:1}

{DUNG_BEETLE:1FENNEC_FOX:1URCHIN:2}

Notice that the ", " separator is missing inconsistently. I would like the outcome to be:

{ELEPHANT:1, FENNEC_FOX:1, NAKED_MOLE:1, URCHIN:2}

{DUNG_BEETLE:12, URCHIN:1}

{DUNG_BEETLE:1, FENNEC_FOX:1, URCHIN:2}

I think I need to use REGEXP_REPLACE(my_string, ':[0-9]+[a-z_A-Z]', replacement), but I'm not sure how to write replacement so that it produces the colon and the matched number, then a comma and a space, then the matched letter.


2 Answers

2

This expression gets your desired result (fastest in a quick test with 100k rows):

regexp_replace(my_string, '(:\d+)(?=[a-zA-Z])', '\1, ', 'g')

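The key parts are the positive lookahead (?=[a-zA-Z]), which requires a letter right after the digits without consuming it, and the 4th parameter 'g', which replaces every match instead of only the first.
A positive lookbehind also works, but it is slower for this, as Nick points out in the comments: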

regexp_replace(my_string, '(?<=:\d+)([a-zA-Z])', ', \1', 'g')
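
To see what the lookahead and the 'g' flag each contribute, here is a minimal sketch that can be run without a table. It assumes PostgreSQL (which matches the 4-argument regexp_replace used above) with the default standard_conforming_strings = on; the inline VALUES row and the column aliases are made up for illustration.

```sql
-- Hypothetical inline sample row; swap the VALUES list for your own table.
SELECT s AS input,
       -- No flags: only the first match is replaced.
       regexp_replace(s, '(:\d+)(?=[a-zA-Z])', '\1, ')      AS first_match_only,
       -- 'g': every match is replaced.
       regexp_replace(s, '(:\d+)(?=[a-zA-Z])', '\1, ', 'g') AS all_matches
FROM  (VALUES ('{DUNG_BEETLE:1FENNEC_FOX:1URCHIN:2}')) AS t(s);
-- first_match_only: {DUNG_BEETLE:1, FENNEC_FOX:1URCHIN:2}
-- all_matches:      {DUNG_BEETLE:1, FENNEC_FOX:1, URCHIN:2}
```

Because the lookahead only matches when a letter follows the digits, a number that is already followed by ", " or by the closing } is left untouched.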


8 Comments

Using a (variable length) lookbehind is far less efficient than just matching the : and digits (and including them in the replacement) as it requires checking every letter in the string to be matched to see if it is preceded by that pattern.
See for example regex101.com/r/0HQMke/1 which takes almost 3 times as many steps as regex101.com/r/LQJVON/1
@Nick: Thanks for pointing out. A quick test confirmed that your variant is ~ twice as fast. I added a version with a positive lookahead, that's even faster than that. BTW, [a-zA-Z] is faster than \w (and correct for the task) because the latter includes digits (and more).
\w is fine here because OP's regex includes _, and \d+ being greedy will consume all digits before attempting to match \w.
@Nick: I am not saying it's wrong. It's a bit slower. My note " (and correct for the task)" is meant to point out that [a-zA-Z] is not wrong.
2

Your regex is fine, although you can simplify it by using \d in place of [0-9] and \w in place of [a-z_A-Z]. You then need to use capturing groups to save the matched text and insert it into the replacement string:

SELECT REGEXP_REPLACE(my_string, '(:\d+)(\w)', '\1, \2', 'g')
FROM my_table

Output:

{ELEPHANT:1, FENNEC_FOX:1, NAKED_MOLE:1, URCHIN:2}
{DUNG_BEETLE:12, URCHIN:1}
{DUNG_BEETLE:1, FENNEC_FOX:1, URCHIN:2}

If you're trying to convert this into valid JSON, you'll need a second step to enclose the keys in double quotes:

SELECT REGEXP_REPLACE(REGEXP_REPLACE(my_string, '(:\d+)(\w)', '\1, \2', 'g'), '(\w+):', '"\1":', 'g')
FROM my_table

Output:

{"ELEPHANT":1, "FENNEC_FOX":1, "NAKED_MOLE":1, "URCHIN":2}
{"DUNG_BEETLE":12, "URCHIN":1}
{"DUNG_BEETLE":1, "FENNEC_FOX":1, "URCHIN":2}
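
One way to check that the two-step result really is valid JSON is to cast it, which raises an error on malformed input. This is a sketch only, assuming PostgreSQL with the jsonb type available; the inline VALUES row and the aliases (fixed, as_jsonb, urchin_count) are invented for the example.

```sql
-- Hypothetical inline sample row; swap the VALUES list for your own table.
SELECT fixed::jsonb              AS as_jsonb,      -- fails if the text is not valid JSON
       fixed::jsonb ->> 'URCHIN' AS urchin_count   -- value for one key, returned as text
FROM  (VALUES ('{DUNG_BEETLE:12URCHIN:1}')) AS t(s)
CROSS JOIN LATERAL (
    -- Step 1 inserts the missing ", "; step 2 wraps each key in double quotes.
    SELECT regexp_replace(
               regexp_replace(s, '(:\d+)(\w)', '\1, \2', 'g'),
               '(\w+):', '"\1":', 'g') AS fixed
) AS f;
```

Keeping the cleaned text in a LATERAL subquery avoids repeating the nested regexp_replace call for every derived column.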

Demo on dbfiddle.uk

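Note that \w matches digits as well as letters and _. That is harmless when a letter directly follows the number, as in the samples above, because the greedy \d+ consumes all the digits first. It does matter when a multi-digit number is followed by a non-word character such as , or }: the regex can then still match by giving the last digit to \w, turning :12} into :1, 2}. Using [a-zA-Z_] in place of \w avoids that.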
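
How \w and an explicit letter class behave around multi-digit counts can be compared side by side. The sketch below assumes PostgreSQL; both VALUES rows are made up, and the second one deliberately ends in a two-digit count, a case the question's samples do not cover.

```sql
-- Hypothetical inline sample rows; the second ends in a multi-digit count.
SELECT s AS input,
       regexp_replace(s, '(:\d+)(\w)',        '\1, \2', 'g') AS with_w,
       regexp_replace(s, '(:\d+)([a-zA-Z_])', '\1, \2', 'g') AS with_letter_class
FROM  (VALUES ('{DUNG_BEETLE:12URCHIN:1}'),
              ('{DUNG_BEETLE:1URCHIN:12}')) AS t(s);
-- Row 1: both columns give {DUNG_BEETLE:12, URCHIN:1}
-- Row 2: with_w            gives {DUNG_BEETLE:1, URCHIN:1, 2}
--        with_letter_class gives {DUNG_BEETLE:1, URCHIN:12}
```

In row 2, :12 is followed by }, so the only way for (:\d+)(\w) to match is to treat 1 as the number and 2 as the word character. The letter class cannot match a digit, so the trailing count is left alone.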
