0

I have a redshift database that I need to query, and need to group similar strings together. I'm using regexp_replace() to do this, but can't work out how to get it grouping strings that have integers in the middle. For example:

dataset:

string
'aaa/123/bbb'
'aaa/456/bbb'
'ccc/123/ddd'

I need to group this so that we'd get

string     count(*)
aaa/id/bbb 2
ccc/id/ddd 1

So I have tried using

regexp_replace(endpoint, '/[0-9]+$/', '/id/')

But it doesn't work, I'm assuming because there is no wildcard or something? But I can't work out how to fix this.

Thanks in advance

1 Answer 1

1

I know that you also want to replace the numbers at the end. This comes close to what you want:

select regexp_replace(endpoint, '/[0-9]+(/|$)', '/id/')
from (select 'aaa/123/bbb' as endpoint union all
      select 'aaa/123' as endpoint 
      ) x

But it returns a slash at the end in the second case.

If you have no other intermediate values that start with numbers, then this does what you want:

select regexp_replace(endpoint, '/[0-9]+', '/id')
from (select 'aaa/123/bbb' as endpoint union all
      select 'aaa/123' as endpoint 
      ) x

Otherwise, two calls to regexp_replace() does the trick:

select regexp_replace(regexp_replace(endpoint, '/[0-9]+/', '/id/'), '/[0-9]$', '/id')
from (select 'aaa/123/bbb' as endpoint union all
      select 'aaa/123' as endpoint 
      ) x;
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.