0

I have following column in a postgres table.

col1                                start    end
p.[A138T;S160G;D221_E222delinsGK]   138      138
p.[A138T;S160G;D221_E222delinsGK]   160      160
p.[A138T;S160G;D221_E222delinsGK]   221      222

I am looking for a way to split the ; separated values in multiple rows. The expected output is:

col1                                start    end
p.A138T                             138      138
p.S160G                             160      160
p.D221_E222delinsGK                 221      222

I am using following query but it does not works for the third row.

select
       case 
            when start = "end" and col1 like 'p.[%' then 'p.'||(regexp_match(col1, '([A-Z]'||start||'[A-Z])'))[1] 
            when start != "end" and col1 like 'p.[%' then 'p.'||(regexp_match(col1, '[A-Z\d+_delins]+'||start||'[A-Z\d+_delins]+'))[1] 
            else col1,
                   
            start,
            end
from table

Any help is highly appreciated!

0

1 Answer 1

1

I prefer substring() to extract strings based on a regex, so I don't have to deal with the array that regexp_match() returns.

select case 
          when col1 like 'p.%' and start = "end" 
            then 'p.'||substring(col1 from concat('[A-Z]', start, '[A-Z]'))
          when col1 like 'p.%' and start <> "end" 
            then 'p.'||substring(col1 from concat('[A-Z]', start, '_[A-Z]', "end", 'delins[^\]]+'))
          else col1
       end as col1, 
       start,
       "end"
from the_table;

As far as I can tell, the regex for the second alternative adds the _delins incorrectly to the regex marking the "start" value. And if that is a constant value, it should not be part of a regex "range" because [delins]' would also match the string sindel. Not sure if you intended that.

Another option would be to simply take everything after the "end" marker and exclude the closing ] by using

substring(col1 from concat('[A-Z]', start, '_[A-Z]', "end", '[^\]]+'))
Sign up to request clarification or add additional context in comments.

2 Comments

What is the purpose of adding '%' here in 'p.%'?
@rshar: it's the wildcard for the LIKE operator. My understanding is, that this should only be applied if the string starts with p.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.