I have the following table in PostgreSQL 11.0:

col1     col2
1        L01XC Monoclonal antibodies
2        S01FB
3        A01AC | C05AA | D07AB | D10AA | H02AB | R01AD | R03BA | R07AX | S01BA | S02BA | S03BA
4        A01AC Corticosteroids for local oral treatment; H02AB Glucocorticoids
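
A minimal setup to reproduce the table, assuming col1 is an integer and col2 is text (the actual column types are not shown):

```sql
-- Assumed schema: the real column types are not given in the question.
CREATE TABLE tbl (
    col1 integer PRIMARY KEY,
    col2 text
);

-- Sample rows copied from the table above.
INSERT INTO tbl (col1, col2) VALUES
    (1, 'L01XC Monoclonal antibodies'),
    (2, 'S01FB'),
    (3, 'A01AC | C05AA | D07AB | D10AA | H02AB | R01AD | R03BA | R07AX | S01BA | S02BA | S03BA'),
    (4, 'A01AC Corticosteroids for local oral treatment; H02AB Glucocorticoids');
```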

I need to extract only the codes, i.e. the substrings made up of uppercase letters and digits. When a string contains several codes (as in row 4), they should be joined with '|'. The desired output is:

col1    col2
1       L01XC
2       S01FB
3       A01AC | C05AA | D07AB | D10AA | H02AB | R01AD | R03BA | R07AX | S01BA | S02BA | S03BA
4       A01AC | H02AB 

I have tried the following query:

 select distinct  
        regexp_matches(col2, '(?:[A-Z]+\d|\d+[A-Z])[A-Z0-9]*','g') as col2
      from tbl

1 Answer

With the 'g' flag, regexp_matches returns one row per match, so you have to group the matches by col1 and aggregate them with string_agg:

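select col1,
       -- the pattern has no capturing group, so v[1] is the whole match
       string_agg(v[1], '|')
  from tbl,
       -- set-returning function: one row (a text[]) per match in col2
       regexp_matches(col2, '(?:[A-Z]+\d|\d+[A-Z])[A-Z0-9]*','g') r(v)
 group by col1
 order by col1;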

┌──────┬───────────────────────────────────────────────────────┐
│ col1 │                      string_agg                       │
╞══════╪═══════════════════════════════════════════════════════╡
│    1 │ L01XC                                                 │
│    2 │ S01FB                                                 │
│    3 │ A01AC|C05AA|D07AB|D10AA|H02AB|R01AD|R03BA|R07AX|S01BA │
│    4 │ A01AC|H02AB                                           │
└──────┴───────────────────────────────────────────────────────┘
(4 rows)
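
If the separator should be ' | ' as in the desired output, a variant along these lines may help. Treat it as a sketch under a few assumptions rather than a tested solution for your real data. string_agg without ORDER BY does not guarantee the order of the codes, so this version numbers the matches with WITH ORDINALITY and orders by that number. The LEFT JOIN LATERAL keeps rows of tbl that contain no code at all (their result is NULL), whereas the comma join drops them. The aliases t, r, v and ord are only illustrative names.

```sql
select t.col1,
       -- aggregate the matches in the order they occur in col2
       string_agg(r.v[1], ' | ' order by r.ord) as col2
  from tbl t
  -- left join keeps rows without any match; their col2 becomes NULL
  left join lateral
       regexp_matches(t.col2, '(?:[A-Z]+\d|\d+[A-Z])[A-Z0-9]*', 'g')
       with ordinality as r(v, ord) on true
 group by t.col1
 order by t.col1;
```

WITH ORDINALITY is available from PostgreSQL 9.4 on, so it should work on 11.0.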