
I need to capture the text between "Detl Code Desc:" and "Ftyp Code Desc:".

See the examples below:

Detl Code Desc: CPS Leadership PD    Ftyp Code Desc: Flat Fee
Detl Code Desc: CPS Professional Develop. ED    Ftyp Code Desc

What would be the best way to do it? I tried regexp_substr but couldn't get it to work.

Thanks for the help.

  • What if one or both of the two strings appears multiple times or doesn't appear at all? Commented Sep 13, 2024 at 17:09
  • You should be able to do it using REGEXP_SUBSTR() with a capture group. See the 4th example in the documentation. Commented Sep 13, 2024 at 17:09
  • Please show what you tried and what errors were returned. Commented Sep 13, 2024 at 17:15
  • Hi Barmar, it's not a fixed pattern, though: the text between the two strings changes from row to row. I only gave two examples here, but there is more variety. Commented Sep 13, 2024 at 17:17

2 Answers


Here's an example that uses a Common Table Expression (CTE) to set up some test data. It's like creating a test table inline, and it's a great way to provide data and examples to the folks you're asking for help. Make sure to include unexpected values like NULLs and incomplete data.

You didn't specify, so I assumed (dangerous!) that you want the first occurrence of the pattern and that it sits at the start of the string, hence the ^ anchor. I put a capturing group around the data between the 2 known strings; the last argument of REGEXP_SUBSTR() (subexpression 1) makes it return that group rather than the whole match. Note that REGEXP_SUBSTR() returns NULL if the pattern is not found (ID 4); ID 3 is NULL as well, because Oracle treats the empty string '' as NULL. You may have to handle this depending on your specs.

with tbl(id, str) as (
  select 1, 'Detl Code Desc: CPS Leadership PD    Ftyp Code Desc: Flat Fee' from dual union all
  select 2, 'Detl Code Desc: CPS Professional Develop. ED    Ftyp Code Desc' from dual union all
  select 3, '' from dual union all
  select 4, 'Detl Code Desc: CPS Leadership PD' from dual
)
select id, regexp_substr(str, '^Detl Code Desc: (.*?) *Ftyp Code Desc', 1, 1, null, 1) detail_code_desc
from tbl;

        ID DETAIL_CODE_DESC                                              
---------- --------------------------------------------------------------
         1 CPS Leadership PD                                             
         2 CPS Professional Develop. ED                                  
         3                                                               
         4                                                               

4 rows selected.

In contrast, REGEXP_REPLACE() returns the original string if the pattern is not matched. Here I put capturing groups around all components of the string and returned just the 2nd one. Comparing the original and returned strings tells you whether a match was found; what to do next is up to your specs. One approach may suit your error handling better than the other.

with tbl(id, str) as (
  select 1, 'Detl Code Desc: CPS Leadership PD    Ftyp Code Desc: Flat Fee' from dual union all
  select 2, 'Detl Code Desc: CPS Professional Develop. ED    Ftyp Code Desc' from dual union all
  select 3, '' from dual union all
  select 4, 'Detl Code Desc: CPS Leadership PD' from dual
)
select id, regexp_replace(str, '^(Detl Code Desc: )(.*?)( *Ftyp Code Desc.*)', '\2') detail_code_desc
from tbl;

        ID DETAIL_CODE_DESC             
---------- -----------------------------
         1 CPS Leadership PD            
         2 CPS Professional Develop. ED 
         3                              
         4 Detl Code Desc: CPS Leadershi

4 rows selected.
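If you'd rather get an explicit marker for rows without a match than compare strings afterwards, one option is to test the pattern first with REGEXP_LIKE() inside a CASE expression. This is a sketch against the same test data; the 'NO MATCH' label is an arbitrary placeholder of my choosing, not anything Oracle defines.

```sql
with tbl(id, str) as (
  select 1, 'Detl Code Desc: CPS Leadership PD    Ftyp Code Desc: Flat Fee' from dual union all
  select 2, 'Detl Code Desc: CPS Professional Develop. ED    Ftyp Code Desc' from dual union all
  select 3, '' from dual union all
  select 4, 'Detl Code Desc: CPS Leadership PD' from dual
)
select id,
       case
         -- both markers present in order: return capture group 1
         when regexp_like(str, '^Detl Code Desc: .*Ftyp Code Desc')
           then regexp_substr(str, '^Detl Code Desc: (.*?) *Ftyp Code Desc', 1, 1, null, 1)
         -- no match, or NULL input: return a placeholder (hypothetical label)
         else 'NO MATCH'
       end as detail_code_desc
from tbl
order by id;
```

REGEXP_LIKE() on a NULL string is not true, so ID 3 falls through to the ELSE branch along with ID 4.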

P.S. When posting, always show your work, what failed, and any error messages; it will help us help you.


Instead of regular expressions, you can use a combination of the SUBSTR and INSTR functions. Note that details matter: in the sample data you posted, only the 1st string contains both values you're interested in (the 2nd lacks the colon at the end of "Ftyp Code Desc").

Sample data, thanks to Gary_W:

with tbl(id, str) as (
  select 1, 'Detl Code Desc: CPS Leadership PD    Ftyp Code Desc: Flat Fee' from dual union all
  select 2, 'Detl Code Desc: CPS Professional Develop. ED    Ftyp Code Desc' from dual union all
  select 3, '' from dual union all
  select 4, 'Detl Code Desc: CPS Leadership PD' from dual
)

Query begins here:

select id,
  substr(str, instr(str, 'Detl Code Desc:') + length('Detl Code Desc:') + 1,
              instr(str, 'Ftyp Code Desc:') - instr(str, 'Detl Code Desc:') - length('Detl Code Desc:') - 1
        ) as result
from tbl
order by id;

        ID RESULT
---------- ------------------------------
         1 CPS Leadership PD
         2
         3
         4

  • only the 1st string returns something, as it contains both the starting and the ending boundary
  • the 2nd one is missing the ending colon (as I already said), so it doesn't satisfy the conditions you set
  • the 3rd is an empty string (which Oracle treats as NULL), so it can't return anything
  • the 4th doesn't contain the ending boundary
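As written, the query also keeps the blanks that precede "Ftyp": row 1's result is 'CPS Leadership PD' followed by four spaces, which the listing doesn't make visible. If rows like the 2nd one should count too, a possible variant searches for the end marker without its colon, trims the padding, and only extracts when both markers are present in the right order. This is a sketch, assuming that is the behavior you want; the pos CTE and the start_pos/end_pos names are just illustrative.

```sql
with tbl(id, str) as (
  select 1, 'Detl Code Desc: CPS Leadership PD    Ftyp Code Desc: Flat Fee' from dual union all
  select 2, 'Detl Code Desc: CPS Professional Develop. ED    Ftyp Code Desc' from dual union all
  select 3, '' from dual union all
  select 4, 'Detl Code Desc: CPS Leadership PD' from dual
),
pos as (
  -- locate both markers once; INSTR returns 0 when a marker is absent (NULL for NULL input)
  select id, str,
         instr(str, 'Detl Code Desc:') as start_pos,
         instr(str, 'Ftyp Code Desc')  as end_pos   -- no colon, so row 2 also qualifies
  from tbl
)
select id,
       case
         when start_pos > 0 and end_pos > start_pos
           -- skip the start marker, cut just before the end marker, then trim surrounding blanks
           then trim(substr(str,
                            start_pos + length('Detl Code Desc:'),
                            end_pos - start_pos - length('Detl Code Desc:')))
       end as result   -- CASE without ELSE yields NULL when a marker is missing
from pos
order by id;
```

With this test data it should return text for IDs 1 and 2 and NULL for IDs 3 and 4.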

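What's the difference between this approach and regular expressions? This one will probably perform better (read: run faster) on large data sets, since plain SUBSTR/INSTR string searching is generally cheaper than evaluating a regular expression.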
