1

Lets say I have a string-

"a   b      cc  d d  e"

I'd like to be able to do

regexp_substr('a   b      cc  d d  e', ??, 1, 4)

And have it return "d d".

I was using "[^ ]+" as my pattern, but it splits single spaces which I don't want.

1
  • This is not clear. What are you trying to return? The fourth occurrence of (one or more spaces), plus the full blocks of non-space characters on both sides of it? Commented Nov 10, 2016 at 20:50

3 Answers 3

1

I think this will do what you want:

select regexp_substr(col, '([^ ]*[ ]?[^ ]+)+', 1, 4)
Sign up to request clarification or add additional context in comments.

3 Comments

This solution would produce the result 'b ' if the argument 4 was replaced with 2 (that is, the "second occurrence" of something is 'b '. It is hard to understand how that agrees with the problem statement.
... and yet the answer is accepted, so I am surely missing something! :-)
regexp_substr(col, '\S( ?\S)*', 1, 4) would be more correct
0

Assuming you are looking for the fourth occurrence of one or more consecutive spaces, and you want to return that PLUS the surrounding non-space characters (and allowing for one or more spaces at the beginning or the end of the string, in which case the result will not have non-space characters on that side):

with
     inputs ( str, occ ) as (
       select 'a   b      cc  d d  e', 4 from dual
     )
select regexp_substr(str, '([^ ]+|^)[ ]+', 1, occ) || 
          regexp_substr(str,'[ ]+([^ ]+|$)', 1, occ, null, 1) as token
from inputs
;

Specifically, this will produce the following output for values 1, 2, 3, 4, 5 of occ:

'a b', 'b cc', 'cc d', 'd d', 'd e'.

Note that as soon as one match is found, the search for the next match starts at the character after the match that has already been found. In my interpretation of the problem, the non-space characters are included in overlapping candidate matches, so it is not possible (I don't think) to do this in a single pass of regexp_substr.

I input the string and the occurrence (4) in the CTE at the top; that is part of setting up test data, NOT part of the solution.

Comments

0

This one will do the trick:

select regexp_substr(str, '([^ ]([ ]?[^ ])*)',1, 4)

If you want to parse out all the discreet values you could use this:

with sample_data as (
  select 'a   b      cc  d d  e' str from dual union all
  select ' a   b      cc  d dd d  e' str from dual
)
, recurse(str, seq, nxt, tok) as (
select str
     , 1
     , regexp_instr (str, '[ ]{2,}',1,1,1)
     , regexp_substr(str, '([^ ]([ ]?[^ ])*)',1, 1)
 from sample_data
union all
select str
     , seq+1
     , regexp_instr(str, '[ ]{2,}',nxt,1,1)
     , regexp_substr(str, '([^ ]([ ]?[^ ])*)',nxt, 1)
 from recurse
 where nxt > 0
)
select * from recurse order by str, seq;

STR                              SEQ        NXT TOK                     
------------------------- ---------- ---------- -------------------------
 a   b      cc  d dd d  e          1          6 a                        
 a   b      cc  d dd d  e          2         13 b                        
 a   b      cc  d dd d  e          3         17 cc                       
 a   b      cc  d dd d  e          4         25 d dd d                   
 a   b      cc  d dd d  e          5          0 e                        

a   b      cc  d d  e              1          5 a                        
a   b      cc  d d  e              2         12 b                        
a   b      cc  d d  e              3         16 cc                       
a   b      cc  d d  e              4         21 d d                      
a   b      cc  d d  e              5          0 e                        

 10 rows selected 

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.