Oracle Regexp - substr by multiple spaces but not single space

Question

Let's say I have a string:

"a   b      cc  d d  e"

I'd like to be able to do

regexp_substr('a   b      cc  d d  e', ??, 1, 4)

And have it return "d d".

I was using "[^ ]+" as my pattern, but it also splits on single spaces, which I don't want.

This is not clear. What are you trying to return? The fourth occurrence of one or more spaces, plus the full blocks of non-space characters on both sides of it?
– user5683823, Commented Nov 10, 2016 at 20:50

score 1 · Gordon Linoff · Accepted Answer · 2016-11-10 20:56:23Z

I think this will do what you want:

select regexp_substr(col, '([^ ]*[ ]?[^ ]+)+', 1, 4)

3 Comments

user5683823 Over a year ago

This solution would produce the result ' b' if the argument 4 were replaced with 2 (that is, the "second occurrence" of something is ' b'). It is hard to see how that agrees with the problem statement.

user5683823 Over a year ago

... and yet the answer is accepted, so I am surely missing something! :-)

Egor Skriptunoff Over a year ago

regexp_substr(col, '\S( ?\S)*', 1, 4) would be more correct
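
To compare the pattern from the answer with the one suggested in the comment, here is a minimal sketch that lists occurrences 1 through 5 of each against the sample string from the question. The CTE name and column aliases are made up for illustration, and the square brackets are only there to make leading or trailing spaces visible. Tracing the patterns by hand, I would expect the answer's pattern to pick up one leading space on every occurrence after the first (so occurrence 4 comes back as ' d d'), while the comment's pattern always starts on a non-space character.

```sql
-- Sketch: compare the two patterns occurrence by occurrence.
-- "inputs", "answer_pattern" and "comment_pattern" are illustrative names.
with
     inputs ( str ) as (
       select 'a   b      cc  d d  e' from dual
     )
select level as occ
     , '[' || regexp_substr(str, '([^ ]*[ ]?[^ ]+)+', 1, level) || ']' as answer_pattern
     , '[' || regexp_substr(str, '\S( ?\S)*', 1, level)          || ']' as comment_pattern
from inputs
connect by level <= 5   -- safe only because inputs has exactly one row
;
```

The plain connect by level trick multiplies rows when the source has more than one row, so this is meant for checking a single test string, not for production use.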

score 0 · Answer · 2016-11-10 21:12:30Z

Assuming you are looking for the fourth occurrence of one or more consecutive spaces, and you want to return that PLUS the surrounding non-space characters (and allowing for one or more spaces at the beginning or the end of the string, in which case the result will not have non-space characters on that side):

with
     inputs ( str, occ ) as (
       select 'a   b      cc  d d  e', 4 from dual
     )
select regexp_substr(str, '([^ ]+|^)[ ]+', 1, occ) || 
          regexp_substr(str,'[ ]+([^ ]+|$)', 1, occ, null, 1) as token
from inputs
;

Specifically, this will produce the following output for values 1, 2, 3, 4, 5 of occ:

'a   b', 'b      cc', 'cc  d', 'd d', 'd  e'.

Note that as soon as one match is found, the search for the next match starts at the character after the match that has already been found. In my interpretation of the problem, the non-space characters are shared by overlapping candidate matches (for example, 'b' belongs to both the first and the second result), so I don't think it is possible to do this with a single call to regexp_substr.

I input the string and the occurrence (4) in the CTE at the top; that is part of setting up test data, NOT part of the solution.
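
To check the result for every value of occ in one go, the query above can be wrapped in a row generator. This is only a sketch under a few assumptions: Oracle 11g or later (for regexp_count and the subexpression argument of regexp_substr), a single input row, and brackets added purely to make the spaces inside each token visible.

```sql
-- Sketch: run the two-call solution for every occurrence of a run of spaces.
-- The number of rows generated is the number of space runs in the string.
with
     inputs ( str ) as (
       select 'a   b      cc  d d  e' from dual
     )
select level as occ
     , '[' || regexp_substr(str, '([^ ]+|^)[ ]+', 1, level)
           || regexp_substr(str, '[ ]+([^ ]+|$)', 1, level, null, 1)
           || ']' as token
from inputs
connect by level <= regexp_count(str, '[ ]+')   -- one row per run of spaces
;
```

The first call returns the left-hand block together with the run of spaces; the second call returns only subexpression 1, the right-hand block, so the spaces are not duplicated when the two pieces are concatenated.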

Sentinel · Answer · 2016-11-10 23:03:24Z

This one will do the trick:

select regexp_substr(str, '([^ ]([ ]?[^ ])*)',1, 4)

If you want to parse out all the discrete values you could use this:

with sample_data as (
  select 'a   b      cc  d d  e' str from dual union all
  select ' a   b      cc  d dd d  e' str from dual
)
, recurse(str, seq, nxt, tok) as (
select str
     , 1
     , regexp_instr (str, '[ ]{2,}',1,1,1)
     , regexp_substr(str, '([^ ]([ ]?[^ ])*)',1, 1)
 from sample_data
union all
select str
     , seq+1
     , regexp_instr(str, '[ ]{2,}',nxt,1,1)
     , regexp_substr(str, '([^ ]([ ]?[^ ])*)',nxt, 1)
 from recurse
 where nxt > 0
)
select * from recurse order by str, seq;

STR                              SEQ        NXT TOK                     
------------------------- ---------- ---------- -------------------------
 a   b      cc  d dd d  e          1          6 a                        
 a   b      cc  d dd d  e          2         13 b                        
 a   b      cc  d dd d  e          3         17 cc                       
 a   b      cc  d dd d  e          4         25 d dd d                   
 a   b      cc  d dd d  e          5          0 e                        

a   b      cc  d d  e              1          5 a                        
a   b      cc  d d  e              2         12 b                        
a   b      cc  d d  e              3         16 cc                       
a   b      cc  d d  e              4         21 d d                      
a   b      cc  d d  e              5          0 e                        

 10 rows selected
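
For a single string, the same tokens can probably also be listed without a recursive CTE by asking regexp_substr for each occurrence in turn. This is a sketch rather than part of the answer above: it assumes Oracle 11g or later for regexp_count, exactly one input row, and it reuses the token pattern from the query above without the outer parentheses.

```sql
-- Sketch: one row per token, where a token is a run of non-space characters
-- that may contain single spaces but never two spaces in a row.
with
     inputs ( str ) as (
       select ' a   b      cc  d dd d  e' from dual
     )
select level as seq
     , regexp_substr(str, '[^ ]([ ]?[^ ])*', 1, level) as tok
from inputs
connect by level <= regexp_count(str, '[^ ]([ ]?[^ ])*')
;
```

With several input rows, as in sample_data above, the plain connect by level approach would multiply rows, so the recursive query remains the safer choice there.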
