I have this text:

Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples

I want to extract only the part after 'Process explanation:', without the trailing 'Final activity for manager...' part.

So like this:

The bottle is then melted to form liquid glass.

This is the current Hive query, which I want to convert to Oracle:

SELECT REGEXP_EXTRACT(
               'Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples',
               '.*(process[ \t]*(explanation)?[ \t]*:[ \t]*)(.*?)([ \t]*;[ \t]*final[ \t]+activity[ \t]+for[ \t]+manager.*$|$)',
               3) as extracted
FROM my_table

2 Answers

If the substrings always look exactly like your example, there's a pretty simple option: the SUBSTR and INSTR functions.

SQL> with test (col) as
  2    (select 'Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples' from dual)
  3  select substr(col, instr(col, 'Process explanation') + length('Process explanation') + 1,
  4                     instr(col, 'Final activity') - instr(col, 'Process explanation') -
  5                       length('Process explanation') - 2
  6               ) result
  7  from test;

RESULT
----------------------------------------------
The bottle is then melted to form liquid glass

SQL>

3 Comments

A big reason to prefer substr+instr to regexp is that they generally perform faster.
Sometimes the text is not exactly like in the question, and I can't guarantee the order of the parts or the spaces and tabs between them. The process explanation may come before or after the final activity. Will this work in any case?
It won't; if the order is reversed, you'd have to swap those substrings in the INSTR and LENGTH calls.
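
For what it's worth, here is a rough sketch of how the SUBSTR + INSTR idea could be made independent of the order of the two parts. It rests on assumptions that are not confirmed anywhere in this thread: the marker is always the literal text 'Process explanation:' (same case, no extra spaces or tabs), and the wanted text runs up to the next semicolon or to the end of the string. The names marker_pos, start_pos and end_pos are made up for the illustration.

```sql
WITH test (col) AS (
  SELECT 'Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples' FROM dual
  UNION ALL
  -- Same content with the two parts in the opposite order
  SELECT 'Final activity for manager:Labeling of previous samples;Process explanation:The bottle is then melted to form liquid glass' FROM dual
)
SELECT col,
       CASE
         WHEN marker_pos = 0 THEN NULL               -- marker not present at all
         WHEN end_pos = 0 THEN SUBSTR(col, start_pos) -- no semicolon after the marker: take the rest
         ELSE SUBSTR(col, start_pos, end_pos - start_pos)
       END AS result
FROM (
  SELECT col,
         marker_pos,
         marker_pos + LENGTH('Process explanation:') AS start_pos,
         -- first semicolon at or after the marker (0 if there is none);
         -- GREATEST keeps the search position valid when the marker is missing
         INSTR(col, ';', GREATEST(marker_pos, 1)) AS end_pos
  FROM (
    SELECT col, INSTR(col, 'Process explanation:') AS marker_pos
    FROM test
  )
);
```

Both sample rows should give the same sentence. This still does not cope with variable spacing, tabs or different letter case around the marker; for that a regular expression is probably the more practical route.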

I've come up with something like this:

with strings as
(SELECT '1Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples' str FROM DUAL
union all
SELECT '2Process explanation:The bottle is then melted to form liquid glass;' str FROM DUAL
union all
SELECT '3Process :The bottle is then melted to form liquid glass' str FROM DUAL
union all
SELECT '4Process explanation: plasma gasification combined with centrifugal activity' str FROM DUAL
union all
SELECT '5Final activity for manager:Labeling of previous samples' str FROM DUAL
)
SELECT str
, REGEXP_SUBSTR(
               str,
           '(.*process[[:blank:]]*(explanation)?[[:blank:]]*:[[:blank:]]*)([A-Za-z0-9 ]*)([[:blank:]]*;[[:blank:]]*final[[:blank:]]*activity[[:blank:]]*for[[:blank:]]*manager.*$)?',
           1, 1, 'i',3)
                as extracted
FROM strings

Resulting in:

STR | EXTRACTED
1Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples | The bottle is then melted to form liquid glass
2Process explanation:The bottle is then melted to form liquid glass; | The bottle is then melted to form liquid glass
3Process :The bottle is then melted to form liquid glass | The bottle is then melted to form liquid glass
4Process explanation: plasma gasification combined with centrifugal activity | plasma gasification combined with centrifugal activity
5Final activity for manager:Labeling of previous samples | -

This assumes that matching the [[:blank:]] class instead of your space-and-tab list [ \t] is OK. Edit: I modified the regexp a bit because, with the last group allowed to be empty, '.*' kept capturing the entire line.

5 Comments

How would I denote that this part may not exist: Final activity for manager:Labeling of previous samples? I added a * at the end, but it's not working: '(.*process[[:blank:]]*(explanation)?[[:blank:]]*:[[:blank:]]*)(.*)(;[[:blank:]]*final[[:blank:]]*activity[[:blank:]]*for[[:blank:]]*manager.*$)*'
For other types of strings, the part Final activity for manager:Labeling of previous samples may not exist, and when I run the query on them it returns null. An example is: Process explanation: plasma gasification combined with centrifugal activity
Ok, I've edited the regexp and added your examples to the test dataset.
Also, what if the text contains a special character, as in: The bottle is then melted to form liquid glass and up to 50% alcohol is added? How do I include special characters like the percent sign? I tried this, but apparently it's wrong: '(.*process[[:blank:]]*(explanation)?[[:blank:]]*:[[:blank:]]*)([A-Za-z0-9-\\/:@\$%\[\]-`{}~]*)([[:blank:]]*final[[:blank:]]*activity[[:blank:]]*for[[:blank:]]*manager.*$)?'
Basically, the text (The bottle is then melted to form liquid glass and up to 50% alcohol is added) can contain any character, not just letters and numbers.
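
One possible way to allow "any character" in the extracted text, offered as a sketch rather than a tested answer: instead of listing the allowed characters, capture everything that is not a semicolon with [^;]*. This assumes the semicolon is always the separator before the 'Final activity' part and never appears inside the explanation itself, which the thread does not confirm. As far as I know, a backslash inside an Oracle bracket expression is treated as a literal character, which may be why the escaped list in the comment above misbehaves. The sample rows below are adapted from the ones in this thread.

```sql
WITH strings (str) AS (
  SELECT 'Process explanation:The bottle is then melted to form liquid glass and up to 50% alcohol is added;Final activity for manager:Labeling of previous samples' FROM dual
  UNION ALL
  SELECT 'Process explanation: plasma gasification combined with centrifugal activity' FROM dual
  UNION ALL
  SELECT 'Final activity for manager:Labeling of previous samples' FROM dual
)
SELECT str,
       -- Subexpression 2 is ([^;]*): everything after the colon up to the
       -- next semicolon, or to the end of the string if there is none.
       -- RTRIM removes trailing spaces and tabs left before the semicolon.
       RTRIM(
         REGEXP_SUBSTR(
           str,
           'process[[:blank:]]*(explanation)?[[:blank:]]*:[[:blank:]]*([^;]*)',
           1, 1, 'i', 2),
         ' ' || CHR(9)) AS extracted
FROM strings;
```

Because the capture stops at the first semicolon, the optional 'final activity' group is no longer needed in the pattern, and a row with no 'process' marker should simply come back as null.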
