0

I have data as follows in a column

+----------------------+
|      my_column       |
+----------------------+
| test_PC_xyz_blah     |
| test_PC_pqrs_bloh    |
| test_Mobile_pqrs_bleh|
+----------------------+

How can I extract the following as columns?

+----------+-------+
| Platform | Value |
+----------+-------+
|   PC     |  xyz  |
|   PC     |  pqrs |
|  Mobile  |  pqrs |
+----------+-------+

I tried using REGEXP_SUBSTR

Default first pattern occurrence for platform:

select regexp_substr(my_column, 'test_(.*)_(.*)_(.*)') as platform from table

Getting second pattern occurrence for value:

select regexp_substr(my_column, 'test_(.*)_(.*)_(.*)', 1, 2) as value from table

This isn't working, however. Where am I going wrong?

2 Answers 2

2

For Non-empty tokens

select  regexp_substr(my_column,'[^_]+',1,2)    as platform
       ,regexp_substr(my_column,'[^_]+',1,3)    as value

from    my_table
;

For possibly empty tokens

select  regexp_substr(my_column,'^.*?_(.*)?_.*?_.*$',1,1,'',1)  as platform
       ,regexp_substr(my_column,'^.*?_.*?_(.*)?_.*$',1,1,'',1)  as value

from    my_table
;

+----------+-------+
| PLATFORM | VALUE |
+----------+-------+
| PC       | xyz   |
+----------+-------+
| PC       | pqrs  |
+----------+-------+
| Mobile   | pqrs  |
+----------+-------+
Sign up to request clarification or add additional context in comments.

Comments

0

(.*) is greedy by nature, it will match all character including _ character as well, so test_(.*) will match whole of your string. Hence further groups in pattern _(.*)_(.*) have nothing to match, whole regex fails. The trick is to match all characters excluding _. This can be done by defining a group ([^_]+). This group defines a negative character set and it will match to any character except for _ . If you have better pattern, you can use them like [A-Za-z] or [:alphanum]. Once you slice your string to multiple sub strings separated by _, then just select 2nd and 3rd group.

ex:

SELECT REGEXP_SUBSTR( my_column,'(([^_]+))',1,2) as platform, REGEXP_SUBSTR( my_column,'(([^_]+))',1,3) as value from table;

Note: AFAIK there is no straight forward method to Oracle to exact matching groups. You can use regexp_replace for this purpose, but it unlike capabilities of other programming language where you can exact just group 2 and group 3. See this link for example.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.