
Hi All,

I have a question about regular expressions in SQL.

I need to extract a portion of a string from a column (Col B). The portion to extract is prefixed with the value of Col A. Please see the screenshot for the sample data; the expected output is shown in a separate column (highlighted in green).

Scenarios:

  1. If a column value contains more than one unique number, the result should be NULL. E.g.: To verify CAN06010025, CAN06010026 & CAN06010030 after the approval.

In the above string there is more than one number (the bold portion), so this case should be ignored (meaning the result has to be NULL).

  2. If there is only one number and it is repeated, that case should be considered and the portion of the string extracted. E.g.: Project USA12: Id USA12S001: Contact required -USA12S001- form to be updated

In this example, the portion I want to extract is repeated, and I want to extract only the highlighted portion.

The same applies to the other cases as well.

I tried the SQL below. The challenge is that the Col A value on its own can also appear in Col B (line 2 in the screenshot). REGEXP_COUNT counts that bare occurrence as well, so the query returns NULL. My expectation is to extract the USA12S001 portion from the column.

Could you please help me achieve this so that both conditions above are satisfied?

SQL:

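SELECT
   ColA,
   ColB,
   -- NULL when ColA occurs more than twice in ColB
   CASE WHEN REGEXP_COUNT(ColB, ColA) > 2 THEN NULL
   ELSE
      -- First match: ColA followed by alphanumerics (and an optional dot),
      -- concatenated with a second match of ColA followed by "-digits" or " - digits";
      -- spaces and dots are then stripped from the result
      REPLACE(REPLACE(
         CONCAT(
            REGEXP_SUBSTR(ColB, ColA || '([[:alnum:]]+\.?)'),
            NVL(REGEXP_SUBSTR(ColB, ColA || '(\-[[:digit:]]+)'),
                REGEXP_SUBSTR(ColB, ColA || '([[:space:]]\-[[:space:]][[:digit:]]+)'))),
         ' ', ''), '.', '')
   END AS Result
FROM
   table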

Test Data:

Col A | Col B
CAN06 | to verify CAN06010025, CAN06010026 & CAN06010030 after the approval
USA12 | Project USA12: Id USA12S001: Contact required -USA12S001- form to be updated
USA27 | Project USA27: Id: USA27S001: Prod
HUN04 | To review id HUN04S002-HUN04S004 after the due date.
CAN05 | ID: CAN05S005 with the details as CAN05S005 are completed.
USA24 | Project USA24: Id: USA24S009: Data Issue
CAN06 | "Project: Subject CAN06S009: V2 & V3- Id CAN06S010: V1"

  • Thanks for looking into it Tim, I am using Oracle to implement this. Commented Sep 20, 2018 at 14:25
  • Please post the test data as text, not image - I can't copy your image and paste it into my SQL editor. Commented Sep 20, 2018 at 14:32

1 Answer

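If REGEXP_COUNT is the only issue, the fix is simple: require at least one alphanumeric character after ColA, so that a bare ColA (such as the "USA12:" in line 2) is no longer counted. Change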

case when REGEXP_COUNT(ColB,ColA) >2

to:

case when REGEXP_COUNT(ColB,ColA || '[[:alnum:]]') >2
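
Note that an occurrence count alone cannot tell one ID repeated twice from two different IDs, which scenario 1 in the question seems to care about (for example the HUN04 row). One possible way to check for distinct IDs is sketched below. It is only a sketch under assumptions that are not stated in the question: an ID is ColA followed directly by alphanumerics, ColA contains no regex metacharacters, and no ID is a prefix of another. The CTE names sample_data and first_id and the column first_match are hypothetical, and the "-digits" suffix handling from the original query is left out. The idea is to take the first ID, remove every copy of it from ColB, and return it only if no other ID remains.

```sql
-- Sketch only: sample_data inlines three rows from the question's test data.
WITH sample_data (ColA, ColB) AS (
  SELECT 'USA12', 'Project USA12: Id USA12S001: Contact required -USA12S001- form to be updated' FROM dual UNION ALL
  SELECT 'HUN04', 'To review id HUN04S002-HUN04S004 after the due date.' FROM dual UNION ALL
  SELECT 'CAN05', 'ID: CAN05S005 with the details as CAN05S005 are completed.' FROM dual
),
first_id AS (
  -- First occurrence of ColA followed by one or more alphanumerics
  SELECT ColA,
         ColB,
         REGEXP_SUBSTR(ColB, ColA || '[[:alnum:]]+') AS first_match
  FROM sample_data
)
SELECT ColA,
       ColB,
       -- Remove every copy of the first ID; if another ID is still present,
       -- the row holds more than one distinct ID and the CASE yields NULL
       CASE
         WHEN REGEXP_COUNT(REPLACE(ColB, first_match), ColA || '[[:alnum:]]') = 0
         THEN first_match
       END AS Result
FROM first_id;
```

On these three rows the intent is USA12S001 for the first row, NULL for the HUN04 row, and CAN05S005 for the last. A row with no ID at all also ends up NULL, because first_match is NULL there.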