Regex splitting works neatly on a single string. The snag is that the usual approach spawns a cartesian product when applied to multiple rows, i.e. when used on a table. My query nicks a clever solution from Alex Nuijten.
To break it down: the first two sub-queries tokenize the columns, the third sub-query re-aggregates each column's tokens in alphabetical order, and the main query compares the two re-aggregated strings to flag duplicates:
    with col1 as (
        -- tokenize COL1: one row per (id, token)
        select id, col1, regexp_substr(col1, '[^ ]+', 1, rn) as tkn
        from t42
             cross join (select rownum rn
                         from (select max(regexp_count(col1, ' ') + 1) + 1 mx from t42)
                         connect by level <= mx
                        )
        where regexp_substr(col1, '[^ ]+', 1, rn) is not null
        order by id
    )
    , col2 as (
        -- tokenize COL2 in the same way
        select id, col2, regexp_substr(col2, '[^ ]+', 1, rn) as tkn
        from t42
             cross join (select rownum rn
                         from (select max(regexp_count(col2, ' ') + 1) + 1 mx from t42)
                         connect by level <= mx
                        )
        where regexp_substr(col2, '[^ ]+', 1, rn) is not null
        order by id
    )
    , ccat as (
        -- re-aggregate the tokens of each column in alphabetical order
        select col1.id
               , col1.col1
               , listagg(col1.tkn, ' ') within group (order by col1.tkn) as catcol1
               , col2.col2
               , listagg(col2.tkn, ' ') within group (order by col2.tkn) as catcol2
        from col1
             join col2 on col1.id = col2.id
        group by col1.id, col1.col1, col2.col2 )
    -- identical sorted token strings mean the columns hold the same tokens
    select ccat.id
           , ccat.col1
           , ccat.col2
           , case when ccat.catcol1 = ccat.catcol2 then 'Y' else 'N' end as duplicate
    from ccat
    order by ccat.id
    /
I assume you have a key column (ID in my code).
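To see the tokenizing step on its own, here is a minimal sketch against inline sample rows rather than the real table. The names `sample_rows` and `nums` are made up for the illustration; the row generator follows the same pattern as the sub-queries above.

```sql
-- Sketch only: sample_rows and nums are hypothetical names, not part of the original query.
with sample_rows as (
    select 1 as id, 'ABC DEF' as col1 from dual union all
    select 2 as id, 'ABCD EFGH IJKL' as col1 from dual
)
, nums as (
    -- one row per token position, up to the largest token count in the data
    select level as rn
    from (select max(regexp_count(col1, ' ') + 1) as mx from sample_rows)
    connect by level <= mx
)
select s.id
       , n.rn
       , regexp_substr(s.col1, '[^ ]+', 1, n.rn) as tkn
from sample_rows s
     cross join nums n
-- positions beyond a row's last token return null and are filtered out
where regexp_substr(s.col1, '[^ ]+', 1, n.rn) is not null
order by s.id, n.rn
/
```

This should return one row per token: two rows for id 1 and three for id 2. Because the row generator is built once from the maximum token count and then cross joined, the `connect by` never runs against the table's rows directly, which is what avoids the cartesian blow-up.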
Although this solution is more verbose than the one proposed by @shaileshyadav, it does have the advantage of scaling to any number of tokens. Given this test data ...
SQL> select * from t42
2 /
ID COL1 COL2
---------- ----------------------- -----------------------
1 ABC DEF DEF ABC
2 ABC DEF GHI ABC
3 ABCD EFGH IJKL MNOP IJKL MNOP ABCD EFGH
4 ABCD EFGH IJKL MNOP IJKL QRST EFGH ABCD
5 ABC ABC DEF DEF ABC DEF
6 AAA BBB CCC DDD EEE AAA BBB CCC DDD
7 AAA BBB CCC DDD EEE AAA BBB CCC DDD EEE
8 XXX YYYY ZZZ AAA BBB AAA BBB XXX ZZZ YYYY
9 A B C D E F G H I J K L L K J I H G F E D C B A
10 AA BB CC DD EE AA BB CC DD FF
10 rows selected.
SQL>
... the query output is:
ID COL1 COL2 D
---------- ----------------------- ----------------------- -
1 ABC DEF DEF ABC Y
2 ABC DEF GHI ABC N
3 ABCD EFGH IJKL MNOP IJKL MNOP ABCD EFGH Y
4 ABCD EFGH IJKL MNOP IJKL QRST EFGH ABCD N
5 ABC ABC DEF DEF ABC DEF N
6 AAA BBB CCC DDD EEE AAA BBB CCC DDD N
7 AAA BBB CCC DDD EEE AAA BBB CCC DDD EEE Y
8 XXX YYYY ZZZ AAA BBB AAA BBB XXX ZZZ YYYY Y
9 A B C D E F G H I J K L L K J I H G F E D C B A Y
10 AA BB CC DD EE AA BB CC DD FF N
10 rows selected.
SQL>
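A caveat on the `ccat` sub-query: joining the two token sets on ID pairs every COL1 token with every COL2 token, so each LISTAGG sees its tokens repeated. That is harmless for the test data above, but it means a pair such as 'A A' and 'A' aggregates to 'A A' on both sides and gets flagged 'Y'. If that is not the behaviour you want, one possible variant is to aggregate each column before joining. This is a sketch under assumptions, with inline sample rows and the hypothetical names `sample_rows`, `nums`, `agg1` and `agg2`:

```sql
-- Sketch only: all sub-query names here are hypothetical.
with sample_rows as (
    select 1 as id, 'ABC DEF' as col1, 'DEF ABC' as col2 from dual union all
    select 2 as id, 'A A' as col1, 'A' as col2 from dual
)
, nums as (
    -- one shared row generator sized for the longer of the two columns
    select level as rn
    from (select max(greatest(regexp_count(col1, ' '), regexp_count(col2, ' '))) + 1 as mx
          from sample_rows)
    connect by level <= mx
)
, agg1 as (
    -- sorted token string for COL1, built before any join to COL2
    select s.id
           , listagg(regexp_substr(s.col1, '[^ ]+', 1, n.rn), ' ')
                 within group (order by regexp_substr(s.col1, '[^ ]+', 1, n.rn)) as catcol1
    from sample_rows s
         cross join nums n
    where regexp_substr(s.col1, '[^ ]+', 1, n.rn) is not null
    group by s.id
)
, agg2 as (
    -- sorted token string for COL2
    select s.id
           , listagg(regexp_substr(s.col2, '[^ ]+', 1, n.rn), ' ')
                 within group (order by regexp_substr(s.col2, '[^ ]+', 1, n.rn)) as catcol2
    from sample_rows s
         cross join nums n
    where regexp_substr(s.col2, '[^ ]+', 1, n.rn) is not null
    group by s.id
)
select s.id
       , s.col1
       , s.col2
       , case when a1.catcol1 = a2.catcol2 then 'Y' else 'N' end as duplicate
from sample_rows s
     join agg1 a1 on a1.id = s.id
     join agg2 a2 on a2.id = s.id
order by s.id
/
```

With these two sample rows, id 1 should come out as 'Y' and id 2 as 'N'. Aggregating first also keeps each LISTAGG string no longer than the original column, whereas the joined version grows with the product of the two token counts.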