2

I have a query that needs to strip special characters from strings in several tables before comparing them against each other. I created a function that takes a string, removes certain special characters, and returns the result. The problem is that the query does a lot of comparisons, so the function ends up being called many times, and adding it slowed the query down significantly.

So I have this function I created:

create or replace FUNCTION F_REMOVE_SPECIAL_CHARACTERS 
(
  IN_PARAM_EMAIL_NAME IN VARCHAR2,
  IN_PARAM_NUMBER_FLAG IN VARCHAR2 DEFAULT 'N'
) RETURN VARCHAR2 AS 
BEGIN
  /* If flag is Y then remove all numbers too. Otherwise, keep numbers in the string */
  IF IN_PARAM_NUMBER_FLAG = 'Y' THEN
    RETURN replace(regexp_replace(IN_PARAM_EMAIL_NAME, '[-,._0-9]', ''), ' ', '');
  ELSE
    RETURN replace(regexp_replace(IN_PARAM_EMAIL_NAME, '[-,._]', ''), ' ', '');
  END IF;
END F_REMOVE_SPECIAL_CHARACTERS;

I also have a query that goes like this:

SELECT a.ID, LISTAGG(b.BUSINESS_EMAIL) WITHIN GROUP (ORDER BY a.ID)
FROM tableA a, tableB b
WHERE UPPER(F_REMOVE_SPECIAL_CHARACTERS(b.LAST_NAME)) IN (
  (SELECT UPPER(F_REMOVE_SPECIAL_CHARACTERS(a.NICK_NAME)) FROM tableC c
      WHERE UPPER(F_REMOVE_SPECIAL_CHARACTERS(a.NICK_NAME)) IN (
        (SELECT UPPER(F_REMOVE_SPECIAL_CHARACTERS(c.NAME)) FROM tableC c
           WHERE UPPER(F_REMOVE_SPECIAL_CHARACTERS(a.NICK_NAME)) = UPPER(F_REMOVE_SPECIAL_CHARACTERS(a.LAST_NAME))
           )
      )
  )
)

The actual query is bigger and more complicated but the point is that I need to remove special characters from certain column values which happens to be repeated multiple times in the query. This means I need to use the function multiple times but this causes significant slowdown in performance.

Does anyone have an idea on how to reduce the performance slowdown when using multiple function calls in a query? Thanks.

  • Add columns to your table and pre-compute all the stripped strings. Then you can do normal SQL operations on the pre-computed values. It's a big one-time cost to pre-compute all the columns, but future queries will be much faster. Commented Apr 1, 2019 at 15:03
  • Thanks for your suggestion. I'll definitely look into this solution. Commented Apr 1, 2019 at 15:09
  • So that we have a clear PROBLEM statement (separate from the SOLUTION you currently have): You are given a (possibly long) string (is it always VARCHAR2 or can it be CLOB?), a list of characters to remove: dash, comma, period, underscore and space, and an input parameter that tells you whether digits should also be removed. Correct? If so, you don't need to write a function, and you don't need regular expressions. And, in your current solution, you can include space in the [ ... ] of the inner function, no need for extra steps. Commented Apr 1, 2019 at 15:11
  • @mathguy, Yes, that is correct. All inputs to the function are VARCHAR2. I created the function so it would be configurable, since I would be stripping strings in many places. Do you have any suggestions for improving performance? If I don't use regex, will performance improve? Commented Apr 1, 2019 at 15:16
  • I posted a full answer. To your last question - if you avoid regexp, it is very likely that performance will be improved, perhaps significantly - but it depends also on what was making the query slow. (For example, if the many context switches between SQL and PL/SQL were the bottleneck, regexp vs. not won't matter that much; in that case, other things - like pragma udf in Oracle 12.1 or later - will give the biggest improvement). Commented Apr 1, 2019 at 15:27
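
A rough sketch of the pre-computation idea from the first comment, under a few assumptions: rather than a physically stored column, it uses a virtual column (available from Oracle 11g) plus an index, so the stripped value does not have to be maintained by hand. The names LAST_NAME_CLEAN and TABLEB_LAST_NAME_CLEAN_IX are hypothetical, and the expression hard-codes the "keep digits" case of the function.

```sql
-- Hypothetical column and index names, for illustration only.
-- TRANSLATE removes dash, comma, period, underscore and space;
-- the leading 'z' is a placeholder that maps to itself.
ALTER TABLE tableB ADD (
  last_name_clean GENERATED ALWAYS AS
    (UPPER(TRANSLATE(last_name, 'z-,._ ', 'z'))) VIRTUAL
);

-- Index the pre-computed value so comparisons can use it directly.
CREATE INDEX tableb_last_name_clean_ix ON tableB (last_name_clean);
```

The query could then compare b.last_name_clean instead of calling the function on b.LAST_NAME for every row. Whether this helps depends on where the time is actually going, so it is worth checking the execution plan before and after.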

2 Answers

3

Assuming you need this as a function (because you use it in many places), you could clean it up and simplify it (and make it more efficient) like so:

create or replace function f_remove_special_characters 
(
  in_param_email_name in varchar2,
  in_param_number_flag in varchar2 default 'N'
) 
return varchar2
deterministic
as 
pragma udf;   -- if on Oracle 12.1 or higher, and function is only for SQL use
  /* If flag is Y then remove all numbers too. 
     Otherwise, keep numbers in the string
  */
  chars_to_remove varchar2(16) := 'z-,._ ' || 
                         case in_param_number_flag when 'Y' then '0123456789' end;
begin
  return translate(in_param_email_name, chars_to_remove, 'z');
end f_remove_special_characters;
/

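The silly trick with the 'z' in translate (in the second and third arguments) is due to Oracle's odd treatment of null. Oracle treats the empty string as null, and in translate, if any of the arguments is null the result is null, in contrast with Oracle's treatment of null in other string operations. So the third argument cannot be '' and needs at least one character; 'z' simply maps to itself, and every other character listed in the second argument has no counterpart and is removed.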
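
To make the placeholder trick concrete, here is a quick check that can be run against dual. The input literal is made up for illustration and is not data from the question.

```sql
-- Illustrative input only.
SELECT TRANSLATE('Smith-Jones, J_r. 42', 'z-,._ ', 'z')           AS keep_digits,  -- SmithJonesJr42
       TRANSLATE('Smith-Jones, J_r. 42', 'z-,._ 0123456789', 'z') AS drop_digits,  -- SmithJonesJr
       TRANSLATE('Smith-Jones, J_r. 42', '-,._ ', '')             AS empty_third   -- NULL
FROM dual;
```

The third column shows what happens without the placeholder: the empty string is null, so the whole result is null. Since translate is a plain SQL function, the same expression could also be written inline in the query, which avoids the PL/SQL call altogether, at the cost of repeating the character list in each place.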

2

If you are on 12c or above, then as a quick fix you can use the WITH FUNCTION clause.

As I remember, this eliminates PL/SQL<->SQL context switches, so your query should perform better.

I've never tested that, but it is very likely to be faster, possibly even 30-50 times faster. Let me know how fast it turns out to be, because I'm curious.

WITH FUNCTION F_REMOVE_SPECIAL_CHARACTERS 
(
  IN_PARAM_EMAIL_NAME IN VARCHAR2,
  IN_PARAM_NUMBER_FLAG IN VARCHAR2 DEFAULT 'N'
) RETURN VARCHAR2 AS 
BEGIN
  /* If flag is Y then remove all numbers too. Otherwise, keep numbers in the string */
  IF IN_PARAM_NUMBER_FLAG = 'Y' THEN
    RETURN replace(regexp_replace(IN_PARAM_EMAIL_NAME, '[-,._0-9]', ''), ' ', '');
  ELSE
    RETURN replace(regexp_replace(IN_PARAM_EMAIL_NAME, '[-,._]', ''), ' ', '');
  END IF;
END;
SELECT a.ID, LISTAGG(b.BUSINESS_EMAIL) WITHIN GROUP (ORDER BY a.ID)
FROM tableA a, tableB b
WHERE UPPER(F_REMOVE_SPECIAL_CHARACTERS(b.LAST_NAME)) IN (
  (SELECT UPPER(F_REMOVE_SPECIAL_CHARACTERS(a.NICK_NAME)) FROM tableC c
      WHERE UPPER(F_REMOVE_SPECIAL_CHARACTERS(a.NICK_NAME)) IN (
        (SELECT UPPER(F_REMOVE_SPECIAL_CHARACTERS(c.NAME)) FROM tableC c
           WHERE UPPER(F_REMOVE_SPECIAL_CHARACTERS(a.NICK_NAME)) = UPPER(F_REMOVE_SPECIAL_CHARACTERS(a.LAST_NAME))
           )
      )
  )
)

2 Comments

Thanks. I will try this and I'll comment back with results.
Embedding functions in a WITH clause is indeed much faster than calling a traditional PL/SQL function (although 30-50 times faster is probably the exception, not the norm). In any case, Oracle 12.1 also brought pragma udf, declared in a PL/SQL function, which is actually even faster than declaring functions in a WITH clause.
