I have a large table with phone numbers. The phone numbers are all strings and are supposed to look like '+9628789878' (a "+" sign followed by between 9 and 13 digits).

A user bug uncovered one row with the string '+987+9873678298'. Clearly it shouldn't be there, and I'd like to find out how many other rows contain this or similar errors.

I tried this query, but it's not doing the job. My thinking was to match anything that does not look like the expected string. (Note that the table is not indexed on phone_number.)

SELECT user_key,
       first_name,
       last_name,
       phone_number
FROM   users u
WHERE  regexp_like(phone_number, '[^\+[0-9]*]')
AND    phone_number IS NOT NULL

  • Unrelated, but: phone_number IS NOT NULL is unnecessary, because regexp_like never evaluates to true for a NULL value, so those rows are filtered out anyway. Commented Mar 1, 2017 at 15:54

2 Answers

44

If you need to find all the rows where phone_number does not consist of exactly a '+' followed by 9-13 digits, this should do the job:

select *
from users 
where not regexp_like(phone_number, '^\+[0-9]{9,13}$')

What it does:

  • ^ the beginning of the string, to avoid things like 'XX +123456789'
  • \+ the '+'
  • [0-9]{9,13} a sequence of 9-13 digits
  • $ the end of the string, to avoid strings like '+123456789 XX'
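To see how the pattern behaves before running it against the real table, it can be applied to a few literal strings. The sketch below is only an illustration: the inline samples data set is hypothetical, and apart from the two values quoted in the question and the two shapes mentioned in the list above, the sample strings are made up.

```sql
-- Sanity check of the pattern on literal values (Oracle syntax).
-- "samples" is a hypothetical inline data set, not a real table.
WITH samples AS (
    SELECT '+9628789878'     AS phone_number FROM dual UNION ALL  -- '+' and 10 digits
    SELECT '+987+9873678298' AS phone_number FROM dual UNION ALL  -- second '+' inside
    SELECT 'XX +123456789'   AS phone_number FROM dual UNION ALL  -- leading junk
    SELECT '+123456789 XX'   AS phone_number FROM dual UNION ALL  -- trailing junk
    SELECT '+12345678'       AS phone_number FROM dual            -- only 8 digits
)
SELECT phone_number,
       CASE
           WHEN regexp_like(phone_number, '^\+[0-9]{9,13}$') THEN 'ok'
           ELSE 'bad'
       END AS status
FROM   samples;
```

Only the first sample should be reported as ok; the other four each break one part of the pattern (a stray '+', text before or after the number, or too few digits).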

Another way, with no regexp, could be the following:

select *
from users
where not (
                /* strings of 10-14 chars */
                length(phone_number) between 10 and 14 
                /* ... whose first char is a + */
            and substr(phone_number, 1, 1 ) = '+' 
                /* ... and that reduce to a single '+' once all the digits are removed */
            and nvl(translate(phone_number, 'X0123456789', 'X'), '+') = '+' 
          )

This could be faster than the regexp approach, even though it uses more conditions, but only a test on your own data will tell you which one performs better.
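Before timing the two approaches, it may be worth confirming that they classify the same strings the same way. The following is a minimal sketch under the assumption that a handful of literal values is representative enough; the samples data set and the column aliases regexp_ok and translate_ok are hypothetical names introduced here.

```sql
-- Compare the regexp predicate with the length/substr/translate predicate
-- on a hypothetical inline data set (Oracle syntax).
WITH samples AS (
    SELECT '+9628789878'     AS phone_number FROM dual UNION ALL  -- valid
    SELECT '+987+9873678298' AS phone_number FROM dual UNION ALL  -- extra '+', too long
    SELECT '+12345678'       AS phone_number FROM dual UNION ALL  -- too short
    SELECT '+12345abc90'     AS phone_number FROM dual            -- letters among the digits
)
SELECT phone_number,
       CASE
           WHEN regexp_like(phone_number, '^\+[0-9]{9,13}$') THEN 'Y'
           ELSE 'N'
       END AS regexp_ok,
       CASE
           WHEN length(phone_number) BETWEEN 10 AND 14
            AND substr(phone_number, 1, 1) = '+'
            AND nvl(translate(phone_number, 'X0123456789', 'X'), '+') = '+'
           THEN 'Y'
           ELSE 'N'
       END AS translate_ok
FROM   samples;
```

The two columns should agree on every row. Running the same comparison against a slice of the real users table would be a more convincing check than these few literals.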

1

You can work with regexp_substr instead of regexp_like.

To find rows that match the pattern:

SELECT *
FROM users 
WHERE regexp_substr(phone_number, '^\+[0-9]{9,13}$') is not null

To find rows that do not match the pattern (unlike NOT regexp_like, this also returns rows where phone_number is NULL):

SELECT *
FROM users 
WHERE regexp_substr(phone_number, '^\+[0-9]{9,13}$') is null 
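Since the question asks how many bad rows there are, the same predicate can be wrapped in a count. This is a sketch, assuming that NULL phone numbers should not be counted as errors: regexp_substr returns NULL for a NULL input, so the explicit IS NOT NULL filter is needed here to leave those rows out. The alias bad_phone_numbers is a name chosen for illustration.

```sql
-- Count malformed phone numbers, ignoring rows with no number at all.
-- users and phone_number come from the question; the alias is hypothetical.
SELECT COUNT(*) AS bad_phone_numbers
FROM   users
WHERE  phone_number IS NOT NULL
AND    regexp_substr(phone_number, '^\+[0-9]{9,13}$') IS NULL;
```

Dropping the IS NOT NULL condition would make the count include missing phone numbers as well, which may or may not be what you want.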
