2

I have this code (was already there, isn't mine):

SELECT
    a.id_original_contrato AS contrato,
    ( CASE WHEN d.value~'^\\d+$' THEN d.value::integer ELSE 0 END ) AS monto,
    EXTRACT(YEAR FROM b.value)::integer AS anoinicio,
    EXTRACT(YEAR FROM c.value)::integer AS anofin

... etc (some JOIN's and WHERE's)

Let me explain: d.value comes from a table where value is character varying (200). The code will insert later the d.value (now called 'monto') in another table as a integer. Someone coded that regex in order to extract some chars or in other case (ELSE), define it as 0. Those values works when they are integer only. If I use a d.value like 76.44 it doesn't work due that regex, it always define it as 0.

Well, I have to change that code, because:

  • I need to store the d.value in the new table as numeric, not as integer anymore (In my new table the data type is numeric now)
  • But first, I need to correct that regex, because is messing my numeric numbers like 76.44, or 66,56 (dot or coma).

I'm not sure what that regex is doing. And how could I accomplish what need with a better or new regex?

0

2 Answers 2

5

The double backslash in \\d suggests an old version with standard_conforming_strings = off. The manual:

Beginning in PostgreSQL 9.1, the default is on (prior releases defaulted to off).

In modern versions with standard_conforming_strings = on, this string makes little sense as regular expression: '^\\d+$'. To detect strings consisting of one or more digits use either E'^\\d+$' (prefixed with E) or '^\d+$'. Details:

Integer literals also allow an optional leading sign for negative / positive numbers, and leading / dangling white space in Postgres.
So, this is the complete regular expression for valid integer literals:

CASE WHEN d.value ~ '^\s*[-+]?\d+\s*$' THEN d.value::int ELSE 0 END

The regular expression explained:

^ .. start of string
\s .. class shorthand for [[:space:]] (white space)
* .. quantifier for 0 or more times
[+-] .. character class consisting of + and -
? .. quantifier for 0 or 1 times
\d .. class shorthand for [[:digit:]] (digits)
+ .. quantifier for 1 or more times
\s* .. same as above
$ .. end of string

Consider the syntax rules for numeric string literals. One essential quote:

There cannot be any spaces or other characters embedded in the constant

That's because a numeric constant is not quoted, hence white space is not possible. Not applicable for casting strings. White space is tolerated:

Leading, trailing and right after the exponent char.

So these are all legal strings for the cast to numeric:

'^\s*[-+]?\d*\.?\d+(?:[eE]\s*[-+]?\d+)?\s*$'

The only new element are parentheses (()) to denote the contained regular expression as atom. Since we are not interested in back references, use "non-capturing": (?:...) and append a question mark (?:[eE]\s*[-+]?\d+)? to mean: the "exponential" part can be added or not, as a whole.

Assuming dot (.) as decimal separator. You might use comma instead (,) or [,\.] to allow either. But only dot is legal for the cast.

Test:

SELECT '|' || lit || '|' AS text_with_delim
     , lit ~ '^\s*[-+]?\d*\.?\d+([eE]\s*[-+]?\d+)?\s*$' AS valid
     , lit::numeric AS number
FROM   unnest ('{1
               , 123
               , 000
               , "  -1     "
               , +2
               , 1.2
               , .34
               , 5e6
               , " .5e   -6  "
                }'::text[]) lit;

Result:

text_with_delim valid number
|1| t 1
|123| t 123
|000| t 0
| -1 | t -1
|+2| t 2
|1.2| t 1.2
|.34| t 0.34
|5e6| t 5000000
| .5e -6 | t 0.0000005

Or you might have use for to_number() to convert strings of arbitrary given format.

Sign up to request clarification or add additional context in comments.

4 Comments

I see, hmm. The coded is made to work in 9.2, and now it will run in 9.4. I'm still reading and trying to get your answer. Thanks (+1)
Great, only by using the '^\s*[-+]?\d*\.?\d+([eE]\s*[-+]?\d+)?\s*$' regex I can make it works! My Postgresql is set to US in postgresql.conf like this: lc_numeric = 'en_US.UTF-8' so by now I can only use . as decimal separator, but in production enviroment is set as lc_numeric = 'es_ES.UTF-8' so i think I won't have problems there. I'll keep reading and doing all what you have written here Erwin, thanks you.
@pmirnd: lc_numeric influences the behavior of functions like to_char(), but the the type cast from text to numeric does not depend on locale settings. That would be madness. So, a comma is never legal in a cast to numeric. You'll have to replace it with replace() or translate(). Or use to_number().
Finally, what I did and now works ok is: select cast(replace(d.value, ',', '.') as numeric) AS monto . ta-da!
2

Choose a variant:

with v(value) as (
    values
    ('12,3'),
    ('12.3'),
    ('123'),
    ('123.'),
    ('.123'),
    ('1.2.3')
    )

select 
    value, 
    value ~ '^(\d+[,\.]\d+|\d+)$' as variant_a,
    value ~ '^(\d*[,\.]\d*|\d+)$' as variant_b,
    value ~ '^\d+[,\.]\d+$' as variant_c
from v;

 value | variant_a | variant_b | variant_c 
-------+-----------+-----------+-----------
 12,3  | t         | t         | t
 12.3  | t         | t         | t
 123   | t         | t         | f
 123.  | f         | t         | f
 .123  | f         | t         | f
 1.2.3 | f         | f         | f
(6 rows)

To convert a string with a dot or a comma to numeric use replace():

select replace(value, ',', '.')::numeric;   

1 Comment

Thanks mate, I put a +1 to this because I learnt something that I didn't knew.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.