The following query:

select lower('ALGODÓN'), upper('algodón')

Results in:

  lower  |  upper
---------+---------
 algodÓn | ALGODóN
(1 row)

Python, on the other hand, gets this right:

>>> 'ALGODÓN'.lower()
'algodón'

Is there a way to get PostgreSQL to convert the case of non-ASCII characters properly?

  • What's the collation of the column? Commented Feb 15, 2022 at 21:38
  • There's no column; the query above works as shown on a default install of PostgreSQL 13.5. Commented Feb 15, 2022 at 21:40
  • Then you'll probably need to enforce a collation, since the default collation is not what you need. From the manual: "... If the expression is a constant, the collation is the default collation of the data type of the constant..." at postgresql.org/docs/14/collation.html Commented Feb 15, 2022 at 21:41
  • The world doesn't agree on how to sort and change cases, though there are ways to do it which will be more correct more often, so we need collations and locales. Commented Feb 15, 2022 at 21:58

1 Answer

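You are using the wrong collation. lower() and upper() follow the case-mapping rules of the collation in effect, and the C collation converts only the ASCII letters A-Z and a-z. For example, with the C collation: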

SELECT lower('ALGODÓN' COLLATE "C"), upper('algodón' COLLATE "C");

  lower  │  upper  
═════════╪═════════
 algodÓn │ ALGODóN
(1 row)

But with en_US.utf8 (Linux):

SELECT lower('ALGODÓN' COLLATE "en_US.utf8"), upper('algodón' COLLATE "en_US.utf8");

  lower  │  upper  
═════════╪═════════
 algodón │ ALGODÓN
(1 row)

The language-agnostic ICU collation gets it right too:

SELECT lower('ALGODÓN' COLLATE "und-x-icu"), upper('algodón' COLLATE "und-x-icu");

  lower  │  upper  
═════════╪═════════
 algodón │ ALGODÓN
(1 row)
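
To see why the C collation produces the output in the question, here is a minimal Python sketch. It assumes that case conversion under the C collation amounts to ASCII-only conversion; the helper names ascii_only_lower and ascii_only_upper are hypothetical and exist only for this illustration. Python's bytes.lower() and bytes.upper() touch ASCII letters only, so applying them to the UTF-8 bytes reproduces the same result.

```python
def ascii_only_lower(text: str) -> str:
    """Lowercase A-Z only; non-ASCII characters pass through unchanged."""
    # bytes.lower() converts ASCII letters only, so the multi-byte
    # UTF-8 encoding of an accented letter is left as it is.
    return text.encode("utf-8").lower().decode("utf-8")


def ascii_only_upper(text: str) -> str:
    """Uppercase a-z only; non-ASCII characters pass through unchanged."""
    return text.encode("utf-8").upper().decode("utf-8")


if __name__ == "__main__":
    # ASCII-only conversion (assumed to mirror the C collation).
    print(ascii_only_lower("ALGODÓN"), ascii_only_upper("algodón"))
    # Unicode-aware conversion, comparable to en_US.utf8 or und-x-icu.
    print("ALGODÓN".lower(), "algodón".upper())
```

The first print line should match the C collation result and the second the en_US.utf8 and und-x-icu results, provided the accented letters are stored as single precomposed code points.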