3

My Postgres server gives me very confusing result while I try to compare strings which contains slash '/' or question mark '?'. For example in psql I ran:

select ('/' < '1') as c1,
       ('/1' < '1') as c2,
       ('/////1' < '1') as c3,
       ('/1' < '2') as c4,
       ('/1' < '11') as c5;

The result was:

 c1 | c2 | c3 | c4 | c5
----+----+----+----+----
 t  | f  | f  | t  | t

So '/' is smaller than '1', but '/1' is larger than '1'. In fact '/1' is between '1' and '2' and so is '/////1'. This did not follow lexicographical order.

However, '/1' was (correctly) smaller than 11 which made me even more confused.

I wanted to see if '/1' is considered escaped. So I ran:

select length('/1');

and I got 2, which meant that postgres does treat '/1' as a two character string.

The same problem happens when I replace / with other symbols such as $ or ?.

If you have docker, this problem is easily reproducible by runing a postgres in a docker container:

docker run postgres:11
docker exec  -it `docker ps | grep postgres:11 | cut -d' ' -f 1` psql -U postgres

Then try the above SQLs. I tried postgres 10 image and the behavior is the same.

The same thing happens to a real SQL when I compare a VARCHAR column with a string literal. This problem is driving me crazy since I need to write the correct SQL to compare file paths, which apparently contain many '/' symbols.

I searched and did not find any document talking about this so this doesn't look like a postgres 'official feature'. What is the right way to write a comparison that follows the lexicographical order?

Thanks a lot in advance.

0

2 Answers 2

2

Postgres uses the operating system's collation (on Linux that would be the ones provided by glibc). So your results depend on the underlying operating system.

You can force an ASCCI comparison by using the "C" collation (as I did in the above examples):

select '/1' > '1' collate "C"

which seems to work the same on all platforms. Alternatively you can specify an ICU collation that will also work the same across all platforms.


You mentioned you want to compare file paths. One way to do that only on the "names" (ignoring the delimiter") is to convert the path to an array string_to_array(filepath, '/') and then e.g. use that array for sorting or comparison.

Sign up to request clarification or add additional context in comments.

1 Comment

Oh! You are a life-saver! Thank you!
0

I have encountered the same sort of issue

select ('|' < 'x') as c1,
   ('|x' < 'x') as c2,
   ('|1' < 'x1') as c3,
   ('||' < 'x|') as c4,
   ('||1' < 'x|1') as c5;
c1 c2 c3 c4 c5
true true true true true

Replace 'x' with '0' on the right-hand side

select ('|' < '0') as c1,
   ('|x' < '0') as c2,
   ('|1' < '01') as c3,
   ('||' < '0|') as c4,
   ('||1' < '0|1') as c5;
c1 c2 c3 c4 c5
true false false true false

This is on PostgreSQL on Azure PostgreSQL 16.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.2.0, 64-bit

On Windows, it is all fine.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.