In the database, I have various alphanumeric strings in the following format:

10_asdaasda
100_inkskabsjd
11_kancaascjas
45_aksndsialcn
22_dsdaskjca
100_skdnascbka

I want them to be sorted essentially by the number at the front of the string and then by the rest of the string. But of course characters are compared one by one, so ORDER BY name produces:

10_asdaasda
100_inkskabsjd
100_skdnascbka
11_kancaascjas
22_dsdaskjca
45_aksndsialcn

instead of the order I'd prefer:

10_asdaasda
11_kancaascjas
22_dsdaskjca
45_aksndsialcn
100_inkskabsjd
100_skdnascbka

Honestly, I would be fine if the strings were just sorted by the number in front. I'm not too familiar with PostgreSQL, so I wasn't sure what the best way to do this would be. I'd appreciate any help!

Unfortunately, PostgreSQL doesn't offer automatic numeric collation, nor date collation ("Jan" < "Feb" < "Apr", etc.). It's really complicated to do and would have to be set per column, per query or even per sort, as it often isn't desired and would be expensive to do unnecessarily. AFAIK nobody's implemented it yet. It'd be possible with a custom text data type variant, like citext does for case insensitivity; it's just a matter of the (lots of) coding getting done by somebody who cares. Commented Jul 11, 2012 at 0:42

4 Answers

The ideal way would be to normalize your design and split the two components of the column into two separate columns: one of type integer, one of type text.

With the current table, you could:

SELECT col
FROM   tbl
ORDER  BY (substring(col, '^[0-9]+'))::int  -- cast to integer
         , substring(col, '[^0-9_].*$');    -- works as text
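To try the query without creating anything, the sample rows from the question can be fed in through an inline VALUES list; the alias tbl(col) is only an assumption standing in for a real table:

```sql
-- Inline sample rows from the question stand in for a real table tbl(col)
SELECT col
FROM  (VALUES ('10_asdaasda'), ('100_inkskabsjd'), ('11_kancaascjas')
            , ('45_aksndsialcn'), ('22_dsdaskjca'), ('100_skdnascbka')) AS tbl(col)
ORDER  BY (substring(col, '^[0-9]+'))::int  -- leading digits, compared as integers
        , substring(col, '[^0-9_].*$');     -- remainder, compared as text (tie-breaker)
```

This should list 10_, 11_, 22_ and 45_ first, then the two 100_ rows, with 100_inkskabsjd ahead of 100_skdnascbka on the text tie-breaker.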

The same substring() expressions can be used to split the column.
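A minimal sketch of that split, assuming a table tbl with a text column col; the temporary demo table and the new column names num and txt are hypothetical placeholders:

```sql
-- Demo setup (assumption): a temporary table tbl(col) holding a few sample rows
CREATE TEMP TABLE tbl (col text);
INSERT INTO tbl (col)
VALUES ('10_asdaasda'), ('100_inkskabsjd'), ('11_kancaascjas'), ('22_dsdaskjca');

-- Add the two properly typed columns (num and txt are placeholder names)
ALTER TABLE tbl
  ADD COLUMN num int,
  ADD COLUMN txt text;

-- Fill them with the same expressions the ORDER BY uses
UPDATE tbl
SET    num = (substring(col, '^[0-9]+'))::int
     , txt = substring(col, '[^0-9_].*$');

-- From now on a plain two-column sort is enough
SELECT col
FROM   tbl
ORDER  BY num, txt;
```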

These regular expressions are somewhat fault tolerant:
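The first regex picks the run of digits at the start of the string, or NULL if there are none, so the cast to integer doesn't fail on values without a numeric prefix (such rows simply sort last). A prefix too large for integer would still overflow; cast to bigint or numeric in that case.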
The second regex picks the rest of the string from the first character that is not a digit or '_'.
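The fault tolerance described above can be checked directly; the input strings below are made-up edge cases, not data from the question:

```sql
-- Made-up edge cases: a normal value, one without a numeric prefix, one with a doubled underscore
SELECT s
     , (substring(s, '^[0-9]+'))::int AS num   -- NULL when s has no leading digits, no cast error
     , substring(s, '[^0-9_].*$')     AS rest  -- starts at the first char that is neither a digit nor '_'
FROM  (VALUES ('45_aksndsialcn'), ('no_number'), ('7__x')) AS t(s);
```

The rows come back as 45 / aksndsialcn, NULL / no_number and 7 / x.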

If the underscore (_) is an unambiguous separator, split_part() is faster:

SELECT col
FROM   tbl
ORDER  BY split_part(col, '_', 1)::int
        , split_part(col, '_', 2);

5 Comments

So, if I had a query like SELECT name FROM nametable, where name holds each of the strings, how exactly would I fit that in? Would it be something like WITH x(t) AS name, etc.?
@user1464055: The CTE is just for demonstration here on the site and easy testing. If x was a table and t the column (and without the CTE), it would work the same. I added the exact syntax and some more details.
Thanks a lot, that was a great answer to my question. I wanted to do something very similar, but I just couldn't figure out the syntax for the life of me. Thanks again!
@user1464055: Regular expressions are powerful but tricky. I added another tiny improvement.
Yeah, I was trying to pretty much cast everything, and obviously that was failing. I understand both of the expressions now, thanks for giving me a perfect alternative to adding another column! Haha, honestly, I didn't make the database, and adding a column may have been a challenge.

You can use regular expressions with substring() (col stands for your column name; COLUMN itself is a reserved word in PostgreSQL):

   order by substring(col, '^[0-9]+')::int, substring(col, '[^0-9]*$')

There is a way to do it with an index on an expression. It wouldn't be my preferred solution (I would go for Brad's), but you can create an index on the following expression (there are more ways to do it):

CREATE INDEX idx_name ON tbl (CAST(SPLIT_PART(columname, '_', 1) AS integer));

Then you can search and order by CAST(SPLIT_PART(columname, '_', 1) AS integer) every time you need the number before the underscore, for example:

SELECT * FROM tbl ORDER BY CAST(SPLIT_PART(columname, '_', 1) AS integer);

You can do the same to the string part by creating an index on SPLIT_PART(columname, '_', 2), and then sort accordingly as well.
As I said, however, I find this solution very ugly. I would definitely go with two new columns (one for the number and one for the string), and maybe even drop the original column afterwards.

1 Comment

+1 An index on the expression is a good idea to improve sort performance. Faster yet: a multi-column index on both expressions, in the same order as the ORDER BY.
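A sketch of the multi-column expression index suggested in that comment, assuming a table tbl with a text column col (the temporary demo table and the index name are hypothetical). It also assumes every value has a numeric prefix before the first underscore; otherwise the cast raises an error and the index can't be built:

```sql
-- Demo setup (assumption): a temporary table tbl(col) whose values all look like <digits>_<text>
CREATE TEMP TABLE tbl (col text);
INSERT INTO tbl (col)
VALUES ('10_asdaasda'), ('100_inkskabsjd'), ('11_kancaascjas'), ('100_skdnascbka');

-- One index covering both sort keys, in the same order as the ORDER BY below.
-- The cast needs its own parentheses; a bare function call like split_part() does not.
CREATE INDEX tbl_col_sort_idx ON tbl
  ((split_part(col, '_', 1)::int), split_part(col, '_', 2));

-- The ORDER BY repeats the indexed expressions exactly, so the planner can consider the index
SELECT col
FROM   tbl
ORDER  BY split_part(col, '_', 1)::int, split_part(col, '_', 2);
```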

You should add a new column to the database with a numeric data type and, when persisting a new record, set it to the same value as the numeric prefix of the string.

Then you can create an index on the properly typed numeric column for sorting.
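Instead of setting the value from application code on every insert, one option is a stored generated column, which needs PostgreSQL 12 or later. This is a swapped-in technique, not part of the answer above; the table tbl, column col, new column num and index name are assumptions, and every value must have a numeric prefix before the first underscore:

```sql
-- Demo setup (assumption): a temporary table tbl(col) with existing rows
CREATE TEMP TABLE tbl (col text);
INSERT INTO tbl (col) VALUES ('22_dsdaskjca'), ('100_skdnascbka'), ('45_aksndsialcn');

-- PostgreSQL 12+: the numeric prefix is computed on every INSERT/UPDATE,
-- and existing rows are filled in when the column is added
ALTER TABLE tbl
  ADD COLUMN num int GENERATED ALWAYS AS (split_part(col, '_', 1)::int) STORED;

-- Index the typed column (plus col as a tie-breaker) for sorting
CREATE INDEX tbl_num_col_idx ON tbl (num, col);

-- New rows need no extra application code
INSERT INTO tbl (col) VALUES ('5_new');

SELECT col FROM tbl ORDER BY num, col;
```

This also covers the roughly 600 existing rows mentioned below, since adding the column computes the value for them.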

1 Comment

Yeah, that's honestly what I was thinking. Otherwise, it's pretty challenging to figure out how to sort these. I was just hoping for another way since there are already like 600 elements in this format :/