4

I need to sort a query's results by two methods at the same time. I want the first 3 records returned to be ordered by their prevalence in another table, and then I want the rest of the results sorted alphabetically.

Assuming I have 6 records in a query result set....

empl_Type count
A 10
B 5
C 2
D 1
E 1
F 20

then the results should be F, A, B, C, D, E

  • F,A,B sorted by their count in another table
  • C,D,E (remainder of rows) sorted alphabetically

I have the first part working in the following contrived example:

SELECT 
    et.id,
    et.employeetype_description,
    count(e.employeetypeid) as thesortorder
FROM 
    employeetype et
LEFT JOIN
    employee e ON e.employeetypeid = et.id
GROUP BY
    et.id,
    et.employeetype_description
ORDER BY thesortorder DESC

And this is where (hopefully) you come in... How do I meet the rest of the requirements? Thanks

3
  • That's unfortunate, but since your to-be-alphabetically-ordered rows (C, D, E) are also ordered by count (2, 1, 1), in fact your query incidentally returns the expected results for your example data. Maybe give C, D, E respective counts of 1, 2, 3 to better expose the problem? Commented Nov 3 at 8:25
  • What to do in a case of ties, e.g. 20, 10, 5, 5, 2, 1? Show four top rows instead of three? Or pick one of the five-count rows arbitrarily? Or by some rule? Or only show two top rows instead of three? Commented Nov 3 at 9:07
  • ORDER BY least(4,row_number()over(order by count(*) desc)), id - in this method, the additional tie-breaking @ThorstenKettner's comment asks to specify, simply goes into the window spec: either as another expression after the count(*), or as a rank() or dense_rank() replacing row_number(). Commented Nov 3 at 14:29

4 Answers

5

Window functions are allowed in ORDER BY, even if you're aggregating (demo at db<>fiddle):

ORDER BY/*1*/least(4,row_number()over(order by count(e.employeetypeid)desc)),
        /*2*/id

id employeetype_description thesortorder
F F_description 20
A A_description 10
B B_description 5
C C_description 1
D D_description 3
E E_description 2
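
To see what that two-part sort key does outside the database, here is a minimal Python sketch. It is only an illustration of the logic, not part of the query: the `rows` list and the function name `order_top_then_alpha` are made up for this example, and the counts are the ones from the result table above.

```python
# Illustrative data: (id, count), matching the counts in the result table above.
rows = [("A", 10), ("B", 5), ("C", 1), ("D", 3), ("E", 2), ("F", 20)]


def order_top_then_alpha(rows, top_n=3):
    """Mimic ORDER BY least(top_n + 1, row_number() over (order by count desc)), id."""
    # row_number(): 1-based position of each row when sorted by count, descending.
    by_count = sorted(rows, key=lambda r: -r[1])
    row_number = {r[0]: i for i, r in enumerate(by_count, start=1)}
    # least(top_n + 1, row_number) keeps 1..top_n and collapses everything else
    # to the same value, so the second key (id) decides the order of the rest.
    return sorted(rows, key=lambda r: (min(top_n + 1, row_number[r[0]]), r[0]))


ordered = [r[0] for r in order_top_then_alpha(rows)]
print(ordered)
```

Every row past the third gets the same first key (4), which is why the alphabetical second key takes over from there.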

Quoting the doc:

Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY, HAVING and WHERE clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after non-window aggregate functions. This means it is valid to include an aggregate function call in the arguments of a window function, but not vice versa.


You can clean things up a bit with a named window clause (also available in MySQL, SQLite, Trino, and SQL Server 2022 (16+)):

SELECT et.id,
       et.employeetype_description,
       count(e.employeetypeid) as thesortorder
FROM employeetype et
LEFT JOIN employee e ON e.employeetypeid = et.id
GROUP BY et.id,
         et.employeetype_description
WINDOW w1 AS (order by count(e.employeetypeid) desc)
ORDER BY /*1*/ least((row_number() over w1), 4),
         /*2*/ id

In this method, the additional tie-breaking @ThorstenKettner's comment asks to specify, simply goes into the window spec:

  • To let more than 3 rows share the top spots when their counts tie, use dense_rank():
ORDER BY least(4,dense_rank()over(order by count(e.employeetypeid)desc)),id
  • To use alphabetical sorting to break ties on top 3 positions:
ORDER BY least(4,row_number()over(order by count(e.employeetypeid)desc, id)),id
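
The difference between the two tie-breaking variants is easier to see on data that actually contains a tie. The following Python sketch emulates both sort keys. The data set and the function names are hypothetical, chosen so that two rows tie for third place.

```python
# Hypothetical data with a tie for third place: C and F both have a count of 5.
rows = [("A", 1), ("B", 20), ("C", 5), ("D", 10), ("E", 2), ("F", 5)]


def dense_rank_order(rows, top_n=3):
    """Mimic ORDER BY least(top_n + 1, dense_rank() over (order by count desc)), id."""
    # dense_rank(): equal counts share a rank and ranks have no gaps.
    distinct_counts = sorted({count for _, count in rows}, reverse=True)
    rank = {count: i for i, count in enumerate(distinct_counts, start=1)}
    return sorted(rows, key=lambda r: (min(top_n + 1, rank[r[1]]), r[0]))


def row_number_order(rows, top_n=3):
    """Mimic ORDER BY least(top_n + 1, row_number() over (order by count desc, id)), id."""
    # row_number() with id as a tie-breaker: exactly top_n rows stay on top.
    ranked = sorted(rows, key=lambda r: (-r[1], r[0]))
    pos = {r[0]: i for i, r in enumerate(ranked, start=1)}
    return sorted(rows, key=lambda r: (min(top_n + 1, pos[r[0]]), r[0]))


with_ties = [r[0] for r in dense_rank_order(rows)]
strict_three = [r[0] for r in row_number_order(rows)]
print(with_ties)     # four rows on top: both tied rows are kept
print(strict_three)  # three rows on top: F falls back to the alphabetical part
```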

7 Comments

Putting this least idea directly in the ORDER BY clause is clever!
Thanks. The description links the whole conditional expressions doc to underline it's just one of a few ways, slightly more compact but not really different from writing out a lengthy case expression. You can also flip it and use greatest(), among other things.
Is the first argument to least (the 4) a PostgreSQL-only thing?
I don't think so. You could argue there's technically only 1 argument, because it's variadic, taking a variable-length array of arguments, coercing them to a common type before picking the smallest one, as per that type's definition of <. Works pretty much the same in MySQL, Oracle and SQL Server (2022+). In the SQL standard, it's an optional feature as of SQL:2023 or ISO/IEC 9075:2023
To clarify: that 4 has no special meaning here (just "make this return at most four", because I only want this expression to handle the top 3 spots in the ranking) and it doesn't matter whether it's passed as the first or n-th argument. Here, least() simply compares it to the result of (1-based) row_number(), and any other argument you throw in, in no particular order. You could also do case when row_number()over(..)>4 then 4 else row_number()over(..) end but that's already pretty long and gets even worse with more elements to compare.
Oh, I'm just overthinking this, you're just adding 4 to the set of values for the least function. I learned 2 things from this thread!
I think I understand how you might have been reading this: least four, as in the four smallest elements in whatever set this is, as opposed to the least element out of these two, the first one being four. I think it's actually useful polysemy, since both concepts would work - whichever way you understood it, you were right, details aside. Same with the opening even if you're aggregating that could refer either to the whole statement being an aggregate query where window and non-window aggregates coexist, or to using an aggregate function call inside the window definition itself.
1

Calculate the ordering with a CASE on analytics in a subquery:

select id, employeetype_description, cnt
from (
    select id, employeetype_description, cnt,
        case when row_number() over(order by cnt desc) <= 3
            then row_number() over(order by cnt desc)
            else row_number() over(order by employeetype_description asc) + 3
        end as rn
    from (
        SELECT 
            et.id,
            et.employeetype_description,
            count(e.employeetypeid) as cnt
        FROM 
            employeetype et
        LEFT JOIN
            employee e ON e.employeetypeid = et.id
        GROUP BY
            et.id,
            et.employeetype_description
        ) AS counted
) AS ranked
order by rn
;

(typed in the browser, may contain typos, since no DDL provided...)
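
As a sanity check of the CASE logic, here is a small Python sketch that computes the same rn value per row. The sample data and the name `rn_keys` are assumptions made for the illustration, and the id doubles as the description.

```python
# Illustrative data: (description, count); the description doubles as the id here.
rows = [("A", 10), ("B", 5), ("C", 1), ("D", 3), ("E", 2), ("F", 20)]


def rn_keys(rows, top_n=3):
    """Mimic the CASE expression: position by count for the top rows,
    alphabetical position shifted by top_n for all the others."""
    by_count = {r[0]: i for i, r in enumerate(sorted(rows, key=lambda r: -r[1]), start=1)}
    by_name = {r[0]: i for i, r in enumerate(sorted(rows, key=lambda r: r[0]), start=1)}
    return {
        name: by_count[name] if by_count[name] <= top_n else by_name[name] + top_n
        for name, _ in rows
    }


rn = rn_keys(rows)
ordered = sorted(rn, key=rn.get)
print(rn)
print(ordered)
```

The alphabetical positions are computed over all rows, so the shifted values have gaps (6, 7, 8 here), but they are all greater than 3 and keep their relative order, which is all the final ORDER BY needs.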

1 Comment

Or, move the CASE expression to the order by... ORDER BY CASE WHEN rn <= 3 THEN rn ELSE 4 END, employeetype_description
1

Subqueries in the ORDER BY

Generally speaking, you can use complex expressions, and notably subqueries, in the ORDER BY.

Here it would allow you to make your first ORDER BY term conditional on the id being one of your 3 most populated employee types:

SELECT 
    et.id,
    et.employeetype_description,
    count(e.employeetypeid) as thesortorder
FROM 
    employeetype et
LEFT JOIN
    employee e ON e.employeetypeid = et.id
GROUP BY
    et.id,
    et.employeetype_description
ORDER BY
    CASE WHEN et.id IN
    (
        SELECT employeetypeid
        FROM employee
        GROUP BY 1
        ORDER BY COUNT(*) DESC
        LIMIT 3
    )
    THEN count(e.employeetypeid) END
    DESC NULLS LAST,
    employeetype_description
;

Or move the top-3 computation into a CTE and look it up with a correlated subquery:

WITH top3 AS
(
    SELECT employeetypeid AS id, COUNT(*) count
    FROM employee
    GROUP BY 1
    ORDER BY 2 DESC LIMIT 3
)
SELECT 
    et.id,
    et.employeetype_description,
    count(e.employeetypeid) as thesortorder
FROM 
    employeetype et
LEFT JOIN
    employee e ON e.employeetypeid = et.id
GROUP BY
    et.id,
    et.employeetype_description
ORDER BY
    (SELECT count FROM top3 WHERE top3.id = et.id) DESC NULLS LAST,
    employeetype_description
;

But as you can see, the result is quite complex and, moreover, repetitive (the count is computed twice).
And as you know, in any programming language as well as in data models, duplication is risky…

row_number() window function

… So in your specific case, where thesortorder is used not only for the ORDER BY but also as a result column,
I would first compute the COUNT(*) for every row of the desired result set in a Common Table Expression
(with "Common" meaning we want to reuse it: once for its data, once as a sort criterion),
with an additional column telling where each row is placed relative to the others, thanks to the row_number() window function.
Then apply the "count" sort criterion only when this position is one of the first 3 rows of the CTE.

WITH e AS
(
    SELECT
        employeetypeid,
        COUNT(*) count,
        ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) pos
    FROM employee
    GROUP BY 1
)
SELECT 
    et.id,
    et.employeetype_description,
    e.count as thesortorder
FROM 
    employeetype et
LEFT JOIN
    e ON e.employeetypeid = et.id
ORDER BY
    CASE WHEN e.pos <= 3 THEN e.count END DESC NULLS LAST,
    employeetype_description
;

The 3 methods are shown, in that order, in a db<>fiddle (where I gave different counts to C, D and E, to avoid a false positive due to incidentally already sorted data):

id employeetype_description thesortorder
F F 20
A A 10
B B 5
C C 1
D D 3
E E 2
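
The `CASE ... END DESC NULLS LAST` trick can be mimicked in Python to check the resulting order. This is only a sketch of the sort key under assumed sample data, with a made-up function name. The CASE expression yields the count for the top rows and NULL for the others, and a leading 0/1 flag plays the role of NULLS LAST.

```python
# Illustrative data: (employeetype_description, count), as in the table above.
rows = [("A", 10), ("B", 5), ("C", 1), ("D", 3), ("E", 2), ("F", 20)]


def nulls_last_order(rows, top_n=3):
    """Mimic ORDER BY CASE WHEN pos <= top_n THEN count END DESC NULLS LAST, description."""
    by_count = sorted(rows, key=lambda r: -r[1])
    pos = {r[0]: i for i, r in enumerate(by_count, start=1)}

    def sort_key(row):
        name, count = row
        if pos[name] <= top_n:
            # CASE yields the count: flag 0 sorts first, then count descending.
            return (0, -count, name)
        # CASE yields NULL: flag 1 sorts last, then the description decides.
        return (1, 0, name)

    return sorted(rows, key=sort_key)


ordered = [r[0] for r in nulls_last_order(rows)]
print(ordered)
```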

Final note: the richness of window functions offers a lot of flexibility; take the time to read through their list to get a glimpse.
For example, you could use rank() instead of row_number() to address the problem of ties that Thorsten Kettner pointed out.
Or use percent_rank().
You could even look for the "best placed cut" by finding the biggest gap between 2 successive counts, so that if you had 20, 10, 9, 8, 2, 1, 1 you would take the 4th (8) along with the first three.
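
The "best placed cut" idea could be sketched as follows in Python. This is one possible reading, not a definitive rule: read literally, the largest absolute gap in 20, 10, 9, 8, 2, 1, 1 sits between 20 and 10, so the sketch assumes the cut is only searched from the third row onward. The function name and the `min_top` parameter are hypothetical.

```python
def cut_at_biggest_gap(counts, min_top=3):
    """Return how many rows to keep on top: at least min_top, extended to the
    position followed by the biggest drop between two successive counts."""
    ordered = sorted(counts, reverse=True)
    if len(ordered) <= min_top:
        return len(ordered)
    # A cut after i rows separates ordered[i - 1] from ordered[i].
    # Assumption: only cuts keeping at least min_top rows are considered.
    return max(range(min_top, len(ordered)),
               key=lambda i: ordered[i - 1] - ordered[i])


top_size = cut_at_biggest_gap([20, 10, 9, 8, 2, 1, 1])
print(top_size)  # 4: the row with count 8 joins the first three
```

When several gaps are equally large, `max` returns the first one, so the smallest qualifying top group wins.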


0

Use a CTE to calculate the result set and use it twice:

WITH q AS (
   SELECT empl_type, count,
   -- lag(count, 3) is NULL only for the first three rows in descending count order
   lag(count, 3) OVER (ORDER BY count DESC) IS NULL AS leading_three
   FROM ...
)
(SELECT empl_type FROM q WHERE leading_three ORDER BY count DESC)
UNION ALL
(SELECT empl_type FROM q WHERE NOT leading_three ORDER BY empl_type);

That solution depends on PostgreSQL's implementation of UNION ALL, which just appends the two result sets. There is nothing in SQL that forces the database to produce the result of UNION ALL in a specific order. So only use my solution if you don't mind depending on an implementation detail that might change in the future, even if that is unlikely.

For a solution that does not depend on the implementation of PostgreSQL, run two separate queries and append the result sets on the client.
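
A minimal sketch of that client-side approach, in Python. Everything here is an assumption made for the illustration: an in-memory SQLite database stands in for PostgreSQL so the example is self-contained, and the table `q` with columns `empl_type` and `cnt` is a hypothetical stand-in for the aggregated result.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real PostgreSQL connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q (empl_type TEXT PRIMARY KEY, cnt INTEGER)")
conn.executemany(
    "INSERT INTO q VALUES (?, ?)",
    [("A", 10), ("B", 5), ("C", 1), ("D", 3), ("E", 2), ("F", 20)],
)

# Query 1: the three rows with the highest counts (name as a tie-breaker).
top_names = [
    row[0]
    for row in conn.execute(
        "SELECT empl_type FROM q ORDER BY cnt DESC, empl_type LIMIT 3"
    )
]

# Query 2: everything else, alphabetically. The placeholders are built
# dynamically in case the table holds fewer than three rows.
placeholders = ",".join("?" * len(top_names))
rest_names = [
    row[0]
    for row in conn.execute(
        f"SELECT empl_type FROM q WHERE empl_type NOT IN ({placeholders}) "
        "ORDER BY empl_type",
        top_names,
    )
]

# The client appends the two result sets, so the order is fully under its control.
result = top_names + rest_names
print(result)
```

If the underlying data can change between the two queries, running both in a single transaction with a consistent snapshot avoids rows being listed twice or dropped.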

1 Comment

Upvoting to balance out someone's quiet downvote I don't agree with. Even if the reliance on an implementation detail doesn't sit right with someone, the post is helpful and informative. The fact that the set has to be searched twice to split it in two prior to merging doesn't help, but feels justified to create the context to introduce the union all trick.
