0

I'm working with a query that looks like this:

SELECT
  CASE 
    WHEN num BETWEEN 1 AND 5 THEN '1-5'
    WHEN num BETWEEN 6 AND 10 THEN '6-10'
    WHEN num BETWEEN 11 AND 20 THEN '11-20'
    WHEN num > 20 THEN '20+'
END AS bucket,
COUNT(1)
FROM data
GROUP BY 1

...

I'd like to order the results by the different values of bucket so the results look something like this

bucket count(1)
1-5 10
6-10 11
11-20 17
20+ 8

Since bucket is a string, simply sorting by that column doesn't do the trick.

Are there any string operations in SQL that can help me? For reasons I won't get into the CASE statements are the easiest way for me to get results in the format I need, so I'd rather deal with a bunch of string operations to get it in the order I want than change the structure of the query

4
  • Welcome to SO. You could go with 01-05, 06-10, 11-20, 20+ as your strings. Commented Jun 13, 2022 at 23:46
  • GROUP BY 1 <-- You should change this to use the explicit column name - also COUNT(1) is not idiomatic, is there a reason you're not using COUNT(*)? Commented Jun 13, 2022 at 23:47
  • Is num an integer or non-integer value? If it's non-integer, what bucket should 5.5 go into? Commented Jun 13, 2022 at 23:50
  • Yeah, If this is going to be a production query or a query you intend to keep using for a while, it is better to make the grouped on column explicit by name - it makes the intention of the query more clear and protects it from unitentional bugs if there are future changes in the database table structure. Indeed - it is a little hard to be sure what this query means with that unnamed field in there just as (1) - is it the buckets, or another field? I am not really sure! Commented Jun 13, 2022 at 23:53

2 Answers 2

1

My approach uses integer lower-bounds for the buckets (as integers are nicer to deal with than strings), and only generates the string bucket names at the very end, after grouping.

WITH histogramData AS (
    SELECT
        CASE
            WHEN num <  1 THEN NULL
            WHEN num <  6 THEN 1
            WHEN num < 11 THEN 6
            WHEN num < 21 THEN 11 ELSE 21
        END AS Bucket,
        data.*
    FROM
        data
),
grouped AS (
    SELECT
        h.Bucket,
        COUNT(*) AS [Count]
    FROM
        histogramData AS h
    GROUP BY
        h.Bucket
)
SELECT
    CASE ISNULL( g.Bucket, 0 )
        WHEN  0 THEN NULL,
        WHEN  1 THEN '1-5'
        WHEN  6 THEN '6-10'
        WHEN 11 THEN '11-20' ELSE '21+'
    END AS [Bucket Name],
    g.[Count]
FROM
    grouped
ORDER BY
    g.Bucket 
Sign up to request clarification or add additional context in comments.

2 Comments

Is this solution faster than your first proposed solution? (despite the two ctes) @Dai
@lemon The only way to find out is by running it and looking at the query execution plan.
0
SELECT
  CASE 
    WHEN num BETWEEN 1 AND 5 THEN '01-05'
    WHEN num BETWEEN 6 AND 10 THEN '06-10'
    WHEN num BETWEEN 11 AND 20 THEN '11-20'
    WHEN num > 20 THEN '20+'
END AS bucket,
COUNT(1)
FROM data
GROUP BY 1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.