Grouping by and making flag based on patterns

Question

I have a table p that looks like this:

ID	Col1
AAA	kddd
AAA	13bd
AAA	14cd
AAA	15cd
BBB	15cd
BBB	23fd
BBB	4rre
BBB	tr3e
CCC	kddd
CCC	12ed
DDD	rrr4
DDD	rtt4
DDD	rrt4

I have three lists of patterns that classify each group based on the values matching in Col1.

If the codes are like ('_ddd', '_ccc', '_bbb', '_aaa') then return 'b'
If the codes are like ('_3c_', '_3b_', '_3a_') then return 'S'
If the codes are like ('_5c_', '_5b_', '_5a_') then return 'U'
If none of the codes match then return 'U'

The patterns are much longer so I made temporary tables to store and call them

CREATE OR REPLACE TEMPORARY TABLE b_codes (value VARCHAR(4));
INSERT INTO b_codes (value) VALUES ('_ddd'), ('_ccc'), ('_bbb'), ('_aaa');

I did the same for s_codes and u_codes.

From the codes, if an ID contains none of the codes then mark 'U'. If an ID has any u_codes then mark 'U' if no s_codes or b_codes are present. If an ID has any b_codes, then mark as 'b'. If there are u_codes and s_codes mark 'S'.

The resulting table should look like

ID	Flag
AAA	S
BBB	U
CCC	b
DDD	U

My attempt

SELECT ID, MAX(t.Flag) AS Flag
FROM (
   SELECT 
     ID,
     CASE
       WHEN p.Col1 LIKE ANY (SELECT value FROM u_codes) AND
         NOT (
              p.Col1 LIKE ANY (SELECT value FROM s_codes) OR
              p.Col1 LIKE ANY (SELECT value FROM b_codes)
         ) THEN 'U'
       WHEN p.Col1 LIKE ANY (SELECT value FROM s_codes) AND
         NOT (
              p.Col1 LIKE ANY (SELECT value FROM u_codes) OR
              p.Col1 LIKE ANY (SELECT value FROM b_codes)
         ) THEN 'S'
       WHEN p.Col1 LIKE ANY (SELECT value FROM b_codes) THEN 'b'
       WHEN (
         NOT p.Col1 LIKE ANY (SELECT value FROM u_codes) AND
         NOT p.Col1 LIKE ANY (SELECT value FROM s_codes) AND
         NOT p.Col1 LIKE ANY (SELECT value FROM b_codes)
         ) THEN NULL
       ELSE NULL
     END AS Flag
   FROM p
) AS t
GROUP BY ID;

The sub-query should return

ID	Col1	Flag
AAA	kddd	b
AAA	13bd	S
AAA	14cd	NULL
AAA	15cd	U
BBB	15cd	U
BBB	23fd	NULL
BBB	4rre	NULL
BBB	tr3e	NULL
CCC	kddd	b
CCC	12ed	NULL
DDD	rrr4	NULL
DDD	rtt4	NULL
DDD	rrt4	NULL

I tried using Snowflake's lexicographical ordering in the MAX function, but I don't think that works. What would be a better way to get the correct labels in the MAX function?

It looks to me like the very first sample row (AAA,kddd) should match the b code, and so the result for AAA should be b rather than S.
– Joel Coehoorn, Commented Oct 31 at 16:31

Joel Coehoorn · Accepted Answer · 2025-10-31 21:24:08Z

"What would be a better way to get the correct labels in the MAX function?"

The problem with the MAX() function (and similar) for this kind of query is you often need to use the max value from one column to show the corresponding value from another column, or you need to define some other criteria for what you mean by "MAX". This is possible with normal aggregation, but tends to be complex to write and maintain and slow to execute.
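As a small illustration of why a plain MAX over the labels is fragile, here is a sketch in Python. It assumes default binary (code point) string comparison, which is also how Python's `max` compares strings. The `flags` list and the `precedence` mapping are hypothetical and exist only for this example.

```python
# Hypothetical per-row flags for one ID that matched both an S code and a U code.
flags = ["S", None, "U"]
present = [f for f in flags if f is not None]

# Plain MAX compares strings by code point: 'U' (85) sorts above 'S' (83),
# so the "maximum" label is decided by spelling, not by the business rule.
lexicographic_winner = max(present)

# An explicit precedence (1 = strongest) states the intended ordering directly.
# The ordering below is an assumption for illustration only.
precedence = {"b": 1, "S": 2, "U": 3}
ranked_winner = min(present, key=precedence.__getitem__)

print(lexicographic_winner, ranked_winner)  # U S
```

The question's rule "if there are u_codes and s_codes mark 'S'" is exactly the case where the two results differ.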

Instead, this is where analytic window functions shine. You use the window function to define rankings that apply to entire rows within a partition, and then filter for only the row(s) with the desired rank. Then you can take values from whatever columns in those rows you need.

With this technique, Snowflake can find the desired results from this original problem with no nesting/subqueries* and only ONE mapping table, which can also be expressed concisely in the query and does not need to be a temp table:

SELECT p.ID, coalesce(m.code,'U') code 
FROM p
LEFT JOIN (
    VALUES
        (1, 'b', '_ddd'),
        (1, 'b', '_ccc'),
        (1, 'b', '_bbb'),
        (1, 'b', '_aaa'),
        (2, 'S', '_3c_'),
        (2, 'S', '_3b_'),
        (2, 'S', '_3a_'),
        -- U codes not needed, since it's the default; included only for completeness
        (3, 'U', '_5c_'),
        (3, 'U', '_5b_'),
        (3, 'U', '_5a_')
    ) m(precedence, code, expr) ON p.Col1 LIKE m.expr
QUALIFY row_number() over (partition by p.ID order by m.precedence) = 1
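To try the rank-and-filter idea without a Snowflake account, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in. It is not the Snowflake query itself and differs in several ways:

- SQLite has no QUALIFY, so the row-number filter moves into an outer query.
- SQLite sorts NULLs first in ascending order, hence the extra `IS NULL` sort key.
- SQLite's LIKE is case-insensitive for ASCII, which does not matter for this all-lowercase sample.
- Window functions need SQLite 3.25 or newer.

The `code_map` table name is borrowed from the shorter query in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE p (ID TEXT, Col1 TEXT);
INSERT INTO p VALUES
  ('AAA','kddd'),('AAA','13bd'),('AAA','14cd'),('AAA','15cd'),
  ('BBB','15cd'),('BBB','23fd'),('BBB','4rre'),('BBB','tr3e'),
  ('CCC','kddd'),('CCC','12ed'),
  ('DDD','rrr4'),('DDD','rtt4'),('DDD','rrt4');

CREATE TABLE code_map (precedence INTEGER, code TEXT, expr TEXT);
INSERT INTO code_map VALUES
  (1,'b','_ddd'),(1,'b','_ccc'),(1,'b','_bbb'),(1,'b','_aaa'),
  (2,'S','_3c_'),(2,'S','_3b_'),(2,'S','_3a_'),
  (3,'U','_5c_'),(3,'U','_5b_'),(3,'U','_5a_');
""")

# Rank every (row, matching pattern) pair within its ID, strongest code first,
# then keep only the top-ranked row per ID. Unmatched rows have NULL precedence
# and are pushed to the end of the ordering.
query = """
SELECT ID, code FROM (
    SELECT p.ID,
           COALESCE(m.code, 'U') AS code,
           ROW_NUMBER() OVER (
               PARTITION BY p.ID
               ORDER BY m.precedence IS NULL, m.precedence
           ) AS rn
    FROM p
    LEFT JOIN code_map m ON p.Col1 LIKE m.expr
) ranked
WHERE rn = 1
ORDER BY ID
"""
result = conn.execute(query).fetchall()
print(result)  # [('AAA', 'b'), ('BBB', 'U'), ('CCC', 'b'), ('DDD', 'U')]
```

Changing which code wins only requires editing the `precedence` values in the mapping data, not the query.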

More than 9 times in 10, if you have a temp table you should have something like a subquery, table-value constructor, or common table expression instead.

"The patterns are much longer ... "

This might justify a temporary table vs the table-value constructor, but if you can build the INSERT SQL you can build this just as easily.

Even if you do continue to use a temp table I would still use this structure with the single mapping table. At most I might normalize it to two tables so each code/precedence pair has one row in a parent table, and then join to a child table for just code+expr columns. But that complexity is probably not worth it here.

If the data is really that detailed, I'd also look to make this permanent, so the data can also be maintained outside of this query and perhaps even indexed (but Snowflake will probably do just fine w/o the index/cluster).

Note that when we do pull this data out to its own location, the query reduces to just four lines:

SELECT p.ID, coalesce(m.code,'U') code 
FROM p
LEFT JOIN code_map m ON p.Col1 LIKE m.expr
QUALIFY row_number() over (partition by p.ID order by m.precedence) = 1

"The resulting table should look like ... "

I think you have an error here. The very first row in the sample data, (AAA,kddd), should match the b code. However, the expected result for the AAA ID shows an S, which has lower precedence. (This is another reason not to normalize the mapping table; repeating the precedence in each row allows for more complex rules where sometimes a "lesser" code might still win.)
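To make the disagreement concrete, the sketch below lists which codes each AAA value triggers. It is pure Python, and `like_to_regex` is a hypothetical helper written for this example that covers only the `_` and `%` wildcards of SQL LIKE. The precedence numbers are the ones assumed in this answer.

```python
import re

def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern ('_' = one char, '%' = any run) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

# (precedence, code, pattern) rows, mirroring the mapping used in this answer.
code_map = [
    (1, "b", "_ddd"), (1, "b", "_ccc"), (1, "b", "_bbb"), (1, "b", "_aaa"),
    (2, "S", "_3c_"), (2, "S", "_3b_"), (2, "S", "_3a_"),
    (3, "U", "_5c_"), (3, "U", "_5b_"), (3, "U", "_5a_"),
]

aaa_values = ["kddd", "13bd", "14cd", "15cd"]

# For each value, collect every (precedence, code) whose pattern matches it fully.
matches = {
    value: [(prec, code) for prec, code, expr in code_map
            if like_to_regex(expr).fullmatch(value)]
    for value in aaa_values
}
# The lowest precedence number across the whole group wins.
best = min(hit for hits in matches.values() for hit in hits)

print(matches)
print(best)  # (1, 'b')
```

Under these rules AAA contains a b match, an S match, and a U match, so any ordering that puts b first yields b for AAA.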

* Not counting the table-value constructor, which has neither SELECT nor FROM.

Alparslan ŞEN · Accepted Answer · 2025-10-31 15:23:10Z

SELECT 
    p.id,
    CASE
        WHEN EXISTS (SELECT 1 FROM b_codes WHERE p.Col1 LIKE value) THEN 'b'
        WHEN EXISTS (SELECT 1 FROM s_codes WHERE p.Col1 LIKE value) THEN 'S'
        WHEN EXISTS (SELECT 1 FROM u_codes WHERE p.Col1 LIKE value) THEN 'U'
        ELSE 'U'
    END AS Flag
FROM p;

output:

ID	Flag
AAA	b
AAA	S
AAA	U
AAA	U
BBB	U
BBB	U
BBB	U
BBB	U
CCC	b
CCC	U
DDD	U
DDD	U
DDD	U

Tim Biegeleisen · Accepted Answer · 2025-10-31 15:14:47Z

Matching 3 of the same letter in Snowflake is surprisingly not straightforward. One option would be to use REGEXP_LIKE with an alternation:

SELECT
    ID,
    CASE WHEN REGEXP_LIKE(Col1, '.(AAA|BBB|CCC|DDD|EEE|FFF|GGG|HHH|III|JJJ|KKK|LLL|MMM|NNN|OOO|PPP|QQQ|RRR|SSS|TTT|UUU|VVV|WWW|XXX|YYY|ZZZ)', 'i')
         THEN 'b'
         WHEN REGEXP_LIKE(Col1, '.3[A-Z].', 'i') THEN 'S'
         ELSE 'U' END AS Flag
FROM yourTable;
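One detail to watch: Snowflake's REGEXP_LIKE implicitly anchors the pattern at both ends, so an alternation of bare triples matches only three-character values. The pattern needs a leading `.` to cover four-character values such as kddd. The sketch below uses Python's `re.fullmatch` as a rough stand-in for that whole-string behaviour. The two regex dialects are not identical, so treat it as an illustration only, and the shortened alternation is just for readability.

```python
import re

bare = r"aaa|bbb|ccc|ddd"          # bare triples only
prefixed = r".(aaa|bbb|ccc|ddd)"   # any one character, then a triple

# fullmatch requires the whole string to match, similar to implicit anchoring.
bare_hit = re.fullmatch(bare, "kddd", re.IGNORECASE) is not None
prefixed_hit = re.fullmatch(prefixed, "kddd", re.IGNORECASE) is not None
s_hit = re.fullmatch(r".3[A-Z].", "13bd", re.IGNORECASE) is not None

print(bare_hit, prefixed_hit, s_hit)  # False True True
```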

3 Comments

m13op22 Oct 31 at 15:23

If I'm reading this right, I don't think the first when statement matches the patterns for b_codes, it's searching Col1 for ID values. I don't understand why to do that.

m13op22 Oct 31 at 15:28

Apologies, I realized I missed the END of the CASE WHEN statement. Also added my expected subquery result.

Jonas Metzler Oct 31 at 16:00

What information do you want to tell us? If the answer solves your task, then accept it. If you want to edit your question, just do it. No need to comment this. Please remove your comments here unless you have a question concerning the answer. In this case, ask.

Lajos Arpad · Accepted Answer · 2025-10-31 17:16:27Z

You don't need to create so many tables. Instead, you could create a single temporary table that stores each pattern together with the flag it maps to:

CREATE OR REPLACE TEMPORARY TABLE codes (value VARCHAR(4), code VARCHAR(1));
INSERT INTO codes (value, code) VALUES
('_ddd', 'b'), ('_ccc', 'b'), ('_bbb', 'b'), ('_aaa', 'b'),
('_3c_', 'S'), ('_3b_', 'S'), ('_3a_', 'S');

And then you can left join:

select
    ID,
    case
        when max(rank) = 2 then 'b'
        when max(rank) = 1 then 'S'
        else 'U'
    end as result
from (
    select 
        p.ID,
        case
            when codes.code = 'b' then 2
            when codes.code = 'S' then 1
        end as rank
    from p
    left join codes
    on p.col1 like codes.value) t
group by ID

In the subquery you get all code matches for col1, ranked as 2 if they were a b and as 1 if they were an S, defaulting to null for other values. The outer query then groups these records by ID and looks for the greatest rank. If it is 2, the result is b. If it is 1, the result is S. Otherwise it is U. I prioritised b over S, but given the values this does not seem to be a problem, as the b and S patterns in the rules you have given are mutually exclusive.
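The rank-then-aggregate flow described above can be tried locally with Python's built-in `sqlite3`. This is a sketch run against SQLite rather than Snowflake, and it makes two changes of its own: the inner CASE compares the `code` column (the label) rather than the pattern, and the alias is `rnk` to avoid confusion with the RANK function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE p (ID TEXT, Col1 TEXT);
INSERT INTO p VALUES
  ('AAA','kddd'),('AAA','13bd'),('AAA','14cd'),('AAA','15cd'),
  ('BBB','15cd'),('BBB','23fd'),('BBB','4rre'),('BBB','tr3e'),
  ('CCC','kddd'),('CCC','12ed'),
  ('DDD','rrr4'),('DDD','rtt4'),('DDD','rrt4');

CREATE TABLE codes (value TEXT, code TEXT);
INSERT INTO codes VALUES
  ('_ddd','b'),('_ccc','b'),('_bbb','b'),('_aaa','b'),
  ('_3c_','S'),('_3b_','S'),('_3a_','S');
""")

# Inner query: one row per (p row, matching pattern), with a numeric rank or NULL.
# Outer query: MAX ignores NULLs, so an ID with no match falls through to 'U'.
query = """
SELECT ID,
       CASE MAX(rnk) WHEN 2 THEN 'b' WHEN 1 THEN 'S' ELSE 'U' END AS result
FROM (
    SELECT p.ID,
           CASE codes.code WHEN 'b' THEN 2 WHEN 'S' THEN 1 END AS rnk
    FROM p
    LEFT JOIN codes ON p.Col1 LIKE codes.value
) t
GROUP BY ID
ORDER BY ID
"""
result = conn.execute(query).fetchall()
print(result)  # [('AAA', 'b'), ('BBB', 'U'), ('CCC', 'b'), ('DDD', 'U')]
```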

I also did not apply point 3, as its result would be identical to the absolute fallback. It is not worth checking for separately, so points 3 and 4 are treated together as the fallback.

EDIT -> further simplification as suggested in the comment-section:

CREATE OR REPLACE TEMPORARY TABLE codes (value VARCHAR(4), code VARCHAR(1));
INSERT INTO codes (value, code) VALUES
('_ddd', 'b'), ('_ccc', 'b'), ('_bbb', 'b'), ('_aaa', 'b'),
('_3c_', 'S'), ('_3b_', 'S'), ('_3a_', 'S');
CREATE OR REPLACE TEMPORARY TABLE ranks(code VARCHAR(1), rank int);
INSERT INTO ranks (code, rank) VALUES
('b', 2),
('S', 1);

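We join with the ranks table too. Note that this version returns the numeric rank (2, 1, or NULL when nothing matches) rather than the letter, so it still needs to be mapped back to 'b', 'S', or 'U' as in the first query: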

select
    ID,
    max(rank) as result
from (
    select 
        p.ID,
        ranks.rank
    from p
    left join codes
    on p.col1 like codes.value
    left join ranks
    on codes.code = ranks.code
) t
group by ID

edited Oct 31 at 17:16

answered Oct 31 at 15:45

2 Comments

Joel Coehoorn Oct 31 at 17:10

Put the rank value inside the temp table, too, to simplify this even further.

Lajos Arpad Oct 31 at 17:17

Great idea, thank you! I created another temporary table in order to avoid redundancy, but applied your changes in an edit.
