I have a table p that looks like this:
| ID | Col1 |
|---|---|
| AAA | kddd |
| AAA | 13bd |
| AAA | 14cd |
| AAA | 15cd |
| BBB | 15cd |
| BBB | 23fd |
| BBB | 4rre |
| BBB | tr3e |
| CCC | kddd |
| CCC | 12ed |
| DDD | rrr4 |
| DDD | rtt4 |
| DDD | rrt4 |
I have three lists of patterns that classify each ID group based on which patterns its Col1 values match:
- If the codes are like ('_ddd', '_ccc', '_bbb', '_aaa') then return 'b'
- If the codes are like ('_3c_', '_3b_', '_3a_') then return 'S'
- If the codes are like ('_5c_', '_5b_', '_5a_') then return 'U'
- If none of the codes match then return 'U'
The real pattern lists are much longer, so I stored them in temporary tables that I can query:
CREATE OR REPLACE TEMPORARY TABLE b_codes (value VARCHAR(4));
INSERT INTO b_codes (value) VALUES ('_ddd'), ('_ccc'), ('_bbb'), ('_aaa');
I did the same for s_codes and u_codes.
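For completeness, the other two tables would presumably look like the sketch below. The values are only the shortened lists shown above (the real lists are longer), and the column definition is assumed to match b_codes.

```sql
-- Assumed contents: the shortened S and U pattern lists from the question.
CREATE OR REPLACE TEMPORARY TABLE s_codes (value VARCHAR(4));
INSERT INTO s_codes (value) VALUES ('_3c_'), ('_3b_'), ('_3a_');

CREATE OR REPLACE TEMPORARY TABLE u_codes (value VARCHAR(4));
INSERT INTO u_codes (value) VALUES ('_5c_'), ('_5b_'), ('_5a_');
```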
The rules per ID are:
- If none of an ID's values match any of the codes, mark the ID 'U'.
- If an ID matches any u_codes but no s_codes or b_codes, mark it 'U'.
- If an ID matches any b_codes, mark it 'b'.
- If an ID matches both u_codes and s_codes, mark it 'S'.
The resulting table should look like this:
| ID | Flag |
|---|---|
| AAA | S |
| BBB | U |
| CCC | b |
| DDD | U |
My attempt:
SELECT ID, MAX(t.Flag) AS Flag
FROM (
    SELECT
        p.ID,
        CASE
            -- Row matches a U pattern and no S or b pattern
            WHEN (p.Col1 LIKE ANY (SELECT value FROM u_codes) AND
                NOT (
                    p.Col1 LIKE ANY (SELECT value FROM s_codes) OR
                    p.Col1 LIKE ANY (SELECT value FROM b_codes)
                )) THEN 'U'
            -- Row matches an S pattern and no U or b pattern
            WHEN (p.Col1 LIKE ANY (SELECT value FROM s_codes) AND
                NOT (
                    p.Col1 LIKE ANY (SELECT value FROM u_codes) OR
                    p.Col1 LIKE ANY (SELECT value FROM b_codes)
                )) THEN 'S'
            -- Row matches a b pattern
            WHEN (p.Col1 LIKE ANY (SELECT value FROM b_codes)) THEN 'b'
            -- Row matches nothing
            WHEN (
                NOT p.Col1 LIKE ANY (SELECT value FROM u_codes) AND
                NOT p.Col1 LIKE ANY (SELECT value FROM s_codes) AND
                NOT p.Col1 LIKE ANY (SELECT value FROM b_codes)
            ) THEN NULL
            ELSE NULL
        END AS Flag
    FROM p
) AS t
GROUP BY ID;
The sub-query should return:
| ID | Col1 | Flag |
|---|---|---|
| AAA | kddd | b |
| AAA | 13bd | S |
| AAA | 14cd | NULL |
| AAA | 15cd | U |
| BBB | 15cd | U |
| BBB | 23fd | NULL |
| BBB | 4rre | NULL |
| BBB | tr3e | NULL |
| CCC | kddd | b |
| CCC | 12ed | NULL |
| DDD | rrr4 | NULL |
| DDD | rtt4 | NULL |
| DDD | rrt4 | NULL |
I tried relying on Snowflake's lexicographical ordering in the MAX function, but I don't think that works. What would be a better way to get the correct label for each ID out of the aggregation?
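One possible direction, as a sketch rather than a definitive answer: give each code list a numeric rank, take MAX of the rank per ID, and map the winning rank back to a label, so the result no longer depends on how the strings 'b', 'S' and 'U' sort. The ranks below are an assumption (b beats S, S beats U, and no match falls back to 'U'), and the names codes and flag_rank are made up for illustration.

```sql
-- Sketch only. Assumed priority: b (3) > S (2) > U (1).
-- "codes" and "flag_rank" are hypothetical names, not part of the original setup.
WITH codes AS (
    SELECT value, 3 AS flag_rank FROM b_codes
    UNION ALL
    SELECT value, 2 AS flag_rank FROM s_codes
    UNION ALL
    SELECT value, 1 AS flag_rank FROM u_codes
)
SELECT
    p.ID,
    CASE MAX(c.flag_rank)   -- NULL when no row of this ID matched any pattern
        WHEN 3 THEN 'b'
        WHEN 2 THEN 'S'
        ELSE 'U'            -- rank 1 (only u_codes matched) or no match at all
    END AS Flag
FROM p
LEFT JOIN codes c
    ON p.Col1 LIKE c.value  -- LEFT JOIN keeps IDs with no matching pattern
GROUP BY p.ID
ORDER BY p.ID;
```

With these ranks, AAA comes out as 'b' because kddd matches '_ddd'. If 'S' should win over 'b', as the expected table above suggests, swapping the ranks 3 and 2 would be enough.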
(AAA, kddd) should match the 'b' code, and so the result for AAA should be 'b' rather than 'S'.