5

Let's assume a table with 3 columns (the real table is much larger): id, is_deleted, date. I have to check whether each id is deleted and store the result (TRUE or FALSE) in a new column. Here is a simplified version of the table (before):

id is_deleted date
A False 03-07-2022
A True 04-07-2022
B False 05-07-2022
B False 06-07-2022
C True 07-07-2022

(after):

id is_deleted date deleted
A False 03-07-2022 TRUE
A True 04-07-2022 TRUE
B False 05-07-2022 FALSE
B False 06-07-2022 FALSE
C True 07-07-2022 TRUE

So rows with ids A and C should have TRUE in the new column. A given id can have more than one TRUE value in the is_deleted column. If an id has at least one TRUE value, all rows with that id should be marked as deleted (TRUE in the new column). I need to do this inside this table, without GROUP BY: with GROUP BY I would have to create another CTE and join it back, which complicates the query and hurts performance.

I just want to create a single column in this table with the new deleted value.

I've found the bool_or function, but it doesn't work as a window function in Redshift. My code:

bool_or(is_deleted) over(partition by id) as is_del

I can't use the max or sum functions on a boolean, and casting bool to int worsens performance. Is there another way to do this with booleans while keeping good performance?

Thank you.

6
  • As for the formatting of the table, for some reason Stackoverflow shows it working fine in the preview, but unless you have a blank line before and after the table, it will show up as a garbled mess when you submit. I've edited your question to add that blank line. Hope Stackoverflow fixes that one soon. It's been broken since they introduced table markup. Commented Jan 23, 2023 at 22:59
  • Would both rows of A have an is_del value of True, or just the one row with is_deleted = True? It's not clear to me. Perhaps sharing the desired results after this operation is complete would help clarify. Commented Jan 23, 2023 at 23:03
  • Yes, both can have TRUE. If there's one or more TRUE value for a given id, it should be deleted. Commented Jan 23, 2023 at 23:07
  • I edited the problem to be more precise. Commented Jan 23, 2023 at 23:12
  • The documentation for the MAX window function states "Accepts any data type as input. Returns the same data type as expression.". See docs.aws.amazon.com/redshift/latest/dg/r_WF_MAX.html Are you saying that the documentation is incorrect? Commented Jan 24, 2023 at 15:38

6 Answers

5
+25

It should be possible to emulate such behaviour with MIN/MAX functions and explicit casting:

SELECT MAX(is_deleted::INT) OVER (PARTITION BY id)
FROM ...;
-- if all is_deleted are false, then result is 0, 1 otherwise 

If the result should be boolean, then either of these works:

MAX(is_deleted::INT) OVER (PARTITION BY id) = 1

(MAX(is_deleted::INT) OVER (PARTITION BY id))::BOOLEAN
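For reference, here is a minimal self-contained sketch of the boolean form as a full query. The table name demo_rows and the sample rows are assumptions that mirror the question's example (dates written in ISO format); they are not from the original post.

```sql
-- Hypothetical sample table mirroring the question's data.
CREATE TEMP TABLE demo_rows (id VARCHAR(1), is_deleted BOOLEAN, "date" DATE);

INSERT INTO demo_rows (id, is_deleted, "date") VALUES
    ('A', FALSE, '2022-07-03'),
    ('A', TRUE,  '2022-07-04'),
    ('B', FALSE, '2022-07-05'),
    ('B', FALSE, '2022-07-06'),
    ('C', TRUE,  '2022-07-07');

-- Cast each boolean to 0/1, take the per-id maximum, and compare with 1
-- to get a boolean back. Every row of an id shares the same result.
SELECT
    id,
    is_deleted,
    "date",
    MAX(is_deleted::INT) OVER (PARTITION BY id) = 1 AS deleted
FROM demo_rows
ORDER BY id, "date";
-- With the sample rows, deleted is true for ids A and C and false for id B.
```

If the cast itself is a concern, MAX(CASE WHEN is_deleted THEN 1 ELSE 0 END) should be an equivalent way to write the same expression.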

2 Comments

That's how I did it. Is double casting a good approach in this case, or is there a better-performing option?
@Joe I would not expect significant performance implications
1

Here are two different ways you could check:

1. With EXISTS, which works very well on a highly redundant table

SELECT
    id
    , is_deleted
    , date
    , NVL((SELECT 'TRUE' FROM dual WHERE EXISTS (SELECT 1 FROM yourtabletable yt2 WHERE 
        yt2.id = yt1.id 
            AND yt2.is_deleted = 'True')
    ), 'FALSE') deleted
FROM 
    yourtabletable yt1;

2. With a WITH clause, where you can use hints like /*+ materialize */

WITH tmp AS(
    -- DISTINCT keeps one row per deleted id, so the scalar subquery below returns at most one row
    SELECT /*+ materialize */ DISTINCT id, 'TRUE' deleted FROM yourtabletable WHERE is_deleted = 'True'
)

SELECT
    id
    , is_deleted
    , date
    -- tmp only contains ids that have a deleted row, so matching on id is enough
    , NVL((SELECT deleted FROM tmp yt2 WHERE 
        yt2.id = yt1.id
    ), 'FALSE') deleted
FROM 
    yourtabletable yt1;

0

If I understand the problem, then I would think that for each unique id value you should be looking at the is_deleted value that has the latest (maximum) date value. In this way even though there may be a row where is_deleted is true, if there is another row for the same id value with a later date that has is_deleted as false, then false should be the final status. If this isn't how the new deleted column should be computed, then just ignore this answer, please.

Schema (PostgreSQL v15)

CREATE TABLE Table1
    ("id" varchar(1), "is_deleted" bool, "date" timestamp)
;
    
INSERT INTO Table1
    ("id", "is_deleted", "date")
VALUES
    ('A', False, '2022-03-07 00:00:00'),
    ('A', True, '2022-04-07 00:00:00'),
    ('A', True, '2022-04-09 00:00:00'), /* another True row for A */
    ('B', False, '2022-05-07 00:00:00'),
    ('B', False, '2022-06-07 00:00:00'),
    ('C', True, '2022-07-07 00:00:00')
;

Query #1

with lastest_is_deleted as (
    select t.* from
        (select t.id, t.is_deleted as deleted, row_number() over (partition by id order by date desc) as seqnum
            from Table1 t
         ) t
    where seqnum = 1
)

select t.*, l.deleted from
Table1 t join lastest_is_deleted l on t.id = l.id;

id is_deleted date deleted
A false 2022-03-07T00:00:00.000Z true
A true 2022-04-07T00:00:00.000Z true
A true 2022-04-09T00:00:00.000Z true
B false 2022-05-07T00:00:00.000Z false
B false 2022-06-07T00:00:00.000Z false
C true 2022-07-07T00:00:00.000Z true

View on DB Fiddle
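The same "latest row wins" rule could probably also be written without the join, using a window function. This is only a sketch for PostgreSQL, like the rest of this answer; it assumes the Table1 schema above and has not been checked on Redshift.

```sql
-- Assumes the Table1 schema and rows from the Schema block above.
-- FIRST_VALUE picks is_deleted from the row with the latest date for each id;
-- the explicit frame makes every row in the partition see the whole partition.
SELECT
    t.*,
    FIRST_VALUE(t.is_deleted) OVER (
        PARTITION BY t.id
        ORDER BY t."date" DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS deleted
FROM Table1 t
ORDER BY t.id, t."date";
```

If two rows of the same id share the latest date, which of them is picked is not defined, so a tie-breaker column in the ORDER BY may be needed.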

0

This is one approach that returns all records with their respective deleted values.

select a.*,
       case when b.id is not null then 'TRUE' else 'FALSE' end as deleted
from table1 a
left join (select distinct id from table1 where is_deleted is true) b
       on a.id = b.id
order by 1, 3;

I have created a sample schema here: https://www.db-fiddle.com/f/4k32Eb1t2DSUQ6FkzKBMXi/0 (feel free to customize it with your data).

CREATE TABLE Table1
("id" varchar(1), "is_deleted" bool, "date" timestamp);

INSERT INTO Table1
    ("id", "is_deleted", "date")
VALUES
    ('A', False, '2022-03-07 00:00:00'),
    ('A', True, '2022-04-07 00:00:00'),
    ('A', True, '2022-04-09 00:00:00'), /* another True row for A */
    ('B', False, '2022-05-07 00:00:00'),
    ('B', False, '2022-06-07 00:00:00'),
    ('C', True, '2022-07-07 00:00:00')
;
INSERT INTO Table1
    ("id", "is_deleted", "date")
VALUES
    ('D', False, '2022-03-07 00:00:00'),
    ('D', false, '2022-04-06 00:00:00');
    
INSERT INTO Table1
    ("id", "is_deleted", "date")
VALUES
    ('C', False, '2022-03-07 00:00:00');

0

In your case, I think a UNION ALL of two subqueries could perform better than window functions, especially if your table has an index on the id and is_deleted columns.

SELECT 
  d1.*,
  TRUE AS deleted
FROM <your table> d1
WHERE EXISTS (SELECT 1 
              FROM <your table> d2
              WHERE d1.id = d2.id AND is_deleted)
UNION ALL 
SELECT 
  d1.*,
  FALSE AS deleted
FROM <your table> d1
WHERE NOT EXISTS (SELECT 1 
              FROM <your table> d2
              WHERE d1.id = d2.id AND is_deleted); 

See demo here

-1

This select statement should give the needed output:

select
   yt1.id,  
   yt1.is_deleted,
   yt1.date,
   case when yt2.is_deleted then true else false end as deleted
from yourtabletable yt1
left join yourtabletable yt2 on yt2.id = yt1.id and yt2.is_deleted 
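As written, the join returns a row of yt1 once for every deleted row of the same id, so an id with several deleted rows comes back duplicated. A sketch of the DISTINCT variant mentioned in the comments follows; yourtabletable is the placeholder name from this answer, and the query assumes that collapsing identical rows is acceptable.

```sql
-- yourtabletable is the placeholder table name used in this answer.
-- DISTINCT removes the duplicates produced when one id has several deleted rows.
-- Caveat: it also merges source rows that are identical in all three columns.
SELECT DISTINCT
    yt1.id,
    yt1.is_deleted,
    yt1."date",
    CASE WHEN yt2.is_deleted THEN TRUE ELSE FALSE END AS deleted
FROM yourtabletable yt1
LEFT JOIN yourtabletable yt2
       ON yt2.id = yt1.id
      AND yt2.is_deleted;
```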

7 Comments

In addition to a missing comma and ambiguous column names, if there were, for example, another row with values ('A', False, '2022-03-09 00:00:00'), /* another False row for A */, then you would be returning duplicate rows.
oops I corrected the ambiguous names, and added the (missing) comma.
See this demo of the third issue, which might be a possibility though the data the OP shows is not clear on that issue. But I wouldn't make any assumptions.
Adding DISTINCT solves that. But I chose not to add that to the statement in my answer, because it's unknown whether that can happen in the problem as asked.
So I posted a question to the OP asking whether it is possible to have such a row, which is better than hiding one's head in the sand.