
I am trying to get the maximum value of one column ("original_list_price") over windows defined by two columns: a unique identifier called "address_token" and a date field called "list_date". That is, for each row I would like the max "original_list_price" among the rows that share both the same address_token and the same list_date.

E.g.:

SELECT 
address_token, list_date, original_list_price, 
max(original_list_price) OVER (PARTITION BY address_token, list_date) as max_list_price
FROM table1  

The query already takes more than 10 minutes when I use just one expression in the PARTITION BY clause (e.g. address_token only, nothing after that). Sometimes the query times out; I use Mode Analytics and get this error: "An I/O error occurred while sending to the backend". So my questions are:

1) Will the Window function with multiple PARTITION BY expressions work?

2) Any other way to achieve my desired result?

3) Is there any way to make window functions, especially the PARTITION BY part, run faster? For example, should I prefer certain data types over others, or avoid long alphanumeric string identifiers?

Thank you!

  • I wonder if you're looking for grouping sets / with-rollup Commented Nov 14, 2016 at 8:13

1 Answer

The complexity of the window function's partitioning clause should not have a big impact on performance. Keep in mind that your query returns every row in the table, so the result set may be very large.
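
If one row per (address_token, list_date) pair is enough, rather than every original row annotated with the maximum, a plain aggregate keeps the result set much smaller. This is only a sketch under that assumption; it reuses the table and column names from the question and does not return the individual original_list_price values:

```sql
-- One row per (address_token, list_date) pair instead of one row per table row.
-- Assumes the caller only needs the per-group maximum.
SELECT address_token,
       list_date,
       max(original_list_price) AS max_list_price
FROM table1
GROUP BY address_token, list_date;
```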

Window functions should be able to take advantage of indexes. For this query:

SELECT address_token, list_date, original_list_price, 
       max(original_list_price) OVER (PARTITION BY address_token, list_date) as max_list_price
FROM table1;

You want an index on table1(address_token, list_date, original_list_price).
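
A minimal sketch of that index follows. The index name idx_table1_token_date_price is made up for illustration, and the exact syntax and options may differ slightly depending on your database:

```sql
-- Composite index: partition columns first, then the column being aggregated.
-- idx_table1_token_date_price is a hypothetical name; any valid identifier works.
CREATE INDEX idx_table1_token_date_price
    ON table1 (address_token, list_date, original_list_price);
```

Putting the two partitioning columns first keeps the rows of each group adjacent in the index, and including original_list_price lets the maximum be read without visiting the table, provided the optimizer chooses to use the index.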

You could try writing the query as:

select t1.*,
       (select max(t2.original_list_price)
        from table1 t2
        where t2.address_token = t1.address_token and t2.list_date = t1.list_date
       ) as max_list_price
from table1 t1;

This should start returning results sooner, because it doesn't have to calculate the window function value for all rows before returning the first ones.
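
Another formulation you could try, offered as a sketch rather than a guaranteed speed-up, computes each group's maximum once in a derived table and joins it back to the original rows. The alias m is arbitrary. One assumption to note: rows where address_token or list_date is NULL are dropped by the inner join, whereas the window function version keeps them.

```sql
-- Aggregate once per (address_token, list_date), then join back to every row.
-- Assumes neither key column contains NULLs; such rows would not match the join.
SELECT t1.address_token,
       t1.list_date,
       t1.original_list_price,
       m.max_list_price
FROM table1 t1
JOIN (
    SELECT address_token,
           list_date,
           max(original_list_price) AS max_list_price
    FROM table1
    GROUP BY address_token, list_date
) m
  ON  m.address_token = t1.address_token
  AND m.list_date     = t1.list_date;
```

Whether this beats the window function or the correlated subquery depends on the optimizer and the data, so compare the execution plans on your own table.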


2 Comments

Taking inspiration from your subquery, I'm now trying to write a subquery that orders the data by original_list_price desc, so that the top row of each group (grouping by address_token and list_date) is the desired one, i.e. the row with the max original_list_price for that combination of address_token and list_date. As a follow-up question: I remember there's a way to take only the top value of each group in a separate subquery. Select distinct, maybe?
@LauraD . . . New questions should be asked as questions not comments. Ask another question if you are looking for the top value of a group, because that is not what this question asks (and changing this question is rude because it would likely invalidate this answer, which would draw downvotes).
