4,620 questions
2
votes
3
answers
112
views
Calculate difference between two values, including those only appearing once within the partition
DB<>Fiddle
CREATE TABLE inventory (
id SERIAL PRIMARY KEY,
stock_date DATE,
product VARCHAR,
stock_balance INT
);
INSERT INTO inventory
(stock_date, product, stock_balance)VALUES ...
3
votes
3
answers
123
views
How to retrieve a sub-array from result of array_agg?
I have a SQL table in postgres 14 that looks something like this:
f_key  data1         data2   fit
1      {'a1', 'a2'}  null    3
1      {'b1', 'b2'}  {'b3'}  2
2      {'c1', 'c2'}  null    3
Note that data1 and data2 are arrays.
I need ...
1
vote
2
answers
164
views
How to efficiently calculate an exponential moving average in postgres?
I'm trying to calculate the average true range on a time series dataset stored in Postgres. Its calculation requires a 14-period exponential moving average of true range, which, based on the answer ...
4
votes
4
answers
216
views
Sort aggregated query results by two methods simultaneously
I need to sort a query's results by two methods at the same time.
I want the first 3 records (returned) to be based on their prevalence in another table
And then I want the rest of the results sorted ...
-1
votes
0
answers
145
views
How to aggregate a group by query in django?
I'm working with time series data which are represented using this model:
class Price:
timestamp = models.IntegerField()
price = models.FloatField()
Assuming timestamp has 1 min interval data,...
-1
votes
2
answers
191
views
Calculate SUM over a primary key and between dates
My query:
SELECT
c.CustID,
o.OrderID,
SUM(ol.Qty * ol.Price) AS SUMOrder,
AVG(SUM(ol.Qty * ol.Price)) OVER (PARTITION BY c.CustID) AS AVGAllOrders,
COUNT(*) AS Countorders,
SUM(...
-1
votes
1
answer
169
views
Assign unique values in a set-based approach
Simplifying, I have the following data:
Col1  Col2
A     X
A     Y
A     Z
B     X
B     Y
B     Z
C     Z
I need to receive the following result:
Col1  Col2
A     X
B     Y
C     Z
In other words: For each value in the left column, I need to ...
0
votes
0
answers
64
views
Polars bug using windowed aggregate functions on Decimal type columns
Windowed aggregate functions on Decimal-types move decimals to integers
I found a bug in polars (version 1.21.0 in a Python 3.10.8 environment) using windowed aggregate functions. They are not ...
3
votes
1
answer
117
views
Why `.first()`, and why before `.over()`, in `with_columns` expression function composition chain
I'm new to Polars and seeking help understanding why part of the function composition for the expression in the .with_columns() snippet below has to be done in that particular order.
Specifically, I don't ...
2
votes
2
answers
69
views
Compute group-wise residual for polars data frame
I am in a situation where I have a data frame with X and Y values as well as two groups GROUP1 and GROUP2. Looping over both of the groups, I want to fit a linear model against the X and Y data and ...
0
votes
1
answer
54
views
BigQuery get rolling average of variable 1 if variable 2 >= quantile
Say I want to get the rolling average of variable x where a second variable y is in the top 5th percentile (over that window).
I can get the rolling average alone with something like this
SELECT
...
1
vote
1
answer
43
views
How to calculate the maximum drawdown of a stock over a rolling time window?
In quantitative finance, maximum drawdown is a key risk metric that measures the largest decline from a peak to a trough over a period.
I want to calculate the maximum drawdown over the past 10 ...
2
votes
1
answer
199
views
Find corresponding date of max value in a rolling window of each partition
Sample code:
import polars as pl
from datetime import date
from random import randint
df = pl.DataFrame({
"category": [cat for cat in ["A", "B"] for _ in range(1, ...
1
vote
1
answer
135
views
Get a grouped sum in polars, but keep all individual rows
I am breaking my head over this probably pretty simple question and I just can't find the answer anywhere. I want to create a new column with a grouped sum of another column, but I want to keep all ...
1
vote
1
answer
59
views
Group-By column in polars DataFrame inside with_columns
I have the following dataframe:
import polars as pl
df = pl.DataFrame({
'ID': [1, 1, 5, 5, 7, 7, 7],
'YEAR': [2025, 2025, 2023, 2024, 2020, 2021, 2021]
})
shape: (7, 2)
┌─────┬──────┐
│ ID ┆ ...
1
vote
1
answer
78
views
In PostgreSQL do ranking window functions heed the window frame or act on the entire partition?
I am learning window functions, primarily with this page of the docs. I am trying to categorize the window functions according to whether they heed window frames, or ignore them and act on the ...
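A minimal sketch of how the question could be probed empirically. It runs on SQLite through Python's `sqlite3` module rather than PostgreSQL, so it only illustrates the general behavior and is not a statement about PostgreSQL itself; the table `t` and column `v` are hypothetical. The same two-row frame is given to an aggregate and to `RANK()`: the aggregate only sees the frame, while the rank keeps counting across the whole partition.

```python
import sqlite3


def frame_demo():
    """Return (v, framed_sum, rnk) rows for a hypothetical four-row table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (v INTEGER)")
    con.executemany("INSERT INTO t VALUES (?)", [(10,), (20,), (30,), (40,)])
    rows = con.execute(
        """
        SELECT v,
               -- the aggregate sums only the previous row and the current row
               SUM(v) OVER (ORDER BY v ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS framed_sum,
               -- the same frame is attached here, but the rank is not limited by it
               RANK() OVER (ORDER BY v ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS rnk
        FROM t
        ORDER BY v
        """
    ).fetchall()
    con.close()
    return rows


for row in frame_demo():
    print(row)
```

If the rank heeded the two-row frame it could never exceed 2; in this sketch it reaches 4.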
3
votes
2
answers
81
views
How to filter sequential event data according to whether record is followed by specific event within X minutes?
I have some data with a timestamp column t, an event category column cat, and a user_id column. cat can take n values, including value A.
I want to select records which are followed (not necessarily ...
1
vote
1
answer
71
views
Median with a sliding window
The goal is to use MEDIAN as a window function with a sliding window of a specific size.
SELECT *,
MEDIAN(n) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
FROM test_data
ORDER BY id;...
1
vote
2
answers
87
views
How to get the max amount per day for a month
I have a table with two columns: demo at db<>fiddle
create table your_table("Date","Count")as values
('2022-01-13'::date, 8)
,('2022-01-18'::date, 14)
,('2022-01-25'::...
2
votes
2
answers
73
views
Identify duplicates within a period of time using Redshift SQL
In a table, I have plan details of customers with their customer_id and enroll_date.
Now, I want to identify duplicate and valid enrollments from the overall data.
Duplicate: If a customer enrolls a ...
1
vote
1
answer
134
views
How to Exclude Rows Based on a Dynamic Condition in a PySpark Window Function?
I am working with PySpark and need to create a window function that calculates the median of the previous 5 values in a column. However, I want to exclude rows where a specific column feature is True. ...
1
vote
1
answer
62
views
MySQL filtered gaps and islands: avoiding temporaries and filesorts?
CREATE TABLE `messages` (
`ID` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`Arrival` TIMESTAMP NOT NULL,
`SenderID` INT UNSIGNED NOT NULL,
-- Fields describing messages skipped
PRIMARY ...
0
votes
1
answer
55
views
Sum Time Differences over multiple groups in MySQL
I have a table in MySQL...
# id, admin_id, appointment_id, timestamp
'1', '10', '1', '2025-03-01 08:00:00'
'2', '10', '1', '2025-03-01 09:00:00'
'3', '10', '2', '2025-04-01 08:00:00'
'4', '10', '2', '...
1
vote
1
answer
85
views
Aggregate 3-month rolling dates with overlap [closed]
Suppose I have below dataset:
date Value
01-Jul-24 37
01-Aug-24 76
01-Sep-24 25
01-Oct-24 85
01-Nov-24 27
01-Dec-24 28
And I want to aggregate by 3 months rolling:...
0
votes
1
answer
62
views
segmented monthly snapshots of validly eligible user counts
I've been trying to write a SQL query (in PostgreSQL) for a cohort-type analysis at work and can't figure it out for the life of me.
I need a snapshot count of the number of valid users at ...
1
vote
2
answers
163
views
How can I perform a calculation on a rolling window over a partition in polars?
I have a Dataset containing GPS Coordinates of a few planes. I would like to calculate the bearing of each plane at every point in time.
The dataset has, among others, these columns:
event_uid
plane_no
...
1
vote
1
answer
87
views
How to conditionally choose which column to backward fill over in polars?
I need to backfill a column over one of three possible columns, based on which one matches the non-null cell in the column to be backfilled.
My dataframe looks something like this:
import polars as pl
...
4
votes
1
answer
109
views
Grouped Rolling Mean in Polars
A similar question is asked here.
However it didn't seem to work in my case.
I have a dataframe with 3 columns, date, groups, prob. What I want is to create a 3 day rolling mean of the prob column values ...
-1
votes
1
answer
114
views
How can I apply a filter to a window function in Snowflake?
Suppose I have a table like this
TRANSACTION_DATE  BOOKED_DATE  AMOUNT
2024-02-10        2024-02-09   50
2024-02-10        2024-02-10   50
2024-02-10        2024-02-11   50
2024-02-11        2024-02-10   50
2024-02-11        2024-02-11   50
2024-...
1
vote
2
answers
106
views
SQL Window Functions - Pivot on a Column
I have table data as shown below.
cust_id  city_type  city_name  start_date
1        physical   Las Vegas  5/17/2024
1        office     Seattle    5/17/2024
1        office     Dallas     9/20/2024
1        physical   Dallas     10/30/2024
1        office     ...
3
votes
2
answers
101
views
How to count people inside the building using entrance/leaving logs in PostgreSQL
I have a table with logs of people entering and leaving the building. The table looks like this:
user_id  datetime                 direction
1        17/2/2025, 18:25:02.000  in
1        17/2/2025, 20:09:10.000  out
2        17/2/2025, 09:55:...
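One common way to approach this kind of occupancy count is a running sum that adds 1 for every `in` event and subtracts 1 for every `out` event. The sketch below is an illustration only: it runs on SQLite through Python's `sqlite3` module instead of PostgreSQL, the sample rows are invented (they are not the rows from the question), and the timestamp column is renamed to a hypothetical `ts`.

```python
import sqlite3


def occupancy(events):
    """Return (ts, inside) pairs: people inside the building after each event."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE logs (user_id INTEGER, ts TEXT, direction TEXT)")
    con.executemany("INSERT INTO logs VALUES (?, ?, ?)", events)
    rows = con.execute(
        """
        SELECT ts,
               -- running total: +1 for each entrance, -1 for each exit
               SUM(CASE direction WHEN 'in' THEN 1 ELSE -1 END)
                   OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS inside
        FROM logs
        ORDER BY ts
        """
    ).fetchall()
    con.close()
    return rows


# Invented sample data, ISO-formatted so that text ordering matches time ordering.
sample = [
    (1, "2025-02-17 09:00:00", "in"),
    (2, "2025-02-17 09:30:00", "in"),
    (1, "2025-02-17 10:00:00", "out"),
    (2, "2025-02-17 11:00:00", "out"),
]
for row in occupancy(sample):
    print(row)
```

The sketch assumes every entrance has a matching exit and that timestamps are distinct; unmatched or simultaneous events would need extra handling.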
0
votes
3
answers
120
views
Oracle Max Over Partition By Excluding Current Row
I am having trouble calculating the max() value over a partition where I want to exclude the current row each time.
Assuming I have a table with ID, Group and Value.
calculating max/min/etc. over ...
1
vote
0
answers
73
views
pivot vs window in spark
I have the following requirement
Pivot the dataframe to sum amount column based on document type
Join the pivot dataframe back to the original dataframe to get additional columns
Filter the joined ...
8
votes
1
answer
414
views
Replacing window function OVER() with WINDOW clause reference yields different results
While preparing an answer to another question here, I coded up a query that contained multiple window functions having the same OVER(...) clause. Results were as expected.
select ...
sum(sum(s....
2
votes
3
answers
265
views
Stratified sampling using SQL given an absolute sample size
I have the following population:
a
b
b
c
c
c
c
I am looking for a SQL statement to generate a stratified sample of arbitrary size. Let's say for this example, I would like a sample size of 4. I ...
1
vote
1
answer
101
views
Update every N rows with an increment of 1
I am a SQL Server developer working on a project in a PostgreSQL environment. I am having some PostgreSQL syntax issues. I am working off version 9.3.
In a given table, I am trying to set every 10 ...
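Going only by the title, one possible reading is that each consecutive block of N rows should share a number that increases by 1 from block to block. Under that assumption, the number can be derived from `ROW_NUMBER()` with integer division. The sketch runs on SQLite through Python's `sqlite3` module, not PostgreSQL 9.3, and the table `items` and the column `batch_no` are hypothetical names.

```python
import sqlite3


def assign_batches(row_count=25, batch_size=10):
    """Return (id, batch_no) pairs where every batch_size rows share one number."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
    con.executemany(
        "INSERT INTO items VALUES (?)", [(i,) for i in range(1, row_count + 1)]
    )
    rows = con.execute(
        """
        SELECT id,
               -- rows 1..N map to 1, rows N+1..2N map to 2, and so on
               (ROW_NUMBER() OVER (ORDER BY id) - 1) / ? + 1 AS batch_no
        FROM items
        ORDER BY id
        """,
        (batch_size,),
    ).fetchall()
    con.close()
    return rows


print(assign_batches(25, 10))
```

The expression relies on integer division; the result of such a query could then feed an update, but how that is written depends on the actual table and is not shown here.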
0
votes
0
answers
69
views
How to pick change values from a column other than using window functions in Snowflake
I have a Snowflake table with data like below:
Table1
Col1 Col2 Col3
G1 1 9:15
G1 1 9:16
G1 2 9:17
G1 1 9:18
G2 1 9:15
G2 2 9:16
I want to ...
0
votes
0
answers
16
views
Is there a good way to add columns calculated using a window partition within pandas chaining
My background is in SQL and I was wondering what was the most efficient/readable way of creating multiple columns using the same window partition within a pandas chain.
Suppose I have the following ...
0
votes
2
answers
69
views
Calculate Date Difference for Non-Consecutive Months
I am trying to find gaps in enrollment and have a table set up like this:
ID  Enrollment_Month  Consecutive_Months
1   202403            1
1   202404            2
1   202405            3
1   202409            1
1   202410            2
1   202411            3
2   202401            1
2   202402            ...
0
votes
1
answer
64
views
spark scala ignore nulls in windowing clause
In Spark SQL, you can write
SELECT title, rn,
lead(rn, 1) IGNORE NULLS over(order by rn) as next_rn
FROM my_table
;
How would you add the IGNORE NULLS part in the equivalent Scala code?
val ...
0
votes
3
answers
73
views
Most recent status of each item as of the 1st of each month
I have a table that is structured in the following way: fiddle
create table test(id,status,datechange)as values
('011AVN', 11, '2024-06-21 08:27:13'::timestamp)
,('011AVN', 12, '2024-06-21 08:28:16')
...
0
votes
3
answers
200
views
Using SQL Server window functions with year and month(Period of time)
Please consider this script:
Declare @tbl Table
(
F1 int,
F2 int,
Year int,
Month tinyint
)
Insert into @tbl
values
(10, 1, 2020, 1),
(10, 1, 2020, 2),
(10, 1, 2020, 3),
(10, ...
-2
votes
1
answer
75
views
How to rank rows considering ties?
How can I show the numbers 1 1 3 4 5 5 7... in a PostgreSQL query?
Example:
create table test(name,sum_all)as values
('a',100)
,('b',95)
,('c',100)
,('d',75)
,('e',55);
Desired results
name
sum_all
...
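The sequence 1 1 3 4 5 5 7 in the question matches what the standard `RANK()` window function produces: tied rows share a rank and the following rank skips ahead. A minimal sketch using the sample rows from the question, run on SQLite through Python's `sqlite3` module rather than PostgreSQL; ordering by `sum_all` descending is an assumption, since the desired output is cut off in the excerpt.

```python
import sqlite3


def rank_with_ties():
    """Return (name, sum_all, rnk) rows, ranked by sum_all with the highest first."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (name TEXT, sum_all INTEGER)")
    con.executemany(
        "INSERT INTO test VALUES (?, ?)",
        [("a", 100), ("b", 95), ("c", 100), ("d", 75), ("e", 55)],
    )
    rows = con.execute(
        """
        SELECT name, sum_all,
               -- ties share a rank; the next distinct value skips the used positions
               RANK() OVER (ORDER BY sum_all DESC) AS rnk
        FROM test
        ORDER BY rnk, name
        """
    ).fetchall()
    con.close()
    return rows


for row in rank_with_ties():
    print(row)
```

`DENSE_RANK()` would instead give 1 1 2 3 4 with no gaps, which is not the pattern asked for.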
1
vote
1
answer
104
views
How to assign unique UUIDs to groups of metrics in a PostgreSQL table with repeated names?
I’m working with a PostgreSQL table that stores metric data for different assets. The table currently has over 1 billion records.
Each update will have multiple metrics, e.g., speed, distance, ...
0
votes
1
answer
62
views
Copy an ID to rows with adjacent timestamp ranges sharing a class
Here is a sample table.
create table test(ID,Start_date_time,End_date_time,class)
as values
(131, '5/26/2021 11:42', '5/26/2021 12:42', 'AAA')
,(132, '5/26/2021 12:42', '5/26/2021 13:18', 'AAA')...
1
vote
1
answer
75
views
Sum a Column of a Timeseries by an Order Number when the Ordernumber is not unique
I have a table like this: demo at db<>fiddle
CREATE TABLE test(id, order_id, start, end1, count) AS VALUES
(1, 1, '2023-12-19 10:00:00'::timestamp, '2023-12-19 11:00:00'::timestamp, 15),
(2, 1, '...
1
vote
5
answers
135
views
Streak for a given endDate SQL (Postgres)
Input data
date        number
2024-11-02  1000
2024-11-03  500
2024-11-05  1000
2024-11-06  1000
2024-11-07  1000
2024-11-08  500
2024-11-14  1000
2024-11-15  1000
for a given date I want to get the streak (dates ...
1
vote
4
answers
117
views
Number of missing periods between dates
I'm using Postgres and I would like to find missing ranges of dates. I've got this table with these data :
create table event_dates(date)AS VALUES('2024-12-09'::date)
...
2
votes
2
answers
94
views
How can I force a WINDOW function in MySQL to show 'NULL' unless complete window frame is available?
I want to get a moving sum and moving average on each date for the last 7 days (including the current day). I used a window function with ROWS BETWEEN to frame it, which calculates correctly, but it ...
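One way to get NULL until the frame is complete is to count the rows in the same frame and only emit the sum when the count equals the frame size. The sketch below is an illustration under assumptions: it runs on SQLite through Python's `sqlite3` module rather than MySQL, the table `daily` and its columns are hypothetical, it assumes exactly one row per day with no gaps, and it uses a 3-row window in the usage line to keep the output short (7 would mirror the question).

```python
import sqlite3


def moving_sum_full_window(values, window=3):
    """Return (d, moving_sum) pairs; moving_sum is None until `window` rows exist."""
    window = int(window)  # interpolated into the SQL below, so force an integer
    frame = f"ORDER BY d ROWS BETWEEN {window - 1} PRECEDING AND CURRENT ROW"
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE daily (d INTEGER PRIMARY KEY, amount INTEGER)")
    con.executemany("INSERT INTO daily VALUES (?, ?)", list(enumerate(values, start=1)))
    rows = con.execute(
        f"""
        SELECT d,
               -- CASE without ELSE yields NULL while the frame is still incomplete
               CASE WHEN COUNT(*) OVER ({frame}) = {window}
                    THEN SUM(amount) OVER ({frame})
               END AS moving_sum
        FROM daily
        ORDER BY d
        """
    ).fetchall()
    con.close()
    return rows


print(moving_sum_full_window([1, 2, 3, 4, 5], window=3))
```

The same `CASE` guard can wrap `AVG(amount)` for the moving average.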
2
votes
1
answer
80
views
How to calculate average stock status in day
The daily stock status is stored in this table:
create table stockstatus (
stockdate date not null, -- date of stock status
product character(60) not null, -- product id
status int not null, -- stock status in ...