Understanding Rolling Window Counts with SQL
In this article, we will delve into the world of rolling window counts in SQL. Specifically, we’ll explore how to calculate counts based on a 90-day window per unique ID. This problem can be challenging due to the need for complex date calculations and counting logic.
Problem Statement
The problem involves a table with id and date columns, where multiple transactions can occur within a 90-day window. The goal is to create new columns (window and count) that display the start of each 90-day window and the count of transactions occurring during those windows.
Current Approach
The original question suggests using a combination of CASE, MIN(date) OVER (PARTITION BY id), and DATEDIFF(day, date, MIN(date) OVER (PARTITION BY id)) to calculate the window column. It also proposes using ROW_NUMBER() OVER (PARTITION BY id, window) to increment the count.
However, this approach is limited due to the complexities of SQL window functions and the need for recursive logic.
Recursive Query Solution
The provided answer outlines a recursive query solution, which is more suitable for solving this problem. This approach involves two key steps: ranking records by date within each ID partition and then traversing the data set to compute the starting date of each group.
Step 1: Ranking Records by Date
with
tab as (
select t.*, row_number() over(partition by id order by date) rn
from mytable t
)
In this step, we rank records having the same id by increasing date. This is done using the ROW_NUMBER() function with a partitioning clause that groups rows by id and orders them by date.
Step 2: Traversing the Data Set
cte as (
select id, date, rn, date date0 from tab where rn = 1
union all
select t.id, t.date, t.rn, greatest(t.date, c.date + interval '90' day)
from cte c
inner join tab t on t.id = c.id and t.rn = c.rn + 1
)
Here, we define a recursive CTE (Common Table Expression) that starts with the first record for each id partition. Then, we traverse the data set by joining the CTE with the original table, keeping only rows where the rn value is one more than the previous row’s rn. We calculate the starting date of each group as the maximum of the current date and 90 days ago from the start of the previous group.
Final Step: Counting Transactions
select
id,
date,
dense_rank() over(partition by id order by date0) grp,
count(*) over(partition by id order by date0, date) cnt
from cte
In this final step, we rank the records within each id partition by their start date (date0). We then calculate the total count of transactions within each 90-day window using a subquery with an outer clause.
Understanding the Recursive Query
The recursive query is composed of three main components:
- Ranking Records: The first part ranks records by
idanddate, creating a sequence of rows for eachid. - Traversing the Data Set: The second part traverses this ranked data set, computing the start date of each group (i.e., the first day of the next 90-day window).
- Counting Transactions: The final step counts the number of transactions within each group.
By combining these components, we obtain a comprehensive solution that accurately calculates rolling window counts for unique IDs.
Example Use Case
Suppose we have a table mytable with the following data:
| id | date |
|---|---|
| 1 | 2019-07-01 |
| 1 | 2019-12-31 |
| 1 | 2020-01-23 |
| 1 | 2020-02-12 |
| 1 | 2020-04-01 |
| 2 | 2019-06-30 |
| 2 | 2019-08-14 |
Using the recursive query solution outlined above, we can calculate the rolling window counts for each id as follows:
with
tab as (
select id, date, row_number() over(partition by id order by date) rn
from mytable
)
,
cte as (
select id, date, rn, date date0
from tab
where rn = 1
union all
select t.id, t.date, t.rn, greatest(t.date, cte.date0 + interval '90' day)
from cte c
inner join tab t on t.id = c.id and t.rn = c.rn + 1
),
cte2 as (
select
id,
date,
dense_rank() over(partition by id order by date0) grp,
count(*) over(partition by id order by date0, date) cnt
from cte
)
select * from cte2
The resulting CTE cte2 will contain the calculated rolling window counts for each id, including the start of each group and the total transaction count within each 90-day window.
Conclusion
In this article, we explored a complex problem involving rolling window counts in SQL. We introduced a recursive query solution that accurately calculates these counts by ranking records by date within each ID partition and then traversing the data set to compute the start date of each group. By applying this approach, you can efficiently solve similar problems in your own projects.
Note: The provided code snippets are written in PostgreSQL syntax and may need modifications for other SQL dialects.
Last modified on 2025-03-05