I have a table, which I will call data_rows, like this:
create table if not exists data_rows
(
    id integer not null,
    constraint data_rows_to_group
        primary key (id),
    date date not null,
    group_id int
    -- more fields that are not relevant
);
When I order the rows by date, I want a row to get a new group_id if the date difference to the preceding row is > 7 days (it could be another time span, but let's keep it at 7 days). So, ordered by date, any two consecutive rows with the same group_id have a date difference <= 7 days. For example:
id  date        group_id
1   12.01.2019  0
2   15.01.2019  0
3   21.01.2019  0
4   05.02.2019  1
5   08.02.2019  1
6   20.02.2019  2
7   30.02.2019  3
8   30.02.2019  3
(Note in particular that rows 1 and 3 are in the same group although they are more than 7 days apart: within a group, no two consecutive rows differ by more than 7 days.)
I know how to do this procedurally in Python, C#, or similar languages. But it would be very useful if I could do this on the PostgreSQL server, because it is a lot of data, it keeps everything to a single point of failure, and it would be a big learning experience too.
Here is how I would do it in C#, so you get the idea of what I want:
using System;
using System.Collections.Generic;
using System.Linq;

class DataRows
{
    public int Id { get; set; }
    public DateTime Date { get; set; }
    public int GroupId { get; set; }
}

class GroupMarking
{
    // Orders the rows by date and starts a new group whenever the gap
    // to the preceding row is larger than betweenSpan.
    public DataRows[] RowsWithGroupIds(IEnumerable<DataRows> relevantDataRows, TimeSpan betweenSpan)
    {
        var currentGroupId = 0;
        var rows = relevantDataRows.OrderBy(p => p.Date).ToArray();
        if (rows.Length == 0)
        {
            return rows; // nothing to mark
        }
        rows[0].GroupId = currentGroupId;
        for (var i = 1; i < rows.Length; i++)
        {
            // Strictly greater: a gap of exactly betweenSpan stays in the same group.
            if (rows[i].Date - rows[i - 1].Date > betweenSpan)
            {
                currentGroupId++;
            }
            rows[i].GroupId = currentGroupId;
        }
        return rows;
    }
}
Is this possible in PostgreSQL? I know there are loops in Postgres. I would prefer a solution without loops, but if it is not possible without them, they are OK. How do I create the ids in the group_id column without falling back on a procedural language?
date_span argument. Then you will try to add the rule for groups of no less than 2 rows. Then the next rule ... until you come up with a final SQL query. If it is impossible to build such a query - then you can simply write an imperative SQL procedure directly translating C# to SQL.
date_span - at least to prevent ambiguity like this: If there are 3 rows in sequence and row 2 is within the date_span relative to both row 1 and row 3 - then which of these 2 groups should we put row 2 in? Implementation comes from the definition.
30.02.2019 is an invalid date
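The rule from the question (start a new group whenever the gap to the preceding row exceeds 7 days) can probably be expressed without loops using window functions, a pattern often called "gaps and islands". The following is only a sketch under some assumptions: it uses the data_rows table as defined in the question, hard-codes the 7-day span, and breaks ties between equal dates by id (the question does not say how ties should be ordered). The names flagged, grouped, starts_new_group and new_group_id are made up for illustration.

```sql
-- Sketch only: assumes the data_rows table from the question and a fixed 7-day span.
with flagged as (
    select id,
           date,
           -- date - date yields a whole number of days in PostgreSQL;
           -- for the first row lag() is null, so the comparison is not true and the flag is 0
           case
               when date - lag(date) over (order by date, id) > 7 then 1
               else 0
           end as starts_new_group
    from data_rows
),
grouped as (
    select id,
           -- running total of the flags = number of gaps seen so far = group id
           sum(starts_new_group) over (order by date, id) as new_group_id
    from flagged
)
update data_rows as d
set group_id = grouped.new_group_id::int
from grouped
where d.id = grouped.id;
```

To preview the result before writing anything, the update can be replaced by a plain select from grouped. For rows 1 to 6 of the example this should give the group ids 0, 0, 0, 1, 1, 2; rows 7 and 8 cannot be loaded as written because 30.02.2019 is not a valid date.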