1

given the following table:

    create table #T
    (
    user_id int,
    project_id int,
    datum datetime,
    status varchar(10),
    KM int
    )

insert into #T values 
    (1, 1, '20160301 10:25', 'START', 1000), 
    (1, 1, '20160301 10:28', 'PASS', 1008), 
    (2, 2, '20160301 10:29', 'START', 2000),
    (1, 1, '20160301 11:08', 'STOP', 1045), 
    (3, 3, '20160301 10:25', 'START', 3000),
    (2, 2, '20160301 10:56', 'STOP', 2020), 
    (1, 4, '20160301 15:00', 'START', 1045), 
    (4, 5, '20160301 15:10', 'START', 400), 
    (1, 4, '20160301 15:10', 'PASS', 1060), 
    (1, 4, '20160301 15:20', 'PASS', 1080), 
    (1, 4, '20160301 15:30', 'STOP', 1080), 
    (4, 5, '20160301 15:40', 'STOP', 450),
    (3, 3, '20160301 16:25', 'STOP', 3200)

I have to sum the length of a track between START and STOP statuses for a given user and project The expected result would be this:

    user_id     project_id  datum      TOTAL_KM
----------- ----------- ---------- -----------
1           1           2016-03-01 45
1           4           2016-03-01 35
2           2           2016-03-01 20
3           3           2016-03-01 200
4           5           2016-03-01 50

How can I achieve this without using a cluster? The performance is an issue (I have over 1 million records per month and we have to keep data for several years)

Explanation: We can ignore the records with the status "PASS". Basically we have to subtract the KM value of the START record from the STOP record for a given user and project. There can be several hundred records between a START and STOP (like describes in the sample data) The date should be the date of START (in case where we have an over midnight delivery)

I think I should have a SELECT with an OVER() clause but I don't know how to formulate my query to respect those conditions.

Any idea?

1
  • What version of SQL Server are you using? Commented Mar 31, 2016 at 19:32

2 Answers 2

2
SELECT  t.[user_id],
        t.project_id,
        cast(t.datum as date) as datum,
        t1.KM- t.KM as KM
FROM #T t
INNER JOIN #T t1 
    ON t.[user_id]=t1.[user_id] and t.project_id = t1.project_id 
WHERE t.[status] = 'START' and t1.[status] = 'STOP'
ORDER BY t.[user_id],
        t.project_id,
        cast(t.datum as date)

Output:

user_id     project_id  datum      KM
----------- ----------- ---------- -----------
1           1           2016-03-01 45
1           4           2016-03-01 35
2           2           2016-03-01 20
3           3           2016-03-01 200
4           5           2016-03-01 50

(5 row(s) affected)
Sign up to request clarification or add additional context in comments.

1 Comment

If you are only looking for completed trips, it's probably better to do an inner join.
0

This could be achieved by simple self join.

One of the example: (this may not be exact query but just an idea)

Select 
    a.user_id,
    a.project_id,
    b.datum as StartDate,
    a.KM-b.KM as TotalKM
From #T a
Where status = 'STOP'
Join 
(
    Select user_id, project_id, KM From #t Where 
    status = 'START'
) b ON b.user_id = a.user_id, b.project_id = a.project_id
#T b

2 Comments

This is not a working SQL query in none of the database dialects. :-(
@Gabor as i said already it was a conceptual Idea not exact work result you wanted, so point of Devoting just because you are expecting exact query.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.