We have a mysql transaction table where every run registration attempt (paystatus = completed, pending, incomplete, etc) is recorded with a bunch of other data. I want to create additional columns in the same table and update them with aggregate values calculated from previous rows (e.g - same runner can come to multiple runs, get aggregate of previous run records for same person into current record).
Unique columns -
- regid - primary key
- combination of email+firstname (not exactly accurate but we use this assumption for a unique person)
Source table structure (sqlfiddle - http://sqlfiddle.com/#!9/ff191d/6) and example data
source example data -
+---------------------------------------------------------------------------------------------------------------+
| regid | registeredon | paystatus | firstname | email | amountpaid | totalruns | totalamount | influencer |
+---------------------------------------------------------------------------------------------------------------+
| a2 | 25-05-2018 | completed | jay | [email protected] | 100 | 0 | 0 | 0 |
| z2 | 14-03-2018 | completed | jay | [email protected] | 100 | 0 | 0 | 0 |
| y2 | 23-02-2018 | completed | vijay | [email protected] | 150 | 0 | 0 | 0 |
| y6 | 21-02-2018 | pending | vijay | [email protected] | 0 | 0 | 0 | 0 |
| x2 | 14-01-2018 | completed | vijay | [email protected] | 100 | 0 | 0 | 0 |
| x6 | 11-01-2018 | pending | vijay | [email protected] | 100 | 0 | 0 | 0 |
| a1 | 23-05-2018 | completed | ajay | [email protected] | 120 | 0 | 0 | 0 |
| a5 | 21-05-2018 | pending | ajay | [email protected] | 0 | 0 | 0 | 0 |
| z1 | 04-03-2018 | completed | ajay | [email protected] | 100 | 0 | 0 | 0 |
| z5 | 02-03-2018 | pending | ajay | [email protected] | 0 | 0 | 0 | 0 |
| y1 | 03-02-2018 | completed | ajay | [email protected] | 150 | 0 | 0 | 0 |
| x1 | 02-01-2018 | completed | ajay | [email protected] | 100 | 0 | 0 | 0 |
| x5 | 01-01-2018 | pending | ajay | [email protected] | 0 | 0 | 0 | 0 |
+---------------------------------------------------------------------------------------------------------------+
The aggregate value columns to be calculated are (count only successful registrations for all i.e paystatus = completed)
- totalruns - Total runs runner has participated in, including current one.
- totalamount - Sum of all amount runner has paid for current and previous runs. Total amount runner has paid till that run including current row.
- influencer - count of successful registrations for that email (not including firstname like above)
Expected result data after calculations
+---------------------------------------------------------------------------------------------------------------+
| regid | registeredon | paystatus | firstname | email | amountpaid | totalruns | totalamount | influencer |
+---------------------------------------------------------------------------------------------------------------+
| a2 | 25-05-2018 | completed | jay | [email protected] | 100 | 2 | 200 | 1 |
| z2 | 14-03-2018 | completed | jay | [email protected] | 100 | 1 | 100 | 1 |
| y2 | 23-02-2018 | completed | vijay | [email protected] | 150 | 2 | 250 | 2 |
| y6 | 21-02-2018 | pending | vijay | [email protected] | 0 | 1 | 100 | 1 |
| x2 | 14-01-2018 | completed | vijay | [email protected] | 100 | 1 | 100 | 1 |
| x6 | 11-01-2018 | pending | vijay | [email protected] | 100 | 1 | 100 | 0 |
| a1 | 23-05-2018 | completed | ajay | [email protected] | 120 | 4 | 470 | 4 |
| a5 | 21-05-2018 | pending | ajay | [email protected] | 0 | 3 | 350 | 3 |
| z1 | 04-03-2018 | completed | ajay | [email protected] | 100 | 3 | 350 | 3 |
| z5 | 02-03-2018 | pending | ajay | [email protected] | 0 | 2 | 250 | 2 |
| y1 | 03-02-2018 | completed | ajay | [email protected] | 150 | 2 | 250 | 2 |
| x1 | 02-01-2018 | completed | ajay | [email protected] | 100 | 1 | 100 | 1 |
| x5 | 01-01-2018 | pending | ajay | [email protected] | 0 | 0 | 0 | 0 |
+---------------------------------------------------------------------------------------------------------------+
Explanation - totalamount, totalruns, influencer for regid x5 is 0 because it was not a successful registration (paystatus = pending). x1 aggregate values are of the same record as it is the first successful registration. y1 aggregate values are sum of values for x1 and y1. z5 has aggregate values of previous successful registration, instead this can be left at 0 too (if update is run only for paystatus=completed).
I have been able to get to this query with the help of a friend, but it is not correct (sum of all records with email+firstname combination is shown instead of only older rows with that combination) -
SELECT
t1.regid, registeredon, paystatus, firstname, email,
amountpaid, SumTotalruns, SumTotalamount
FROM test t1,
(select t2.regid,
concat(t2.email, t2.firstname) tempid,
COUNT(*) SumTotalruns,
SUM(t3.amountpaid) SumTotalamount
from test t2 left join test t3
on concat(t2.email , t2.firstname) = concat(t3.email , t3.firstname)
and t2.paystatus = 'completed'
and t2.registeredon < t3.registeredon
group by concat(t2.email, t2.firstname)
) tmp
where concat(t1.email , t1.firstname) = tmp.tempid
order by id desc
Output of above query -
regid| registeredon | paystatus | firstname | email | amountpaid | SumTotalruns | SumTotalamount
a1 | 2018-05-05 | completed | ajay | [email protected] | 120 | 15 | 710
z2 | 2018-03-14 | completed | jay | [email protected] | 100 | 2 | 100
z1 | 2018-03-18 | completed | ajay | [email protected] | 100 | 15 | 710
y6 | 2018-02-21 | pending | vijay | [email protected] | 0 | 5 | 150
y2 | 2018-02-23 | completed | vijay | [email protected] | 150 | 5 | 150
y1 | 2018-02-03 | completed | ajay | [email protected] | 150 | 15 | 710
x6 | 2018-01-11 | pending | vijay | [email protected] | 100 | 5 | 150
x5 | 2018-01-01 | pending | ajay | [email protected] | 0 | 15 | 710
x2 | 2018-01-14 | completed | vijay | [email protected] | 100 | 5 | 150
x1 | 2018-01-11 | completed | ajay | [email protected] | 100 | 15 | 710
a5 | 2018-05-03 | pending | ajay | [email protected] | 0 | 15 | 710
a2 | 2018-05-23 | completed | jay | [email protected] | 100 | 2 | 100
z5 | 2018-03-15 | pending | ajay | [email protected] | 0 | 15 | 710
All these update queries will be added to a stored procedure, so if looping is better, that can be used too. Can you please help with finding the most efficient way to get values for these? Will appreciate your help!
Figured the query with the help of another friend
update test a join
(SELECT
outers.regid,
outers.registeredon, outers.paystatus, outers.firstname, outers.email,
outers.amountpaid,
sum(InnerAmount) SumTotalAmount,
count(1) SumTotalRuns
from test as outers,
(select id,
test.regid,concat(test.email, test.firstname) tempid,
test.amountpaid InnerAmount,
test.registeredon
from test
where test.paystatus = 'completed'
) inner_table
where concat(outers.email,outers.firstname) = inner_table.tempid
and outers.registeredon >= inner_table.registeredon
group by regid
) b
on a.regid=b.regid
set a.totalruns = b.SumTotalRuns,
a.totalamount = b.SumTotalAmount
But running this on the main table (total 12.5k rows and 7.6k rows with paystatus='completed') is taking 40+ seconds. Any help in optimizing this query?
CREATE TABLEstatements andINSERTstatements together with the expected result (or use an online service like sqlfiddle.com ). You will get more and better help if people don't have to struggle just to get the data in place to try a solution.concat(test.email, test.firstname) tempid- it's a possible mistake. You must add any separator not allowed in both fields to be sure Your result is correct.