I'm trying to optimize my query speed as much as possible. A side problem is that I cannot see the exact query time, because it is rounded to a whole second. The query returns the expected result and takes about 1 second. The final query will be extended further, which is why I am trying to improve it now. How can this query be improved?

The database models an electricity utility company. The query should eventually calculate an invoice. I basically have 4 tables: apxprice, powerdeals, powerload and eans_power.

The APX price is an hourly price; powerload is a quarter-hourly volume. The first step is joining these two together for each quarter of an hour.

The second step is selecting the EAN that is listed in the table eans_power.

Finally, I join the powerdeals table, which currently consists of only a single row. A deal indicates from which hour until which hour, and from which weekday until which weekday, it applies. It consists of an hourly volume and a price. Currently it is joined only on the hours, but this will be extended to weekdays as well.

MySQL query:

SELECT l.DATE, l.PERIOD_FROM, a.PRICE, l.POWERLOAD, 
SUM(a.PRICE*l.POWERLOAD), SUM(d.hourly_volume/4) 
FROM timeseries.powerload l 
INNER JOIN timeseries.apxprice a ON l.DATE = a.DATE 
INNER JOIN contracts.eans_power c ON  l.ean = c.ean 
LEFT OUTER JOIN timeseries.powerdeals d ON d.period_from <= l.period_from 
AND d.period_until >= l.period_until 
WHERE l.PERIOD_FROM >= a.PERIOD_FROM 
AND l.PERIOD_FROM < a.PERIOD_UNTIL 
AND l.DATE >= '2018-01-01' 
AND l.DATE <= '2018-12-31' 
GROUP BY l.date

Explain:

id  select_type  table  partitions  type    possible_keys  key   key_len  ref                rows   filtered  Extra
1   SIMPLE       c      NULL        system  PRIMARY,ean    NULL  NULL     NULL               1      100.00    Using temporary; Using filesort
1   SIMPLE       l      NULL        ref     EAN            EAN   21       const              35481  11.11     Using index condition
1   SIMPLE       d      NULL        ALL     NULL           NULL  NULL     NULL               1      100.00    Using where; Using join buffer (Block Nested Loop)
1   SIMPLE       a      NULL        ref     DATE           DATE  4        timeseries.l.date  24     11.11     Using index condition

Create table queries:

apxprice

CREATE TABLE `apxprice` (
  `apx_id` int(11) NOT NULL AUTO_INCREMENT,
  `date` date DEFAULT NULL,
  `period_from` time DEFAULT NULL,
  `period_until` time DEFAULT NULL,
  `price` decimal(10,2) DEFAULT NULL,
  PRIMARY KEY (`apx_id`),
  KEY `DATE` (`date`,`period_from`,`period_until`)
) ENGINE=MyISAM AUTO_INCREMENT=29664 DEFAULT CHARSET=latin1 

powerdeals

CREATE TABLE `powerdeals` (
  `deal_id` int(11) NOT NULL AUTO_INCREMENT,
  `date_deal` date NOT NULL,
  `start_date` date NOT NULL,
  `end_date` date NOT NULL,
  `weekday_from` int(11) NOT NULL,
  `weekday_until` int(11) NOT NULL,
  `period_from` time NOT NULL,
  `period_until` time NOT NULL,
  `hourly_volume` int(11) NOT NULL,
  `price` int(11) NOT NULL,
  `type_deal_id` int(11) NOT NULL,
  `contract_id` int(11) NOT NULL,
  PRIMARY KEY (`deal_id`)
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1 

powerload

CREATE TABLE `powerload` (
  `powerload_id` int(11) NOT NULL AUTO_INCREMENT,
  `ean` varchar(18) DEFAULT NULL,
  `date` date DEFAULT NULL,
  `period_from` time DEFAULT NULL,
  `period_until` time DEFAULT NULL,
  `powerload` int(11) DEFAULT NULL,
  PRIMARY KEY (`powerload_id`),
  KEY `EAN` (`ean`,`date`,`period_from`,`period_until`)
) ENGINE=MyISAM AUTO_INCREMENT=61039 DEFAULT CHARSET=latin1 

eans_power

CREATE TABLE `eans_power` (
  `ean` char(19) NOT NULL,
  `contract_id` int(11) NOT NULL,
  `invoicing_id` int(11) NOT NULL,
  `street` varchar(255) NOT NULL,
  `number` int(11) NOT NULL,
  `affix` char(11) NOT NULL,
  `postal` char(6) NOT NULL,
  `city` varchar(255) NOT NULL,
  PRIMARY KEY (`ean`),
  KEY `ean` (`ean`,`contract_id`,`invoicing_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Sample data tables

apxprice

  • apx_id,date,period_from,period_until,price
  • 1,2016-01-01,00:00:00,01:00:00,23.86
  • 2,2016-01-01,01:00:00,02:00:00,22.39

powerdeals

  • deal_id,date_deal,start_date,end_date,weekday_from,weekday_until,period_from,period_until,hourly_volume,price,type_deal_id,contract_id
  • 1,2019-05-15,2018-01-01,2018-12-31,1,5,08:00:00,20:00:00,1000,50,3,1

powerload

  • powerload_id,ean,date,period_from,period_until,powerload
  • 1,871688520000xxxxxx,2018-01-01,00:00:00,00:15:00,9
  • 2,871688520000xxxxxx,2018-01-01,00:15:00,00:30:00,11

eans_power

  • ean,contract_id,invoicing_id,street,number,affix,postal,city
  • 871688520000xxxxxx,1,1,road,14,postal,city

Result, without sum() and group by:

  • DATE,PERIOD_FROM,PRICE,POWERLOAD,a.PRICE*l.POWERLOAD,d.hourly_volume/4,
  • 2018-01-01,00:00:00,27.20,9,244.80,NULL
  • 2018-01-01,00:15:00,27.20,11,299.20,NULL

Result, with sum() and group by:

  • DATE, PERIOD_FROM, PRICE, POWERLOAD, SUM(a.PRICE*l.POWERLOAD), SUM(d.hourly_volume/4)
  • 2018-01-01,08:00:00,26.33,21,46193.84,12250.0000
  • 2018-01-02, 08:00:00,47.95,43,90623.98,12250.0000
  • "The query does get the expected result and takes about 1 second. Can this query be improved?" I very much doubt it returns the correct results (if it does, it's pure luck), as that is not how you should use GROUP BY. Commented May 22, 2019 at 20:38
  • As your query is basically wrong, I would suggest reading "Why should I provide an MCVE for what seems to me to be a very simple SQL query?" for guidance on providing example data and expected results. Commented May 22, 2019 at 20:41
  • One suggestion is to store date and time as a single entity Commented May 22, 2019 at 21:40
  • Before starting to optimize, please fix the GROUP BY. Notice that the SELECT is fetching columns (eg, price) that (apparently) depend on the hour. Commented May 22, 2019 at 21:56
  • There seems to be 1 contract. Will there be multiple contracts? Will they end on specific days? What about changing mid-day? Also what about switching to/from daylight-savings-time? Commented May 22, 2019 at 21:59
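
The GROUP BY concern raised in these comments can be illustrated with a minimal sketch. It assumes the goal is one total per day, so only the grouping column and aggregates appear in the SELECT list. The aliases apx_cost and deal_volume are hypothetical, and the hour-range conditions are moved into the ON clause purely for readability. This does not address the weekday or date range of a deal.

```sql
-- One row per day: every selected column is either the GROUP BY column
-- or an aggregate, so the result no longer depends on which row MySQL
-- happens to pick for PERIOD_FROM, PRICE or POWERLOAD.
SELECT l.`date`,
       SUM(a.price * l.powerload) AS apx_cost,     -- hypothetical alias
       SUM(d.hourly_volume / 4)   AS deal_volume   -- hypothetical alias
FROM timeseries.powerload l
INNER JOIN contracts.eans_power c
        ON l.ean = c.ean
INNER JOIN timeseries.apxprice a
        ON a.`date` = l.`date`
       AND l.period_from >= a.period_from
       AND l.period_from <  a.period_until
LEFT OUTER JOIN timeseries.powerdeals d
        ON d.period_from  <= l.period_from
       AND d.period_until >= l.period_until
WHERE l.`date` >= '2018-01-01'
  AND l.`date` <= '2018-12-31'
GROUP BY l.`date`;
```

If per-quarter-hour detail is needed instead, the alternative would be to drop the SUM() calls and the GROUP BY entirely.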

1 Answer

Preliminary optimizations:

  • Use InnoDB, not MyISAM.
  • Use CHAR only for constant-length strings.
  • Use consistent datatypes (see ean, for example: varchar(18) in powerload but char(19) in eans_power).
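
A rough sketch of what these preliminary changes might look like follows. It assumes every EAN is exactly 18 characters, as in the sample data; if that does not hold, VARCHAR with one shared length would be the safer choice. Test on a copy of the data first, since shortening a column can fail or truncate values depending on the SQL mode.

```sql
-- Switch the four tables from MyISAM to InnoDB.
ALTER TABLE timeseries.apxprice   ENGINE=InnoDB;
ALTER TABLE timeseries.powerdeals ENGINE=InnoDB;
ALTER TABLE timeseries.powerload  ENGINE=InnoDB;
ALTER TABLE contracts.eans_power  ENGINE=InnoDB;

-- Give ean the same type on both sides of the join.
-- ASSUMPTION: EANs are always exactly 18 characters long.
ALTER TABLE contracts.eans_power MODIFY ean CHAR(18) NOT NULL;
ALTER TABLE timeseries.powerload MODIFY ean CHAR(18) DEFAULT NULL;
```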

As an alternative to timings that are rounded to the second, check out the Handler counts, which show how many row operations the query performed.
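
One common way to read the Handler counts is sketched below. It assumes the account is allowed to run FLUSH STATUS, which may require the RELOAD privilege depending on the MySQL version. The SELECT in the middle is only a placeholder for the query being measured.

```sql
-- Reset the session status counters.
FLUSH STATUS;

-- Run the query under test here (placeholder shown).
SELECT COUNT(*)
FROM timeseries.powerload
WHERE `date` >= '2018-01-01'
  AND `date` <= '2018-12-31';

-- Show how many rows were read, and how, for that query only.
SHOW SESSION STATUS LIKE 'Handler%';
```

Large Handler_read_rnd_next values usually point at scans, so comparing the counters before and after an index or schema change gives a finer-grained signal than a one-second timer.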

Because range tests (such as l.PERIOD_FROM >= a.PERIOD_FROM AND l.PERIOD_FROM < a.PERIOD_UNTIL) are essentially impossible to optimize, I recommend you expand the table to have one entry per hour (or 1 per quarter hour, if necessary). Looking up a row via a key is much faster than doing a scan of "ALL" the table. 9K rows for an entire year is trivial.
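
The expansion could be sketched roughly as follows. The table name apxprice_quarter is hypothetical, and the sketch assumes each apxprice row covers exactly one hour and that there is at most one row per date and start time. A daylight-saving change with a repeated hour would violate the primary key and needs separate handling.

```sql
-- Hypothetical table: one price row per quarter hour.
CREATE TABLE timeseries.apxprice_quarter (
  `date`      DATE          NOT NULL,
  period_from TIME          NOT NULL,
  price       DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (`date`, period_from)
) ENGINE=InnoDB;

-- Expand each hourly row into four rows at +0, +15, +30 and +45 minutes.
INSERT INTO timeseries.apxprice_quarter (`date`, period_from, price)
SELECT a.`date`,
       ADDTIME(a.period_from, SEC_TO_TIME(q.n * 900)),
       a.price
FROM timeseries.apxprice a
CROSS JOIN (SELECT 0 AS n UNION ALL SELECT 1
            UNION ALL SELECT 2 UNION ALL SELECT 3) q
WHERE a.`date` IS NOT NULL
  AND a.period_from IS NOT NULL
  AND a.price IS NOT NULL;

-- The price lookup becomes an equality join on the primary key.
SELECT l.`date`, l.period_from, p.price, l.powerload
FROM timeseries.powerload l
INNER JOIN timeseries.apxprice_quarter p
        ON p.`date` = l.`date`
       AND p.period_from = l.period_from
WHERE l.`date` >= '2018-01-01'
  AND l.`date` <= '2018-12-31';
```

The design trade-off is four times as many price rows in exchange for replacing the range test with a direct key lookup.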

When you get past these recommendations (and the Comments), I will have more tips on optimizing the indexes, especially InnoDB's PRIMARY KEY.

11 Comments

Thanks, I will look at InnoDB. I will also start using CHAR for fixed-length strings and consistent data types, and I will look into the Handler counts.
The problem is that every deal is unique: it may have a different start/end hour and date, and may only be applicable on certain weekdays. If I applied this rule, then one deal would take, for example, 365 days * 48 timeslots. I am not sure if this is the way to go. What do you think?
@Introductiontoprogramming - A 1-day deal may need a different query than a 1-year deal. I would expect to need some code in the application to help with the optimization. (Or use a messy Stored Procedure.) My gut says (from years of experience) that you have a rather complex problem to optimize.
@Introductiontoprogramming - I'm interested in the concept, and predict that electrical utilities may someday use a granularity of a minute (not just 15 minutes). And it will be computed inside our self-driving electric cars as an optimization in recharging. And in our refrigerators, in deciding when to run the compressor.