I have the following database structure:

Sites table

id  |  name  |  other_fields

Backups table

id  |  site_id  | initiated_on(unix timestamp) | size(float) | status

So the Backups table has a many-to-one relationship with the Sites table, linked via site_id.


And I would like to output the data in the following format

name | Latest initiated_on | status of the latest initiated_on row

And I have the following SQL query

SELECT *, `sites`.`id` as sid, SUM(`backups`.`size`) AS size
FROM (`sites`)
LEFT JOIN `backups` ON `sites`.`id` = `backups`.`site_id`
WHERE `sites`.`id` =  '1'
GROUP BY `sites`.`id`
ORDER BY `backups`.`initiated_on` desc

This query gives me almost what I am looking for; the only problem is that I don't get the latest initiated_on values.

So if I have 3 rows in backups with site_id = 1, the query does not pick the row with the highest initiated_on value. It just picks an arbitrary row.

Please help, and thanks in advance.

  • You shouldn't write SELECT * when you have a GROUP BY - the values that you'll get back will be arbitrary. In most DBs this wouldn't even be a valid SQL statement. (And even when you aren't using GROUP BY, SELECT * is still considered bad practice.) Commented Feb 10, 2012 at 14:55
  • can I change it to SELECT sites.* ? Commented Feb 10, 2012 at 15:00
  • You haven't added initiated_on to your SELECT which is probably why it's not returning it. Commented Feb 10, 2012 at 15:26
  • you may use max(initiated_on) to get the maximum on a group by Commented Feb 10, 2012 at 15:27

3 Answers

You should try:

SELECT sites.name, FROM_UNIXTIME(b.latest) as latest, b.size, b.status
FROM sites
LEFT JOIN
  ( SELECT bg.site_id, bg.latest, bg.sizesum AS size, bu.status
    FROM
      ( SELECT site_id, MAX(initiated_on) as latest, SUM(size) as sizesum
        FROM backups
        GROUP BY site_id ) bg
    JOIN backups bu
    ON bu.initiated_on = bg.latest AND bu.site_id = bg.site_id
  ) b
ON sites.id = b.site_id
  1. In the GROUP BY subquery (bg here), the only columns you can use in the SELECT are columns that are either aggregated by a function or listed in the GROUP BY clause.

    http://dev.mysql.com/doc/refman/5.5/en/group-by-hidden-columns.html

  2. Once you have all the aggregate values, you need to join the result back to backups to find the other values for the row with the latest timestamp (b here).

  3. Finally, join the result to the sites table to get the names, or use a left join if you want to list all sites, even those without a backup.
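
The three steps above can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, so FROM_UNIXTIME is left out (SQLite does not have it) and the raw timestamp is returned instead. The site names and backup rows are made-up sample data, not taken from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE backups (
        id INTEGER PRIMARY KEY,
        site_id INTEGER,
        initiated_on INTEGER,  -- unix timestamp
        size REAL,
        status TEXT
    );
    -- Hypothetical sample data: site 1 has three backups, site 2 has none.
    INSERT INTO sites (id, name) VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO backups (site_id, initiated_on, size, status) VALUES
        (1, 100, 1.5, 'ok'),
        (1, 300, 2.0, 'failed'),
        (1, 200, 0.5, 'ok');
""")

LATEST_PER_SITE = """
SELECT sites.name, b.latest, b.size, b.status
FROM sites
LEFT JOIN
  ( SELECT bg.site_id, bg.latest, bg.sizesum AS size, bu.status
    FROM
      ( SELECT site_id, MAX(initiated_on) AS latest, SUM(size) AS sizesum
        FROM backups
        GROUP BY site_id ) bg              -- step 1: aggregates per site
    JOIN backups bu
      ON bu.initiated_on = bg.latest
     AND bu.site_id = bg.site_id           -- step 2: row holding the latest timestamp
  ) b
ON sites.id = b.site_id                    -- step 3: attach site names
ORDER BY sites.id
"""

rows = conn.execute(LATEST_PER_SITE).fetchall()
for row in rows:
    print(row)
```

With this sample data, site 1 should come back with timestamp 300, total size 4.0 and status 'failed', while site 2 should appear with NULLs because of the LEFT JOIN.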

3 Comments

This is the most efficient solution but you should also add "bu.site_id = bg.site_id" to the innermost join
Yes, you are right. I assumed the integer timestamp was unique enough. Edited.
Thank you, well explained and the query is efficient in my crude benchmarking.
Try with this:

select S.name, B.initiated_on, B.status
from sites as S left join backups as B on S.id = B.site_id
where B.initiated_on = 
       (select max(initiated_on)
           from backups
          where site_id = S.id)
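
One thing worth checking with the query above: because the filter on B.initiated_on sits in the WHERE clause, sites that have no backups are dropped, so the LEFT JOIN behaves like an inner join. The sketch below compares that form with a variant that moves the condition into the ON clause. It runs against an in-memory SQLite database via Python's sqlite3 module as a stand-in for MySQL, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE backups (
        id INTEGER PRIMARY KEY,
        site_id INTEGER,
        initiated_on INTEGER,  -- unix timestamp
        size REAL,
        status TEXT
    );
    -- Hypothetical sample data: site 2 has no backups at all.
    INSERT INTO sites (id, name) VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO backups (site_id, initiated_on, size, status) VALUES
        (1, 100, 1.5, 'ok'),
        (1, 300, 2.0, 'failed'),
        (1, 200, 0.5, 'ok');
""")

# Filter in WHERE: rows where B is NULL fail the comparison and disappear.
FILTER_IN_WHERE = """
SELECT S.name, B.initiated_on, B.status
FROM sites AS S LEFT JOIN backups AS B ON S.id = B.site_id
WHERE B.initiated_on =
      (SELECT MAX(initiated_on) FROM backups WHERE site_id = S.id)
ORDER BY S.id
"""

# Filter in ON: sites without a matching backup are kept with NULLs.
FILTER_IN_ON = """
SELECT S.name, B.initiated_on, B.status
FROM sites AS S
LEFT JOIN backups AS B
       ON S.id = B.site_id
      AND B.initiated_on =
          (SELECT MAX(initiated_on) FROM backups WHERE site_id = S.id)
ORDER BY S.id
"""

where_rows = conn.execute(FILTER_IN_WHERE).fetchall()
on_rows = conn.execute(FILTER_IN_ON).fetchall()
print(where_rows)
print(on_rows)
```

If only sites with at least one backup matter (as in the question, which filters on a single site id), the WHERE form is enough; the ON form is the one to reach for when every site should be listed.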


To get the latest time, you need to make a subquery like this:

    SELECT sites.id as sid, 
           SUM(backups.size) AS size,
           latest.time AS latesttime
      FROM sites AS sites
 LEFT JOIN (SELECT site_id, 
                   MAX(initiated_on) AS time
              FROM backups
          GROUP BY site_id) AS latest
        ON latest.site_id = sites.id
 LEFT JOIN backups 
        ON sites.id = backups.site_id
     WHERE sites.id =  1
  GROUP BY sites.id
  ORDER BY backups.initiated_on desc

I have removed the SELECT * as this will only work in MySQL and is generally bad practice anyway. Non-MySQL RDBMSs will throw an error if you include the other fields, even individually, so to get the rest of the fields you will need to turn this query into a subquery and INNER JOIN it to the sites table. This is because those databases require every such field to be added to the GROUP BY clause, and this fails (or is at least very slow) if you have long text fields.
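
As a rough illustration of the portability point, here is a sketch in which every selected column is either aggregated or listed in GROUP BY, which is the form stricter databases expect. It does not return status, so it is not a replacement for the queries that join back to backups. It runs on an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL, with hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE backups (
        id INTEGER PRIMARY KEY,
        site_id INTEGER,
        initiated_on INTEGER,  -- unix timestamp
        size REAL,
        status TEXT
    );
    -- Hypothetical sample data.
    INSERT INTO sites (id, name) VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO backups (site_id, initiated_on, size, status) VALUES
        (1, 100, 1.5, 'ok'),
        (1, 300, 2.0, 'failed'),
        (1, 200, 0.5, 'ok');
""")

# Every non-aggregated column in the SELECT list also appears in GROUP BY.
STRICT_GROUP_BY = """
SELECT sites.id AS sid,
       sites.name,
       MAX(backups.initiated_on) AS latesttime,
       SUM(backups.size) AS size
FROM sites
LEFT JOIN backups ON sites.id = backups.site_id
GROUP BY sites.id, sites.name
ORDER BY sites.id
"""

summary = conn.execute(STRICT_GROUP_BY).fetchall()
for row in summary:
    print(row)
```

Grouping by a short key plus the few columns actually needed avoids pulling long text fields into the GROUP BY clause.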

1 Comment

Oops! That was what I had in mind. Edited to use the right column name.
