I am trying to count entries per user, grouped by year, month and user, in a table that has 45M rows. The result has around 4M rows, which I wasn't able to fetch in one go, so I decided to page through it with LIMIT and OFFSET.

To retrieve the first 1M records I've written the query below:

select SQL_BIG_RESULT uis.nick, uis.user_id, CONCAT(t.year, '-', LPAD(t.month, 2, 0)) AS DATE, t.count 
from (select SQL_BIG_RESULT e.user_id, YEAR(e.created_at) as year, MONTH(e.created_at) as month, COUNT(*) AS count
        from entries e
        group by YEAR(e.created_at), MONTH(e.created_at), e.user_id
        limit 1000000
     ) t
inner join users u on u.id = t.user_id
inner join user_infos ui on ui.user_id = u.id
inner join user_identifiers uis on uis.user_info_id = ui.id
order by t.year, t.month, uis.nick;

To retrieve the second 1M records I set an offset of 999998, so that the two chunks overlap by 2 rows and I can check that they join up correctly:
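As a quick sanity check of the offset arithmetic, here is a small Python sketch. The helper names `page_rows` and `overlap` are hypothetical, not part of any library. It computes the 1-based row numbers that MySQL's `LIMIT offset, count` selects and how many rows two such pages share.

```python
def page_rows(offset: int, count: int) -> range:
    """1-based row numbers selected by MySQL's `LIMIT offset, count`."""
    # LIMIT offset, count skips `offset` rows, then returns the next `count` rows.
    return range(offset + 1, offset + count + 1)


def overlap(a: range, b: range) -> int:
    """Number of row positions shared by two contiguous pages."""
    return max(0, min(a.stop, b.stop) - max(a.start, b.start))


first = page_rows(0, 1_000_000)         # LIMIT 1000000
second = page_rows(999_998, 1_000_000)  # LIMIT 999998, 1000000

print(first.start, first.stop - 1)    # 1 1000000
print(second.start, second.stop - 1)  # 999999 1999998
print(overlap(first, second))         # 2
```

The arithmetic assumes that both queries enumerate the grouped rows in the same order. That order is only guaranteed when an ORDER BY on a unique key precedes the LIMIT.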

select SQL_BIG_RESULT uis.nick, uis.user_id, CONCAT(t.year, '-', LPAD(t.month, 2, 0)) AS DATE, t.count 
from (select SQL_BIG_RESULT e.user_id, YEAR(e.created_at) as year, MONTH(e.created_at) as month, COUNT(*) AS count
        from entries e
        group by YEAR(e.created_at), MONTH(e.created_at), e.user_id
        limit 999998, 1000000
     ) t
inner join users u on u.id = t.user_id
inner join user_infos ui on ui.user_id = u.id
inner join user_identifiers uis on uis.user_info_id = ui.id
order by t.year, t.month, uis.nick;

To check the results, I took the tail of the first 1M rows and the head of the second 1M rows. As I understand it, there should be 2 overlapping rows (since I used an offset of 999998), but they don't match.

Something is clearly wrong with the query: the first file ends with "zzzzz", but the second file starts with "0 3 kalem ucu", which should not come after "z" in alphabetical order.
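The tail/head comparison can be made explicit. The following is a minimal Python sketch that assumes each chunk has been loaded as a list of row tuples in the exact order the paging subquery produced them. The function name `chunks_join_up` and the sample rows are hypothetical. The last `k` rows of one chunk should equal the first `k` rows of the next. If a file is written in a different order from the one used for paging (for example, re-sorted by nick in the outer query), the check no longer says anything about the page boundaries.

```python
from typing import Sequence, Tuple

Row = Tuple[str, int, str, int]  # (nick, user_id, date, count)


def chunks_join_up(prev: Sequence[Row], nxt: Sequence[Row], k: int = 2) -> bool:
    """True if the last k rows of `prev` are exactly the first k rows of `nxt`."""
    if len(prev) < k or len(nxt) < k:
        return False
    return list(prev[-k:]) == list(nxt[:k])


# Hypothetical sample data, listed in the same order the subquery pages by.
chunk1 = [("a", 1, "2013-05", 3), ("b", 2, "2013-05", 1), ("c", 3, "2013-05", 7)]
chunk2 = [("b", 2, "2013-05", 1), ("c", 3, "2013-05", 7), ("d", 4, "2013-05", 2)]

print(chunks_join_up(chunk1, chunk2))      # True
print(chunks_join_up(chunk1, chunk2[1:]))  # False
```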

$ tail entry_counts_by_users_1_1m.csv

| user_nick   | user_id | date    | entry_count | 
|-------------|---------|---------|-------------| 
| zskal       | 493395  | 2013-05 | 8           | 
| zuhanzee    | 397659  | 2013-05 | 2           | 
| zulmet      | 446672  | 2013-05 | 74          | 
| zuluuuuuu   | 1240043 | 2013-05 | 9           | 
| zverkov     | 502616  | 2013-05 | 2           | 
| zvezdite    | 750458  | 2013-05 | 1           | 
| zx          | 249598  | 2013-05 | 15          | 
| zyprexa 5mg | 779519  | 2013-05 | 16          | 
| zzgx        | 584985  | 2013-05 | 2           | 
| zzzzz       | 22730   | 2013-05 | 1           | 

$ head entry_counts_by_users_1m_2m.csv

| nick          | user_id | DATE    | count | 
|---------------|---------|---------|-------| 
| 0 3 kalem ucu | 624699  | 2013-05 | 4     | 
| 0132          | 995914  | 2013-05 | 3     | 
| 03072010      | 960606  | 2013-05 | 9     | 
| 0312020008    | 804486  | 2013-05 | 2     | 
| 0326          | 446816  | 2013-05 | 1     | 
| 05            | 575534  | 2013-05 | 1     | 
| 05012009      | 1171153 | 2013-05 | 6     | 
| 0904          | 514964  | 2013-05 | 2     | 
| 0kmzeka       | 777191  | 2013-05 | 4     | 

Could you help me understand what I am doing wrong here?

+-----------+
| @@version |
+-----------+
| 8.0.19    |
+-----------+

UPDATE

These are the results I get after adding ORDER BY to the subquery:

select SQL_BIG_RESULT uis.nick, uis.user_id, CONCAT(t.year, '-', LPAD(t.month, 2, 0)) AS DATE, t.count
    from (select SQL_BIG_RESULT e.user_id, YEAR(e.created_at) as year, MONTH(e.created_at) as month, COUNT(*) AS count
            from entries e
            group by YEAR(e.created_at), MONTH(e.created_at), e.user_id
            order by year, month, user_id
            limit 1000000) t
    inner join users u on u.id = t.user_id
    inner join user_infos ui on ui.user_id = u.id
    inner join user_identifiers uis on uis.user_info_id = ui.id

For the first 1M records:

$ tail entry_counts_by_users_1_1m.csv

| user_name                  | user_id | date    | entry_count | 
|----------------------------|---------|---------|-------------| 
| statistic er               | 667546  | 2012-06 | 1           | 
| mula                       | 612905  | 2013-02 | 1           | 
| sisman cirkin bi de kezban | 1327434 | 2013-02 | 2           | 
| tyra34                     | 1329280 | 2013-03 | 1           | 
| ecemazkan                  | 1332628 | 2013-02 | 1           | 
| susamlicubuk               | 1333079 | 2013-02 | 1           | 
| hemenhemenherterim         | 631784  | 2011-04 | 1           | 
| umursamaz tavrin hastasi   | 1060158 | 2012-09 | 2           | 
| uslucocuk                  | 1254758 | 2012-09 | 1           | 
| dharamsala                 | 956110  | 2012-09 | 1           | 

select SQL_BIG_RESULT uis.nick, uis.user_id, CONCAT(t.year, '-', LPAD(t.month, 2, 0)) AS DATE, t.count
    from (select SQL_BIG_RESULT e.user_id, YEAR(e.created_at) as year, MONTH(e.created_at) as month, COUNT(*) AS count
            from entries e
            group by YEAR(e.created_at), MONTH(e.created_at), e.user_id
            order by year, month, user_id
            limit 999998, 1000000) t
    inner join users u on u.id = t.user_id
    inner join user_infos ui on ui.user_id = u.id
    inner join user_identifiers uis on uis.user_info_id = ui.id

For the second 1M records:

$ head entry_counts_by_users_1m_2m.csv

| user_name | user_id | date    | entry_count | 
|-----------|---------|---------|-------------| 
| ssg       | 8097    | 2013-06 | 101         | 
| ssg       | 8097    | 2013-07 | 73          | 
| ssg       | 8097    | 2013-08 | 100         | 
| ssg       | 8097    | 2013-09 | 88          | 
| ssg       | 8097    | 2013-10 | 84          | 
| ssg       | 8097    | 2013-11 | 54          | 
| ssg       | 8097    | 2013-12 | 64          | 
| ssg       | 8097    | 2014-01 | 78          | 
| ssg       | 8097    | 2014-02 | 31          | 

I still don't get what I am doing wrong.

  • What version of MySQL are you using? Commented Jul 18, 2020 at 11:43
  • I am using the version 8.0.19 Commented Jul 18, 2020 at 11:45
  • None of your queries order primarily by nick, so why do you expect your result files to be ordered by nick? Your limit query does not order by nick at all. Commented Jul 18, 2020 at 11:47
  • @Shadow GROUP BY implicitly orders as well? Not in MySQL 8.0: db-fiddle.com/f/paQr4yCZBHbYJHKwJvSofd/0 Commented Jul 18, 2020 at 12:07
  • I understand that in the absence of ORDER BY the engine can return the rows in any order (though @Shadow disagrees with this statement). If the rows are returned in any order, then LIMIT won't work as you expect. I would suggest you add the ORDER BY and this problem may disappear; ...and it's very cheap to do. Commented Jul 18, 2020 at 12:07

1 Answer

Starting in MySQL 8.0.13, implicit ordering for GROUP BY has been removed:

Incompatible Change: The deprecated ASC or DESC qualifiers for GROUP BY clauses have been removed. Queries that previously relied on GROUP BY sorting may produce results that differ from previous MySQL versions. To produce a given sort order, provide an ORDER BY clause.

The implicit ordering has been deprecated since 5.6, so there has been advance warning.

Your subquery uses GROUP BY with no ORDER BY, so the order of its result set is unspecified and can change from one run to the next. To get a stable result, use an ORDER BY before the LIMIT.
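Below is a minimal, self-contained sketch of that pattern. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption: SQLite has no YEAR()/MONTH(), so strftime() is used instead. The tiny `users`/`entries` tables and their ids are made up, and the user_infos/user_identifiers joins are collapsed into a single `users.nick` column. The subquery orders by (year, month, user_id), which is unique per group, before LIMIT/OFFSET. The outer query repeats that ORDER BY so the joined rows come out in the same order they were paged in. Without the outer ORDER BY, the joins may return each page's rows in any order, which would likely explain why the tail and head of the files in the update still look unrelated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, nick TEXT);
    CREATE TABLE entries (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
    INSERT INTO users VALUES (1, 'zzzzz'), (2, 'ssg'), (3, '0132');
    INSERT INTO entries (user_id, created_at) VALUES
        (1, '2013-05-02'), (2, '2013-05-03'), (2, '2013-05-09'),
        (3, '2013-05-10'), (2, '2013-06-01'), (1, '2013-06-15'),
        (3, '2013-07-04');
    """
)

# ORDER BY a unique key (year, month, user_id) *before* LIMIT/OFFSET,
# and the same ORDER BY again on the outer query after the join.
PAGE_SQL = """
SELECT u.nick, t.user_id, t.year || '-' || t.month AS period, t.cnt
FROM (SELECT e.user_id,
             strftime('%Y', e.created_at) AS year,
             strftime('%m', e.created_at) AS month,
             COUNT(*) AS cnt
      FROM entries e
      GROUP BY year, month, e.user_id
      ORDER BY year, month, e.user_id
      LIMIT ? OFFSET ?) t
JOIN users u ON u.id = t.user_id
ORDER BY t.year, t.month, t.user_id
"""


def fetch_page(limit, offset):
    return conn.execute(PAGE_SQL, (limit, offset)).fetchall()


page1 = fetch_page(4, 0)
page2 = fetch_page(4, 2)  # overlaps page1 by 2 rows, like OFFSET 999998
print(page1[-2:] == page2[:2])  # True
```

In the MySQL queries from the question, this corresponds to keeping `order by year, month, user_id` in the subquery and ending the outer query with `order by t.year, t.month, t.user_id` instead of `uis.nick`. Sorting each chunk by nick afterwards is fine for display, but the files can then no longer be compared at their boundaries.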

Comments

Ouch, one more thing to watch out for.
I've updated my question with the ORDER BY clause added to my subquery. Can you check it, please?
But the ORDER BY is already before the LIMIT, isn't it?
@AndréYuhai . . . Sorry, I was looking at the wrong query.
After trying a few combinations, I finally got the correct results, haha. Thank you.