what is the effect of distributed_group_by_no_merge

Question

I know that the distributed node does not combine intermediate results from shards by using distributed_group_by_no_merge.

The following SQL

select sum(xxxxx),xxxxx from (
    select sum(xxxx),xxxx 
    from (
        select count(xxx),xxx 
        from distributed_table group by xxx )  
    group by xxxx SETTINGS distributed_group_by_no_merge = 1
) group by xxxxx

I want to know that which part of sql will be sent to MergeTree node to execute by using distributed_group_by_no_merge？ is it？select count(xxx),xxx from distributed_table group by xxx ) group by xxxx SETTINGS distributed_group_by_no_merge = 1

how does the parameter of distributed_group_by_no_merge change the behavior of distributed query？which part of sql execute on MergeTree node and which part of sql execute on distributed node?

vladimir · Accepted Answer · 2024-12-06 21:34:04Z

4

distributed_group_by_no_merge-param affects the way how the initiator-node (it is a node which runs distributed query) will form the final result of a distributed query:

either by merging aggregated intermediate states coming from shards by itself (it required copying full aggregated intermediate states from shards to initiator-node) [distributed_group_by_no_merge = 0 (default mode)]
or get already final result from shards (when each shard merges an intermediate aggregation state on its side and send to initiator-node only the final result). It provides a significant improvement in performance and resource consumption but requires the right selection of the sharding key [distributed_group_by_no_merge = 1]

I would put distributed_group_by_no_merge at the same level of subquery where defined distributed table to explicitly define your intention and avoid confusion when there are several distributed-subqueries.

Let's look at the way how to check the differences between the two modes (will use _shard_num-virtual column):

distributed_group_by_no_merge=0

SELECT
    groupUniqArray(_shard_num) AS shards,
    ..
FROM table
WHERE ..
GROUP BY ..
SETTINGS distributed_group_by_no_merge = 0

/* Aggregated states were merged into ONE result set on initiator-node.
┌─shards────┬─ ..
│ [2, 1, 3] │  ..
└───────────┴─ ..
*/

distributed_group_by_no_merge=1

SELECT
    groupUniqArray(_shard_num) AS shards,
    ..
FROM table
WHERE ..
GROUP BY ..
SETTINGS distributed_group_by_no_merge = 1

/* Get a set of final results (not aggregated states) from each shard. They should be unioned manually.
┌─shards─┬─ ..
│ [2]    │  ..
│ [1]    │  ..
│ [3]    │  ..
└────────┴─ ..
*/

See https://clickhouse.com/docs/en/operations/settings/settings#distributed_group_by_no_merge.

How to avoid merging high cardinality sub-select aggregations on distributed tables

edited Dec 6, 2024 at 21:34

answered May 12, 2020 at 14:23

vladimir

15.4k3 gold badges51 silver badges82 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Lv Ke Over a year ago

1.when not using distributed_group_by_no_merge, subquery only internal part of a query from distributed_table will be executed on shads, all other parts outside ( ) will be executed at an initiator node. why not array join execute on mergetree node 2.when using distributed_group_by_no_merge=1 at the same level of subquery, the-same-level subquery will be executed on shards, and the outside aggregate query will be executed on distributed node. right?

vladimir Over a year ago

1) I updated my answer, please look at it. 2) yes, 'outside aggregates' will be run on initial-node independently from value of distributed_group_by_no_merge-param. The key difference is where be calculated the final result of distributed query - on initiator-node or each shard separately (understanding the intermediate aggregation state and merging of intermediate aggregation states help to clarify this topic).

Collectives™ on Stack Overflow

what is the effect of distributed_group_by_no_merge

1 Answer 1

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related