@@ -275,44 +275,41 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
275275
276276 <para>
277277 The driving table may be joined to one or more other tables using nested
278- loops or hash joins. The outer side of the join may be any kind of
278+ loops or hash joins. The inner side of the join may be any kind of
279279 non-parallel plan that is otherwise supported by the planner provided that
280280 it is safe to run within a parallel worker. For example, it may be an
281- index scan which looks up a value based on a column taken from the inner
282- table. Each worker will execute the outer side of the plan in full, which
283- is why merge joins are not supported here. The outer side of a merge join
284- will often involve sorting the entire inner table; even if it involves an
285- index, it is unlikely to be productive to have multiple processes each
286- conduct a full index scan of the inner table.
281+ index scan which looks up a value taken from the outer side of the join.
282+ Each worker will execute the inner side of the join in full, which for
283+ hash join means that an identical hash table is built in each worker
284+ process.
287285 </para>
288286 </sect2>
289287
290288 <sect2 id="parallel-aggregation">
291289 <title>Parallel Aggregation</title>
292290 <para>
293- It is not possible to perform the aggregation portion of a query entirely
294- in parallel. For example, if a query involves selecting
295- <literal>COUNT(*)</>, each worker could compute a total, but those totals
296- would need to combined in order to produce a final answer. If the query
297- involved a <literal>GROUP BY</> clause, a separate total would need to
298- be computed for each group. Even though aggregation can't be done entirely
299- in parallel, queries involving aggregation are often excellent candidates
300- for parallel query, because they typically read many rows but return only
301- a few rows to the client. Queries that return many rows to the client
302- are often limited by the speed at which the client can read the data,
303- in which case parallel query cannot help very much.
304- </para>
305-
306- <para>
307- <productname>PostgreSQL</> supports parallel aggregation by aggregating
308- twice. First, each process participating in the parallel portion of the
309- query performs an aggregation step, producing a partial result for each
310- group of which that process is aware. This is reflected in the plan as
311- a <literal>PartialAggregate</> node. Second, the partial results are
291+ <productname>PostgreSQL</> supports parallel aggregation by aggregating in
292+ two stages. First, each process participating in the parallel portion of
293+ the query performs an aggregation step, producing a partial result for
294+ each group of which that process is aware. This is reflected in the plan
295+ as a <literal>Partial Aggregate</> node. Second, the partial results are
312296 transferred to the leader via the <literal>Gather</> node. Finally, the
313297 leader re-aggregates the results across all workers in order to produce
314298 the final result. This is reflected in the plan as a
315- <literal>FinalizeAggregate</> node.
299+ <literal>Finalize Aggregate</> node.
300+ </para>
301+
302+ <para>
303+ Because the <literal>Finalize Aggregate</> node runs on the leader
304+ process, queries which produce a relatively large number of groups in
305+ comparison to the number of input rows will appear less favorable to the
306+ query planner. For example, in the worst-case scenario the number of
307+ groups seen by the <literal>Finalize Aggregate</> node could be as many as
308+ the number of input rows which were seen by all worker processes in the
309+ <literal>Partial Aggregate</> stage. For such cases, there is clearly
310+ going to be no performance benefit to using parallel aggregation. The
311+ query planner takes this into account during the planning process and is
312+ unlikely to choose parallel aggregate in this scenario.
316313 </para>
317314
318315 <para>
@@ -321,10 +318,11 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
321318 have a combine function. If the aggregate has a transition state of type
322319 <literal>internal</>, it must have serialization and deserialization
323320 functions. See <xref linkend="sql-createaggregate"> for more details.
324- Parallel aggregation is not supported for ordered set aggregates or when
325- the query involves <literal>GROUPING SETS</>. It can only be used when
326- all joins involved in the query are also part of the parallel portion
327- of the plan.
321+ Parallel aggregation is not supported if any aggregate function call
322+ contains <literal>DISTINCT</> or <literal>ORDER BY</> clause and is also
323+ not supported for ordered set aggregates or when the query involves
324+ <literal>GROUPING SETS</>. It can only be used when all joins involved in
325+ the query are also part of the parallel portion of the plan.
328326 </para>
329327
330328 </sect2>
0 commit comments