Commit e652273
committed
Redesign handling of SIGTERM/control-C in parallel pg_dump/pg_restore.
Formerly, Unix builds of pg_dump/pg_restore would trap SIGINT and similar
signals and set a flag that was tested in various data-transfer loops.
This was prone to errors of omission (cf commit 3c8aa66); and even if
the client-side response was prompt, we did nothing that would cause
long-running SQL commands (e.g. CREATE INDEX) to terminate early.
Also, the master process would effectively do nothing at all upon receipt
of SIGINT; the only reason it seemed to work was that in typical scenarios
the signal would also be delivered to the child processes. We should
support termination when a signal is delivered only to the master process,
though.
Windows builds had no console interrupt handler, so they would just fall
over immediately at control-C, again leaving long-running SQL commands to
finish unmolested.
To fix, remove the flag-checking approach altogether. Instead, allow the
Unix signal handler to send a cancel request directly and then exit(1).
In the master process, also have it forward the signal to the children.
On Windows, add a console interrupt handler that behaves approximately
the same. The main difference is that a single execution of the Windows
handler can send all the cancel requests since all the info is available
in one process, whereas on Unix each process sends a cancel only for its
own database connection.
In passing, fix an old problem that DisconnectDatabase tends to send a
cancel request before exiting a parallel worker, even if nothing went
wrong. This is at least a waste of cycles, and could lead to unexpected
log messages, or maybe even data loss if it happened in pg_restore (though
in the current code the problem seems to affect only pg_dump). The cause
was that after a COPY step, pg_dump was leaving libpq in PGASYNC_BUSY
state, causing PQtransactionStatus() to report PQTRANS_ACTIVE. That's
normally harmless because the next PQexec() will silently clear the
PGASYNC_BUSY state; but in a parallel worker we might exit without any
additional SQL commands after a COPY step. So add an extra PQgetResult()
call after a COPY to allow libpq to return to PGASYNC_IDLE state.
This is a bug fix, IMO, so back-patch to 9.3 where parallel dump/restore
were introduced.
Thanks to Kyotaro Horiguchi for Windows testing and code suggestions.
Original-Patch: <7005.1464657274@sss.pgh.pa.us>
Discussion: <20160602.174941.256342236.horiguchi.kyotaro@lab.ntt.co.jp>1 parent 7392eed commit e652273
File tree
8 files changed
+505
-169
lines changed- src/bin/pg_dump
8 files changed
+505
-169
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
54 | 54 | | |
55 | 55 | | |
56 | 56 | | |
57 | | - | |
58 | 57 | | |
59 | 58 | | |
60 | 59 | | |
| |||
184 | 183 | | |
185 | 184 | | |
186 | 185 | | |
187 | | - | |
188 | | - | |
189 | | - | |
190 | 186 | | |
191 | 187 | | |
192 | 188 | | |
| |||
351 | 347 | | |
352 | 348 | | |
353 | 349 | | |
354 | | - | |
355 | | - | |
356 | | - | |
357 | 350 | | |
358 | 351 | | |
359 | 352 | | |
| |||
414 | 407 | | |
415 | 408 | | |
416 | 409 | | |
417 | | - | |
418 | | - | |
419 | | - | |
420 | 410 | | |
421 | 411 | | |
422 | 412 | | |
| |||
0 commit comments