@@ -2837,3 +2837,69 @@ So, is this worth doing? Would a robust implementation likely be accepted for
28372837---------------------------(end of broadcast)---------------------------
28382838TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
28392839
2840+ On Sun, 2005-04-10 at 21:12 -0400, Bruce Momjian wrote:
2841+ > Jim C. Nasby wrote:
2842+ > > Maybe better for -hackers, but here it goes anyway...
2843+ > >
2844+ > > Has anyone looked at compressing WAL before writing to disk? On a
2845+ > > system generating a lot of WAL it seems there might be some gains to be
2846+ > > had if WAL data could be compressed before going to disk, since today's
2847+ > > machines are generally more I/O bound than CPU bound. And unlike the
2848+ > > base tables, you generally don't need to read the WAL, so you don't
2849+ > > really need to worry about not being able to quickly scan through the
2850+ > > data without decompressing it.
2851+ >
2852+ > I have never heard anyone talk about it, but it seems useful. I think
2853+ > compressing the page images written on first page modification since
2854+ > checkpoint would be a big win.
2855+
2856+ Well it was discussed 2-3 years ago as part of the PITR preamble. You
2857+ may be surprised to read that over...
2858+
2859+ A summary of thoughts to date on this:
2860+
2861+ XLogInsert in xlog.c places backup blocks into the WAL buffers before
2862+ insertion, so it is the right place to do this. It would be possible to do
2863+ this before any LWLocks are taken, so it would not necessarily impair
2864+ scalability.
2865+
2866+ Currently XLogInsert is a severe CPU bottleneck around the CRC
2867+ calculation, as identified recently by Tom. Digging further, the code
2868+ used seems to cause processor stalls on Intel CPUs, possibly responsible
2869+ for much of the CPU time. Discussions to move to a 32-bit CRC would also
2870+ be affected by this because of the byte-by-byte nature of the algorithm,
2871+ whatever the length of the generating polynomial. PostgreSQL's CRC
2872+ algorithm is the fastest BSD code available. Until improvement is made
2873+ there, I would not investigate compression further. Some input from
2874+ hardware tuning specialists is required...
2875+
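To illustrate the byte-by-byte point above, here is a minimal sketch of a generic table-driven CRC in C. It is the standard reflected CRC-32 (polynomial 0xEDB88320), not PostgreSQL's actual CRC code, and all `demo_` names are hypothetical. The thing to notice is the inner loop: it consumes one input byte per iteration, and each iteration depends on the previous `crc` value, so a wider or narrower polynomial would change the table entries but probably not the shape of the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lookup table for the standard reflected CRC-32 (polynomial 0xEDB88320).
 * Illustration only; this is not the code used in xlog.c. */
static uint32_t demo_crc_table[256];
static int demo_crc_table_ready = 0;

/* Build the 256-entry table: one entry per possible input byte. */
static void demo_crc32_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        demo_crc_table[i] = c;
    }
    demo_crc_table_ready = 1;
}

/* Compute the CRC of len bytes at data.
 * One table lookup per input byte; every step needs the previous crc,
 * which is what makes the loop inherently serial. */
static uint32_t demo_crc32(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    assert(data != NULL || len == 0);
    if (!demo_crc_table_ready)
        demo_crc32_init();
    while (len-- > 0)
        crc = demo_crc_table[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```

For example, `demo_crc32("123456789", 9)` yields the well-known CRC-32 check value 0xCBF43926.
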
2876+ The current LZ compression code uses a 4096 byte lookback size, so that
2877+ would need to be modified to extend across a whole block. An
2878+ alternative, suggested originally by Tom and rediscovered by me because
2879+ I just don't read everybody's fine words in history, is to simply take
2880+ out the freespace in the middle of every heap block that consists of
2881+ zeros.
2882+
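The freespace idea above can be sketched in C as follows. This is only an illustration under stated assumptions: an 8192-byte block, a single run of zero bytes whose start and end offsets are already known to the caller, and hypothetical `demo_` names throughout. The offsets would also have to be stored with the backup block so that recovery can rebuild the page; how that is recorded is not addressed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLCKSZ 8192   /* assumed block size for this sketch */

/* Copy a page into out[], skipping the bytes in [hole_start, hole_end).
 * The caller guarantees that the skipped range contains only zeros.
 * Returns the number of bytes written to out[]. */
static size_t demo_strip_hole(const uint8_t *page, size_t hole_start,
                              size_t hole_end, uint8_t *out)
{
    assert(hole_start <= hole_end && hole_end <= DEMO_BLCKSZ);
    memcpy(out, page, hole_start);
    memcpy(out + hole_start, page + hole_end, DEMO_BLCKSZ - hole_end);
    return DEMO_BLCKSZ - (hole_end - hole_start);
}

/* Rebuild the full page from the stripped image, refilling the hole
 * with zeros. */
static void demo_restore_hole(const uint8_t *in, size_t hole_start,
                              size_t hole_end, uint8_t *page)
{
    assert(hole_start <= hole_end && hole_end <= DEMO_BLCKSZ);
    memcpy(page, in, hole_start);
    memset(page + hole_start, 0, hole_end - hole_start);
    memcpy(page + hole_end, in + hole_start, DEMO_BLCKSZ - hole_end);
}
```

Unlike general-purpose compression, this costs two `memcpy` calls per block, and the saving depends entirely on how much freespace the block happens to have.
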
2883+ Any solution in this area must take into account the variability of the
2884+ size of freespace in database blocks. Some databases have mostly full
2885+ blocks, others vary. There would also be considerable variation in
2886+ compressibility of blocks, especially since some blocks (e.g. TOAST) are
2887+ likely to already be compressed. There'd need to be some testing done to
2888+ see exactly the point where the costs of compression produce realisable
2889+ benefits.
2890+
2891+ So any solution must be able to cope with both compressed blocks and
2892+ non-compressed blocks. My current thinking is that this could be
2893+ achieved by using the spare fourth bit of the BkpBlocks portion of the
2894+ XLog structure, so that either all included BkpBlocks are compressed or
2895+ none of them are, and hope that allows benefit to shine through. I have
2896+ not yet thought about heap/index issues.
2897+
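A rough sketch of the all-or-nothing flag described above, in C. The bit values and all `DEMO_`/`demo_` names are assumptions made for illustration; they are not taken from the actual XLog record definition. Three bits mark which backup blocks are present, and the remaining bit of the four says whether every included backup block is compressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of a 4-bit flag field; the values are assumptions. */
#define DEMO_BKP_BLOCK_1    0x08
#define DEMO_BKP_BLOCK_2    0x04
#define DEMO_BKP_BLOCK_3    0x02
#define DEMO_BKP_BLOCK_MASK 0x0E
#define DEMO_BKP_COMPRESSED 0x01  /* the spare fourth bit */

/* Build the flag field. The compressed bit is set only when at least one
 * backup block is present, and it then applies to all of them. */
static uint8_t demo_make_info(uint8_t bkp_blocks, bool compressed)
{
    uint8_t info;

    assert((bkp_blocks & ~DEMO_BKP_BLOCK_MASK) == 0);
    info = bkp_blocks & DEMO_BKP_BLOCK_MASK;
    if (compressed && info != 0)
        info |= DEMO_BKP_COMPRESSED;
    return info;
}

/* True when the record carries backup blocks and all are compressed. */
static bool demo_blocks_compressed(uint8_t info)
{
    return (info & DEMO_BKP_BLOCK_MASK) != 0 &&
           (info & DEMO_BKP_COMPRESSED) != 0;
}
```

With a single bit there is no way to mix compressed and uncompressed backup blocks in one record, which matches the "either all or none" rule in the text.
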
2898+ It is possible that an XLogWriter process could also be used to assist in
2899+ the CRC and compression calculations, and a similar process used to
2900+ assist decompression for recovery, in time.
2901+
2902+ I regret I do not currently have time to pursue this further.
2903+
2904+ Best Regards, Simon Riggs
2905+