I have a table in Postgres whose column order I optimized to have as little padding as possible, after reading Calculating and saving space in PostgreSQL.

Then I noticed that I could add some extra value by changing the data type of a few of the columns. But how would that affect the data packing (and padding), and would the order of the columns change? The altered columns would take up more space than before, so something has to change, but what?

So I ran an experiment: I took the table below and altered the boolean columns to smallint, with the intent of showing how windy or hot it is rather than just whether it is windy or hot.

CREATE TABLE t2 (
    ts timestamptz NOT NULL,
    is_hot boolean NOT NULL,
    is_windy boolean NOT NULL,
    humidity smallint NOT NULL,
    wind_direction real NOT NULL
);
-- Fill with data
-- Convert the flag to a numeric scale: true becomes 1, false becomes 0
ALTER TABLE t2
    ALTER COLUMN is_hot TYPE smallint USING CASE WHEN is_hot THEN 1 ELSE 0 END;

Here's a DBfiddle

It turned out that the column order stayed the same, but I was left with some padding because wind_direction was no longer aligned to 4 bytes.

What does Postgres do to alter all these rows, given that every row needs to be updated now that it takes up more physical space?

From Dropping column in Postgres on a large dataset I gather that dropped columns become hidden nullable columns until the row is next updated, at which point the column is physically removed.
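
The catalog side of this can be inspected directly. The following is a minimal sketch, assuming a throwaway table named drop_demo (a hypothetical name that is not part of the setup above). It relies only on the pg_attribute system catalog and its attisdropped column.

```sql
-- Scratch table (hypothetical name) used only for this illustration
CREATE TABLE drop_demo (a integer, b integer);

-- Dropping the column is a catalog-only change
ALTER TABLE drop_demo DROP COLUMN b;

-- The dropped column still has an entry, flagged by attisdropped
SELECT attnum, attname, attisdropped
FROM pg_attribute
WHERE attrelid = 'drop_demo'::regclass
  AND attnum > 0          -- skip system columns
ORDER BY attnum;

DROP TABLE drop_demo;
```

This only shows the catalog entry; it says nothing about when the old values disappear from the stored rows.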

How is ALTER column type different from me creating a new table with the altered definition and inserting all the old (but transformed) data into the new table myself with SQL?

Are there any optimizations that Postgres can do because it's not me explicitly creating a new table and moving the data over?

1 Answer

If you alter the table like that, the whole table gets rewritten.
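
One way to check whether a statement rewrote a table is to compare the table's file node before and after, since a rewritten table is stored in a new file. This is a minimal sketch, assuming a scratch table named rewrite_demo (a hypothetical name); pg_relation_filenode is a built-in function.

```sql
-- Scratch copy of the question's table (hypothetical name)
CREATE TABLE rewrite_demo (
    ts timestamptz NOT NULL,
    is_hot boolean NOT NULL,
    is_windy boolean NOT NULL,
    humidity smallint NOT NULL,
    wind_direction real NOT NULL
);
INSERT INTO rewrite_demo VALUES (now(), true, false, 40, 180.0);

-- File node before the type change
SELECT pg_relation_filenode('rewrite_demo') AS filenode_before;

ALTER TABLE rewrite_demo
    ALTER COLUMN is_hot TYPE smallint USING CASE WHEN is_hot THEN 1 ELSE 0 END;

-- File node after the type change
SELECT pg_relation_filenode('rewrite_demo') AS filenode_after;

DROP TABLE rewrite_demo;
```

If the two values differ, the table's data was written to a new file.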

Before the change, a row would look like this:

                      timestamp  smallint
                           |        |
                           v        v
hhhhhhhhhhhhhhhhhhhhh_|dddddddd|d|d|dd|dddd
                                ^ ^      ^
                                 V       |
                             booleans   real

 h ... header byte (there are 23)
 _ ... padding byte
 d ... data byte
 | ... optical separator (0 bytes)

After the change:

                      timestamp  smallints
                           |       /   |
                           v      /    v
hhhhhhhhhhhhhhhhhhhhh_|dddddddd|dd|d_|dd__|dddd
                                   ^        ^
                                   |        |
                                boolean    real

That's because a smallint must start at an address that is a multiple of two, and a real at an address that is a multiple of four.
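
The size and alignment of each column can be read from the system catalog. A sketch, assuming the table t2 from the question exists in the current schema:

```sql
-- attlen   = storage size in bytes (negative for variable-length types)
-- attalign = required alignment: c = 1 byte, s = 2 bytes,
--            i = 4 bytes, d = 8 bytes (on most platforms)
SELECT a.attnum, a.attname, t.typname, a.attlen, a.attalign
FROM pg_attribute AS a
JOIN pg_type AS t ON t.oid = a.atttypid
WHERE a.attrelid = 't2'::regclass
  AND a.attnum > 0            -- skip system columns
  AND NOT a.attisdropped      -- skip dropped columns
ORDER BY a.attnum;
```

Reading the result top to bottom in attnum order gives the physical column order, which is what determines where padding is needed.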

So you end up with three extra padding bytes.
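
The effect can be approximated without touching the table by measuring row values with pg_column_size. This is only a sketch: a ROW(...) value carries its own header, so the absolute numbers are not necessarily the on-disk tuple size, but the difference between the two results should reflect the extra data byte plus the padding.

```sql
SELECT
    -- original layout: timestamptz, boolean, boolean, smallint, real
    pg_column_size(ROW(now(), true, true, 1::smallint, 1::real)) AS before_change,
    -- after the change: timestamptz, smallint, boolean, smallint, real
    pg_column_size(ROW(now(), 1::smallint, true, 1::smallint, 1::real)) AS after_change;
```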

4 Comments

The byte alignment is explained in the question I linked stackoverflow.com/questions/2966524/… But the rewrite answers one of the questions, would you be able to expand on the two other questions I had?
You should restrict yourself to one question per question. And if that was not your main question, you should have chosen a different title. A rewrite is about the same as creating a new table and copying the rows yourself, except that you don't have to take care of all the foreign keys etc., because PostgreSQL does it automatically.
While I understand the reasoning behind a single question per question I found that these questions are so closely related to each other that posing these three questions separately would make them considered duplicates.
Then that might be an indication that you need consulting rather than a single, focused Stackoverflow answer.
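
For comparison, the manual route mentioned in the comments could be sketched as follows. It assumes t2 is still in its original form with is_hot as boolean, and t2_new and t2_old are hypothetical names. The column order here is one possible arrangement that avoids padding between columns (8-byte, then 4-byte, then 2-byte, then 1-byte values); it is not something the answer prescribes.

```sql
BEGIN;

-- New table with the changed type and a padding-friendly column order
CREATE TABLE t2_new (
    ts timestamptz NOT NULL,
    wind_direction real NOT NULL,
    is_hot smallint NOT NULL,
    humidity smallint NOT NULL,
    is_windy boolean NOT NULL
);

-- Copy the rows, transforming is_hot on the way
INSERT INTO t2_new (ts, wind_direction, is_hot, humidity, is_windy)
SELECT ts, wind_direction, CASE WHEN is_hot THEN 1 ELSE 0 END, humidity, is_windy
FROM t2;

-- Swap the tables; the old one is kept until the result has been checked
ALTER TABLE t2 RENAME TO t2_old;
ALTER TABLE t2_new RENAME TO t2;

COMMIT;
```

Unlike ALTER TABLE, this leaves indexes, constraints, foreign keys and privileges to be recreated by hand, which is the difference the comment points out.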