I have an ever-growing table called transactions which grows ~10 million rows per month.
This table has a jsonb column called extra.
70% of extra column of the transactions records are NULL and the rest of them has a json value like so:
{
"lang": "en",
"pages": 3,
"message": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Egestas purus viverra accumsan in nisl nisi. Arcu cursus vitae congue mauris rhoncus aenean vel elit scelerisque. In egestas erat imperdiet sed euismod nisi porta lorem mollis. Morbi tristique senectus et netus. Mattis pellentesque id nibh tortor id aliquet lectus proin. Sapien faucibus et molestie ac feugiat sed lectus vestibulum..."
}
Notice: All extra json keys are fixed for all rows and will not change.
An overview of transactions table :
id | price | type | extra
-------------------------------------------
1 | 2000.00 | SMS | null
2 | 2000.00 | SMS | null
3 | 4000.00 | SMS | null
4 | 5000.00 | SMS | {"lang": "en", "pages":8, "message":"Lore..."}
5 | 4000.00 | SMS | null
6 | 4000.00 | SMS | null
7 | 5000.00 | SMS | {"lang": "de", "pages":5, "message":"Some..."}
Why I did that?
I'm using jsonb column instead of three separate columns to avoid many NULL values.
With jsonb I only have 30% NULLs only on 1 column, But when I use 3 separate columns instead of 1 jsonb column I have 30% NULL for each column.
The Question:
Is it good idea to split the my extra column into 3 separate columns?
Like so:
id | price | type | lang | pages | message
--------------------------------------------
1 | 2000.00 | SMS | null | null | null
2 | 2000.00 | SMS | null | null | null
3 | 4000.00 | SMS | null | null | null
4 | 5000.00 | SMS | en | 8 | Lorem...
5 | 4000.00 | SMS | null | null | null
6 | 4000.00 | SMS | null | null | null
7 | 5000.00 | SMS | de | 5 | Some...
Or instead, I can add an extra table(for example transaction_info) with One-To-One relationship. Like so:
transcations
id | price | type
-------------------
1 | 2000.00 | SMS
2 | 2000.00 | SMS
3 | 4000.00 | SMS
4 | 5000.00 | SMS
5 | 4000.00 | SMS
6 | 4000.00 | SMS
7 | 5000.00 | SMS
transaction_info
id | transaction_id | lang | pages | message
--------------------------------------------
1 | 4 | en | 8 | Lorem...
2 | 7 | de | 5 | Some...
With this approach I don't have any NULL values on both tables.
Which one do you prefer?