
Context: I work for a company that stores and analyzes data from inertial measurement units (IMUs). My teammates and I are considering creating and maintaining a DB for all the data we gather for analysis purposes (currently saved in .csv files).

Each device we gather data from has a slightly different output. It's safe to assume most devices will output the following values: Gx, Gy, Gz, Ax, Ay, Az, Temperature

But some units output one or more additional types of data. For example: Gx, Gy, Gz, Ax, Ay, Az, High-G Ax, Temperature

To move away from the .csv files, we'll need to have a table (or multiple tables) to store the measured data. None of us have any experience in creating or maintaining a DB, so we're unsure of what would be the best way to implement it.

One approach could be to create a single table shared by all unit types that includes all common measurement axes, plus an "Other" column for cases where a unit has a measurement axis not covered by the named columns. The columns would look something like: RunID, Gx, Gy, Gz, Ax, Ay, Az, HighG_Ax, Temperature, Other
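Roughly, that wide table might be declared like this (PostgreSQL syntax; the surrogate key and the double precision types are my assumptions, the column list is as above):

```sql
-- One wide table for all unit types; sensor columns are nullable,
-- so units without a given sensor simply store NULL.
CREATE TABLE measurements (
    MeasurementID bigserial PRIMARY KEY,    -- surrogate key (assumption)
    RunID         bigint NOT NULL,          -- FK to a runs/sessions table
    Gx            double precision,         -- gyroscope axes
    Gy            double precision,
    Gz            double precision,
    Ax            double precision,         -- accelerometer axes
    Ay            double precision,
    Az            double precision,
    HighG_Ax      double precision,         -- NULL for units without this sensor
    Temperature   double precision,
    Other         double precision          -- catch-all; meaning varies per unit
);
```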

In this case I expect the HighG_Ax column to often be filled with NULL values, since many of the units used for measurements don't have this sensor type. Same goes for the "Other" column.

The biggest issue I can see with this approach is that if we come across a unit type that's ENTIRELY different, i.e. has 3+ measurement axes that weren't pre-included in this table, we'd have to retroactively add columns to the table and fill them with NULL for all previous entries.
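For what it's worth, adding a nullable column without a default is a cheap, metadata-only change in PostgreSQL, and existing rows simply read back as NULL; no explicit backfill is needed. A hypothetical example, assuming a new unit adds magnetometer axes:

```sql
-- Hypothetical new sensor axes; all pre-existing rows
-- automatically return NULL for these columns.
ALTER TABLE measurements
    ADD COLUMN Mx double precision,
    ADD COLUMN My double precision,
    ADD COLUMN Mz double precision;
```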

The other option is to create a separate table for each unit type, with columns matching that type's output. This ensures no NULL values at all, but means maintaining at the very least 6-7 different measurement tables in which most of the columns are the same.

So I'd have one table with RunID, Gx, Gy, Gz, Ax, Ay, Az, HighG_Ax, Temperature and at least one other table with RunID, Gx, Gy, Gz, Ax, Ay, Az, Temperature. But I'd also be able to handle many different types of units with different sensors and measurement axes without having to go back and alter tables retroactively.
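Concretely, that would look something like the sketch below (names and types are illustrative); the duplication between the two definitions is the maintenance cost of this approach:

```sql
-- Per-unit-type tables: every column can be NOT NULL,
-- but the definitions are nearly identical.
CREATE TABLE measurements_standard (
    RunID       bigint           NOT NULL,
    Gx          double precision NOT NULL,
    Gy          double precision NOT NULL,
    Gz          double precision NOT NULL,
    Ax          double precision NOT NULL,
    Ay          double precision NOT NULL,
    Az          double precision NOT NULL,
    Temperature double precision NOT NULL
);

CREATE TABLE measurements_highg (
    RunID       bigint           NOT NULL,
    Gx          double precision NOT NULL,
    Gy          double precision NOT NULL,
    Gz          double precision NOT NULL,
    Ax          double precision NOT NULL,
    Ay          double precision NOT NULL,
    Az          double precision NOT NULL,
    HighG_Ax    double precision NOT NULL,  -- the only difference
    Temperature double precision NOT NULL
);
```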

These tables could get very large very quickly considering the amount of data we gather, so I'm trying to avoid mistakes that would require a complete redesign in the near future.

Which solution is best-practice? What are the possible downsides to consider?

Thank you in advance :)

  • Are the data comparable across types? If not, prefer a table per type. If you have data that repeats across types, you could consider a shared table plus join tables for the additional data. Focus on your main use case, then try a few designs; it will give you a lot more insight and confidence than any answer we can provide. SQLite or DuckDB might be good ways to experiment, even if you ultimately want to use PostgreSQL. Commented Sep 9, 2024 at 8:38
  • A simple solution: I would go with a table COMMON_MEASUREMENTS that has the common columns that are usually populated (RunID, Gx, Gy, Gz, Ax, Ay, Az, Temperature), making all columns nullable except RunID. Alongside that I would have a table SPECIAL_MEASUREMENTS with the columns ID (of type serial), RunID, NAME, and VALUE, where NAME is the name of the measurement. I would also index the NAME column for faster queries (see the sketch after these comments). Commented Sep 9, 2024 at 8:58
  • Accommodate only the base measurements in regular columns and add a single extra column of type jsonb. It's more compact than entity-attribute-value (EAV) and entirely flexible: the structure in there can vary, so you can add or skip measurement types all you want, without having to alter and backfill the table. If you don't want it to hold NULL most of the time, you can move it to a 1:1 linked table that just holds a foreign key to the primary table, plus this column (also sketched after these comments). Here's a slightly related thread comparing the space consumption of jsonb and an EAV. Commented Sep 9, 2024 at 10:51
  • It kind of feels like the OP from the other thread could be tackling the same problem by creating 20k columns ahead of time, hoping that's more than enough to hold all possible/anticipated sensor data sources, then just saving NULL for everything not yet connected. All the solutions suggested there also handle the case here, all of them allowing sensors to be added dynamically, even beyond the 20k. Commented Sep 9, 2024 at 10:58
  • I think your main concern is not saving disk space, but keeping the model flexible enough to accommodate future unforeseen measurements. As @Leonard states, I would go with a common table holding the main measurements, and separate ones for the extras. Another key aspect is how (and how often) you want to query this data. You haven't included that part in the question, and it can heavily affect these recommendations. Commented Sep 9, 2024 at 13:46
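To make the two suggestions above concrete, here is a rough sketch of both ideas in PostgreSQL syntax. The comments only name the columns loosely, so the types and the surrogate ID key on COMMON_MEASUREMENTS (needed so option B has something to reference) are my assumptions:

```sql
-- Shared table for the common, mostly-populated columns.
CREATE TABLE common_measurements (
    ID          bigserial PRIMARY KEY,      -- surrogate key (assumption)
    RunID       bigint NOT NULL,
    Gx double precision, Gy double precision, Gz double precision,
    Ax double precision, Ay double precision, Az double precision,
    Temperature double precision
);

-- Option A: EAV-style side table, one row per extra reading.
CREATE TABLE special_measurements (
    ID    bigserial        PRIMARY KEY,
    RunID bigint           NOT NULL,
    Name  text             NOT NULL,        -- e.g. 'HighG_Ax'
    Value double precision NOT NULL
);
CREATE INDEX ON special_measurements (Name);

-- Option B: a single flexible jsonb column in a 1:1-linked side table.
CREATE TABLE extra_measurements (
    MeasurementID bigint PRIMARY KEY REFERENCES common_measurements (ID),
    Extras        jsonb  NOT NULL           -- e.g. {"HighG_Ax": 17.4}
);
```

With either option, a row exists only when a unit actually produces an extra reading, so there are no NULL-heavy columns, and new sensor types never require an ALTER TABLE.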
