MySQL: Store array of data in a single column

Question

Thanks in advance for your help.

Well, this is my situation. I have a web system that makes some noise-related calculations based on a sample, created by a sonometer. Originally, the database only stored the results of these calculations. But now, I have been asked to also store the samplings themselves. Each sample is only a list of 300 or 600 numbers with 1 decimal each.

So, the simplest approach I have come up with is to add a column in the table that stores all the calculations for a given sample. This column should contain the list of numbers.

My question then: What is the best way to store this list of numbers in a single column?

Things to consider:

It would be nice if the list could be read by both PHP and JavaScript with no further complications.
The list is only useful if retrieved in its entirety, which is why I'd rather not normalize it. Also, the calculations made on that list are fairly complex and already coded in PHP and JavaScript, so I won't be running any SQL queries on elements of a given list.

Also, if there are better approaches than storing it this way, I would love to know about them.

Thanks a lot and have a good day/evening :)

So, you just want to store a big list of numbers in a database? Am I understanding you correctly?
– Paul Dessert, Commented Sep 2, 2013 at 7:41
300 numbers, one decimal each, as in one digit each? If so, a string would be simplest.
– Tony Hopkinson, Commented Sep 2, 2013 at 7:45
Ah, then the two answers given, which both start with "don't do that", are where you want to be. Multi-value columns are just a bad idea when it comes to relational databases.
– Tony Hopkinson, Commented Sep 2, 2013 at 11:48

GordonM · Accepted Answer · 2013-09-02 08:32:42Z

8

First off, you really don't want to do that. A column in an RDBMS is meant to be atomic, in that it contains one and only one piece of information. Trying to store more than one piece of data in a column is a violation of first normal form.

If you absolutely must do it, then you need to convert the data into a form that can be stored as a single item of data, typically a string. You could use PHP's serialize() mechanism, XML parsing (if the data happens to be a document tree), json_encode(), etc.
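
As a minimal sketch of the JSON option, assuming each sample is a flat array of readings with one decimal place: the JavaScript below serializes the list to a single string that could go in a TEXT column, then parses it back. The variable names and values are made up for illustration. On the PHP side, json_encode() and json_decode() produce and consume the same text format.

```javascript
// Hypothetical sample: sonometer readings with one decimal each.
const readings = [61.2, 58.7, 70.1, 65.4];

// Serialize to one string that could be stored in a single TEXT column.
const stored = JSON.stringify(readings);
console.log(stored); // [61.2,58.7,70.1,65.4]

// Parse it back into an array of numbers after reading the column.
const restored = JSON.parse(stored);
console.log(restored.length); // 4
```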

But how do you query such data effectively? The answer is you can't.

Also, if someone else takes over your project at a later date you're really going to annoy them, because serialized data in a database is horrid to work with. I know because I've inherited such projects.

Did I mention you really don't want to do that? You need to rethink your design so that it can more easily be stored in terms of atomic rows. Use another table for this data, for example, and use foreign keys to relate it to the master record. They're called relational databases for a reason.
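
A rough sketch of what the separate-table approach could look like from the application side, in JavaScript. The table name sample_values, its columns (record_id, seq, reading) and the buildInsert helper are all hypothetical. The helper only builds a parameterized multi-row INSERT and leaves execution to whatever MySQL client is in use.

```javascript
// Hypothetical child table: sample_values(record_id, seq, reading),
// where record_id is a foreign key to the master record.
function buildInsert(recordId, values) {
  if (values.length === 0) {
    throw new Error("nothing to insert");
  }
  // One "(?, ?, ?)" placeholder group per reading.
  const placeholders = values.map(() => "(?, ?, ?)").join(", ");
  // Flatten to [recordId, seq, reading, recordId, seq, reading, ...].
  const params = values.flatMap((reading, seq) => [recordId, seq, reading]);
  const sql =
    "INSERT INTO sample_values (record_id, seq, reading) VALUES " + placeholders;
  return { sql, params };
}

const insert = buildInsert(42, [61.2, 58.7, 70.1]);
console.log(insert.sql);
console.log(insert.params);
```

Keeping the position of each reading in its own column (seq here) is what lets the list be read back in its original order.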

UPDATE: I've been asked about data storage requirements, as in whether a single row would be cheaper in terms of storage. The answer is, in typical cases no it's not, and in cases where the answer is yes the price you pay for it isn't worth paying.

If you use a 2-column dependent table (one column for the foreign key of the record the sample belongs to, one for a single sample), then each row will require at worst 16 bytes (8 bytes for a bigint key column, 8 bytes for a double-precision floating-point number). For 100 rows that's 1600 bytes (ignoring DB overhead).

For a serialized string, you store in the best case 1 byte per character in the string. You can't know how long the string is going to be, but if we assume 100 samples with all the stored data by some contrived coincidence all falling between 10000.00 and 99999.99 with there only ever being 2 digits after the decimal point, then you're looking at 8 bytes per sample. In this case, all you've saved is the overhead of the foreign keys, so the amount of storage required comes out at 800 bytes.

That of course is based on a lot of assumptions, such as the character encoding always being 1 byte per character, the strings that make up the samples never being longer than 8 characters, etc.

But of course there's also the overhead of whatever mechanism you use to serialize the data. The absolute simplest method, CSV, means adding a comma between every sample. That adds n-1 bytes to the stored string. So the above example would now be 899 bytes, and that's with the simplest encoding scheme. JSON, XML, even PHP serializations all add more overhead characters than this, and you'll soon have strings that are a lot longer than 1600 bytes. And all this is with the assumption of 1 byte character encoding.
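
The arithmetic so far can be checked with a short JavaScript sketch. It assumes, as the text does, 100 samples of exactly 8 characters each and an encoding of 1 byte per character. The value 12345.67 is an arbitrary stand-in.

```javascript
const sampleCount = 100;

// Dependent table: 8-byte key + 8-byte double per row, ignoring DB overhead.
const tableBytes = sampleCount * (8 + 8);

// Arbitrary 8-character sample value, repeated for simplicity.
const samples = new Array(sampleCount).fill(12345.67);

// CSV: 8 characters per sample plus n-1 commas.
const csvBytes = samples.join(",").length;

// Flat JSON array: the same digits and commas plus two brackets.
const jsonBytes = JSON.stringify(samples).length;

console.log(tableBytes, csvBytes, jsonBytes); // 1600 899 901
```

Formats that tag every element, such as XML or PHP's serialize(), add more characters per sample than the flat array counted here.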

If you need to index the samples, the data requirements will grow even more disproportionately against strings, because a string index is a lot more expensive in terms of storage than a floating point column index would be.

And of course if your samples start adding more digits, the data storage goes up further. 39281.3392810 will not be storable in 8 bytes as a string, even in the best case.

And if the data is serialized, the database can't manipulate it. You can't sort the samples or do any kind of mathematical operation on them; the database doesn't even know they're numbers!

To be honest though, storage is ridiculously cheap these days, you can buy multiple TB drives for tiny sums. Is storage really that critical? Unless you have hundreds of millions of records then I doubt it is.

You might want to check out a book called SQL Antipatterns

edited Sep 2, 2013 at 8:32

answered Sep 2, 2013 at 7:55

GordonM



7 Comments

Daren Over a year ago

question for you: isn't there an overhead associated with storing for each number with each record id and each sample id? or does the db engine store it as an array? if not then maybe storing it serialized in a single cell, or if all samples have same size in a single row might be better, no? that data might be stored only for record keeping and is not usually retrieved unless an audit is necessary.

ılǝ Over a year ago

@GordonM - a superb explanation. I once thought such data structure would be a good choice for a dynamic values table with several data types (text, decimal, multiple and single choices, choices plus text). Oh the horror...

GordonM Over a year ago

@daren See my upcoming update for an answer. As for "I'm only storing it for record keeping", you don't know that that requirement is going to remain set in stone. What if you do need the data later?

Daren Over a year ago

@GordonM thnx a lot! And even though storage is very cheap (too true), processing is rarely the bottleneck anyway, so the overhead for serializing/deserializing is cheap. And for numbers, 1 byte per char works very well, go ASCII! xD Anyway, I just wanted to pick your brain on that because it is a problem I have encountered and it left me puzzled. I ended up storing it in a single row, with as many columns as values in the arrays, because I always had the same number of values. But I was very close to storing it serialized, because an extra table seemed too much of an overhead.

GordonM Over a year ago

Extra tables really aren't much of an overhead, they're the whole reason for relational databases after all. The one column per datapoint is definitely a more workable solution than serializing though.


ılǝ · Accepted Answer · 2013-09-02 12:08:27Z

2

I would recommend creating a separate table with three columns for the samples. The first would be the id of the record, the second the id of the sample, and the third the value. Of course, if your main table doesn't already have a unique id column, you would have to create one and use it as the foreign key.
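
To illustrate reading from such a table, the JavaScript sketch below groups rows back into one ordered list per record. The row shape (recordId, sampleId, value) mirrors the three columns described above, but the field names and the groupSamples helper are assumptions, and the rows are hard-coded instead of coming from a real query.

```javascript
// Rows as they might come back from the three-column table, in any order.
const rows = [
  { recordId: 1, sampleId: 2, value: 70.1 },
  { recordId: 1, sampleId: 0, value: 61.2 },
  { recordId: 2, sampleId: 0, value: 55.0 },
  { recordId: 1, sampleId: 1, value: 58.7 },
];

// Group values by record, ordered by sample id within each record.
function groupSamples(inputRows) {
  const byRecord = new Map();
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...inputRows].sort((a, b) => a.sampleId - b.sampleId);
  for (const row of sorted) {
    if (!byRecord.has(row.recordId)) {
      byRecord.set(row.recordId, []);
    }
    byRecord.get(row.recordId).push(row.value);
  }
  return byRecord;
}

const grouped = groupSamples(rows);
console.log(grouped.get(1)); // values for record 1, in sample order
```

In practice an ORDER BY on the sample id column in the query would make the sort step unnecessary.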

The reason for my suggestion is simplicity and data integrity. Another argument is that this structure is memory efficient, as you will avoid varchar (which would then also require parsing, at the cost of additional computation).

UPDATE: As GordonM and Daren elaborated, the memory argument is not necessarily valid (see below for further explanation), but there are also other reasons against a serialized approach.

Finally, this doesn't involve any complex PHP or JavaScript and is quite straightforward to code.

edited Sep 2, 2013 at 12:08

answered Sep 2, 2013 at 7:45

ılǝ


4 Comments

Daren Over a year ago

question for you: isn't there an overhead associated with storing for each number with each record id and each sample id? or does the db engine store it as an array? if not then maybe storing it serialized in a single cell, or if all samples have same size in a single row might be better, no?

ılǝ Over a year ago

Do you mean in terms of memory? I am not a big expert in serialization, but as far as I understand, that translates into storing the data as chars. If so, then Variant A: 300 rows x 3 columns of FLOAT = 3600 bytes. Variant B: 300 numbers, say within 8 digits (10 chars including the point and the '+'/'-' sign) + some chars for ids -> let's say that makes up a char(3000), which is 9000 bytes in UTF8. My maths and logic are likely over the top, so I may well be wrong, but at first glance the scales appear to be in favor of Variant A. Not to speak of the extra effort of serialization.

Daren Over a year ago

If you are storing numbers, then UTF-8 is a waste; ASCII is more than enough, so that cuts your approximation to a third, which makes it competitive with Variant A. But GordonM has kindly expanded on this, so you've got me 95% convinced already, no need to insist. :)
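
Putting the estimates from this exchange side by side, as a back-of-the-envelope JavaScript sketch. The byte sizes are the ones assumed in the comments (4-byte FLOAT columns, a 3000-character string, 3 bytes per character for the UTF8 case versus 1 byte for ASCII), not measurements from a real table.

```javascript
const variantA = 300 * 3 * 4;     // 300 rows x 3 FLOAT columns x 4 bytes
const chars = 3000;               // assumed length of the serialized string
const variantBUtf8 = chars * 3;   // 3 bytes reserved per character
const variantBAscii = chars * 1;  // 1 byte per character

console.log(variantA, variantBUtf8, variantBAscii); // 3600 9000 3000
```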

ılǝ Over a year ago

Agreed. I was just reading upon more efficient options for storing numbers in chars, ASCII is indeed a good candidate and the result - comparatively good in terms of memory size. Thanks for the follow up!
