5

and thanks in advance for your help.

Well, this is my situation. I have a web system that makes some noise-related calculations based on a sample, created by a sonometer. Originally, the database only stored the results of these calculations. But now, I have been asked to also store the samplings themselves. Each sample is only a list of 300 or 600 numbers with 1 decimal each.

So, the simplest approach I have come up with is to add a column in the table that stores all the calculations for a given sample. This column should contain the list of numbers.

My question then: What is the best way to store this list of numbers in a single column?

Things to consider:

  • It would be nice if the list could be read by both PHP and JavaScript with no further complications.
  • The list is only useful if retrieved in its entirety, which is why I'd rather not normalize it. Also, the calculations made on that list are fairly complex and already coded in PHP and JavaScript, so I won't be running any SQL queries on the elements of a given list.

Also, if there are better approaches than storing it this way, I would love to know about them.

Thanks a lot and have a good day/evening :)

  • So, you just want to store a big list of numbers in a database? Am I understanding you correctly? Commented Sep 2, 2013 at 7:41
  • You could store it as JSON. Commented Sep 2, 2013 at 7:42
  • 300 numbers, one decimal each as in one digit each? If so a string would be simplest. Commented Sep 2, 2013 at 7:45
  • my English is not very good. One decimal each as in 87.4 Commented Sep 2, 2013 at 7:49
  • Ah, then the two answers given, which start with "don't do that", are where you want to be. Multi-value columns are just a bad idea when it comes to relational databases. Commented Sep 2, 2013 at 11:48

2 Answers

8

First off, you really don't want to do that. A column in an RDBMS is meant to be atomic, in that it contains one and only one piece of information. Trying to store more than one piece of data in a column is a violation of first normal form.

If you absolutely must do it, then you need to convert the data into a form that can be stored as a single item of data, typically a string. You could use PHP's serialize() mechanism, XML parsing (if the data happens to be a document tree), json_encode(), etc.
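As a rough illustration of the "single string" route, here is a minimal sketch of a JSON round trip. It is written in Python purely for demonstration; `json.dumps` and `json.loads` play the role that `json_encode()`/`json_decode()` would play in PHP and `JSON.stringify`/`JSON.parse` in JavaScript. The readings are made-up values, not data from the question.

```python
import json

# Hypothetical sample: sonometer readings with one decimal each.
readings = [87.4, 90.1, 85.0, 88.7]

# Compact separators drop the spaces json.dumps inserts by default,
# so the stored string is as short as JSON allows.
encoded = json.dumps(readings, separators=(",", ":"))
print(encoded)  # [87.4,90.1,85.0,88.7]

# Decoding gives back the same list of numbers.
decoded = json.loads(encoded)
assert decoded == readings
```

The resulting string can go into an ordinary text column and be decoded by any JSON parser, which is what makes it readable from both PHP and JavaScript.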

But how do you query such data effectively? The answer is you can't.

Also, if someone else takes over your project at a later date you're really going to annoy them, because serialized data in a database is horrid to work with. I know because I've inherited such projects.

Did I mention you really don't want to do that? You need to rethink your design so that it can more easily be stored in terms of atomic rows. Use another table for this data, for example, and use foreign keys to relate it to the master record. They're called relational databases for a reason.

UPDATE: I've been asked about data storage requirements, as in whether a single row would be cheaper in terms of storage. The answer is, in typical cases no it's not, and in cases where the answer is yes the price you pay for it isn't worth paying.

If you use a 2-column dependent table (one column for the foreign key of the record the sample belongs to, one for a single sample), then each row will require at worst 16 bytes (8 bytes for a longint key column, 8 bytes for a double-precision floating-point number). For 100 samples that's 1600 bytes (ignoring db overhead).

For a serialized string, you store in the best case 1 byte per character in the string. You can't know how long the string is going to be, but if we assume 100 samples with all the stored data by some contrived coincidence all falling between 10000.00 and 99999.99 with there only ever being 2 digits after the decimal point, then you're looking at 8 bytes per sample. In this case, all you've saved is the overhead of the foreign keys, so the amount of storage required comes out at 800 bytes.

That of course is based on a lot of assumptions, such as the character encoding always being 1 byte per character, the strings that make up the samples never being longer than 8 characters, etc.

But of course there's also the overhead of whatever mechanism you use to serialize the data. The absolute simplest method, CSV, means adding a comma between every sample. That adds n-1 bytes to the stored string. So the above example would now be 899 bytes, and that's with the simplest encoding scheme. JSON, XML, even PHP serializations all add more overhead characters than this, and you'll soon have strings that are a lot longer than 1600 bytes. And all this is with the assumption of 1 byte character encoding.
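The arithmetic above can be reproduced with a short sketch. It assumes the same contrived conditions as the example (100 samples, every value exactly 8 characters long, 1 byte per character); the value `12345.67` is a hypothetical stand-in, and the compact JSON figure is an extra data point that is not in the original calculation.

```python
import json

N = 100              # samples, as in the example above
ROW_BYTES = 8 + 8    # 8-byte integer key + 8-byte double per row

sample = "12345.67"  # hypothetical reading, 8 characters long
values = [sample] * N

table_bytes = N * ROW_BYTES                  # one row per sample
raw_bytes = sum(len(v) for v in values)      # digits only, no separators
csv_bytes = len(",".join(values))            # adds n-1 commas
json_bytes = len(
    json.dumps([float(v) for v in values], separators=(",", ":"))
)                                            # commas plus two brackets

print(table_bytes, raw_bytes, csv_bytes, json_bytes)  # 1600 800 899 901
```

Under these assumptions a compact JSON array costs only two bytes more than CSV. Formats that wrap every value in markup, such as XML or PHP's `serialize()` output, add more per value, and longer values or multi-byte encodings push the string totals up further.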

If you need to index the samples, the data requirements will grow even more disproportionately against strings, because a string index is a lot more expensive in terms of storage than a floating point column index would be.

And of course if your samples start adding more digits, the data storage goes up further. 39281.3392810 will not be storable in 8 bytes as a string, even in the best case.

And if the data is serialized, the database can't manipulate it. You can't sort the samples or do any kind of mathematical operations on them; the database doesn't even know they're numbers!
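To make the contrast concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for whatever database the project uses. The table and column names (`reading`, `record_blob`, `readings`) are hypothetical, and the sketch sets aside the JSON extensions that some database engines offer.

```python
import json
import sqlite3

readings = [87.4, 90.1, 85.0, 88.7]  # hypothetical sample values

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (record_id INTEGER, value REAL)")
conn.execute("CREATE TABLE record_blob (record_id INTEGER, readings TEXT)")

conn.executemany("INSERT INTO reading VALUES (1, ?)", [(v,) for v in readings])
conn.execute("INSERT INTO record_blob VALUES (1, ?)", (json.dumps(readings),))

# One row per value: the database can aggregate on its own.
sql_max, sql_count = conn.execute(
    "SELECT MAX(value), COUNT(*) FROM reading WHERE record_id = 1"
).fetchone()

# Serialized: the database only sees text, so the application has to
# fetch the whole string and decode it before doing any maths.
(blob,) = conn.execute(
    "SELECT readings FROM record_blob WHERE record_id = 1"
).fetchone()
app_max = max(json.loads(blob))

print(sql_max, sql_count, app_max)  # 90.1 4 90.1
```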

To be honest though, storage is ridiculously cheap these days, you can buy multiple TB drives for tiny sums. Is storage really that critical? Unless you have hundreds of millions of records then I doubt it is.

You might want to check out a book called SQL Antipatterns.


7 Comments

question for you: isn't there an overhead associated with storing for each number with each record id and each sample id? or does the db engine store it as an array? if not then maybe storing it serialized in a single cell, or if all samples have same size in a single row might be better, no? that data might be stored only for record keeping and is not usually retrieved unless an audit is necessary.
@GordonM - a superb explanation. I once thought such data structure would be a good choice for a dynamic values table with several data types (text, decimal, multiple and single choices, choices plus text). Oh the horror...
@daren See my upcoming update for an answer. As for "I'm only storing it for record keeping", you don't know that that requirement is going to remain set in stone. What if you do need the data later?
@GordonM thanks a lot! And even though storage is very cheap (too true), processing is rarely the bottleneck anyway, so the overhead for serializing/deserializing is cheap. And for numbers, 1 byte per char works very well, go ASCII! xD Anyway, I just wanted to pick your brain on that because it is a problem I have encountered and it left me puzzled. I ended up storing it in a single row but with as many columns as values in the arrays, because I always had the same number of values. But I was very close to storing it serialized, because an extra table seemed too much of an overhead.
Extra tables really aren't much of an overhead, they're the whole reason for relational databases after all. The one column per datapoint is definitely a more workable solution than serializing though.
2

I would recommend creating a separate table with three columns for the samples. The first would be the id of the record, the second the id of the sample, and the third the value. Of course, if your main table doesn't already have a unique id column, you would have to create one and use it as the foreign key.
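A minimal sketch of that three-column layout follows, again using Python's `sqlite3` only as a convenient stand-in for the real database. The table names `record` and `sample_value` are hypothetical, and treating the sample id as the position of each reading within its list is an assumption made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE record (
    id INTEGER PRIMARY KEY
);
CREATE TABLE sample_value (
    record_id INTEGER NOT NULL REFERENCES record(id),
    sample_id INTEGER NOT NULL,   -- assumed: position of the reading in the list
    value     REAL    NOT NULL,
    PRIMARY KEY (record_id, sample_id)
);
""")

readings = [87.4, 90.1, 85.0]  # hypothetical sample values
conn.execute("INSERT INTO record (id) VALUES (1)")
conn.executemany(
    "INSERT INTO sample_value VALUES (1, ?, ?)",
    list(enumerate(readings)),  # (sample_id, value) pairs
)

# Retrieve the whole list for one record, in its original order.
rows = conn.execute(
    "SELECT value FROM sample_value WHERE record_id = 1 ORDER BY sample_id"
).fetchall()
restored = [v for (v,) in rows]
print(restored)  # [87.4, 90.1, 85.0]
```

Fetching the full list is a single ordered query, so the "only useful in its entirety" requirement from the question is still met without serializing anything.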

The reason for my suggestion is simplicity and data integrity. Another argument is that this structure is memory efficient, as you will avoid varchar (which would then also require parsing and carries the cost of additional computation).

UPDATE: As GordonM and daren elaborated below, the memory argument is not necessarily valid (see below for further explanation), but there are also other reasons against a serialized approach.

Finally, this doesn't involve any complex PHP or JavaScript and is quite straightforward to code.

4 Comments

question for you: isn't there an overhead associated with storing for each number with each record id and each sample id? or does the db engine store it as an array? if not then maybe storing it serialized in a single cell, or if all samples have same size in a single row might be better, no?
Do you mean in terms of memory? I am not a big expert in serialization, but as far as I understand, it translates into storing the data as chars. If so, then in Variant A, 300 rows x 3 columns of FLOAT = 3600 bytes. In Variant B, 300 numbers of, say, up to 8 digits (10 chars including the point and the '+'/'-' sign) plus some chars for ids would need, let's say, a char(3000), which is 9000 bytes in UTF8. My maths and logic are likely over the top, so I am likely wrong, but at first glance the scales appear to be in favor of Variant A. Not to mention the extra effort of serialization.
If you are storing numbers, then UTF-8 is a waste; ASCII is more than enough, so that cuts your approximation to a third, which makes it competitive with Variant A. But GordonM below has kindly expanded on this, so you've got me 95% convinced already, no need to insist. :)
Agreed. I was just reading up on more efficient options for storing numbers as chars; ASCII is indeed a good candidate and the result is comparatively good in terms of memory size. Thanks for the follow-up!
