
Our team is developing an application that takes statistics of the population we regulate, and applies a set of parameters to those statistics via an algorithm to produce measurements for each member of the population. The user will be able to set the values of each parameter, run the algorithm, see the results, and redo it with different values, etc.

The parameters are fixed (say 12 parameters), but are of different numeric types and/or precision.

For example, some parameters are very large integers, some are small integers, some are monetary, some are high precision decimals.

The first cut of the data model has the parameters defined explicitly in a couple of tables, each column with the data type and precision that suits that parameter, e.g. (simplified):

STABLE_PARAMETERS
id                     number(4)
very_large_parameter_1 number(18)
very_large_parameter_2 number(15)
range_parameter_min    number(3)
range_parameter_max    number(9)

VOLATILE_PARAMETERS
id                     number(4)
monetary_min           number(5,2)
monetary_max           number(11,2)
conversion_rate        number(7,6)
count_rate             number(5,4)
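
For concreteness, the first cut as Oracle DDL would look something like the sketch below (the primary keys are my assumption; the precisions are the ones listed above):

CREATE TABLE stable_parameters (
  id                     NUMBER(4)  PRIMARY KEY,
  very_large_parameter_1 NUMBER(18),
  very_large_parameter_2 NUMBER(15),
  range_parameter_min    NUMBER(3),
  range_parameter_max    NUMBER(9)
);

CREATE TABLE volatile_parameters (
  id              NUMBER(4)  PRIMARY KEY,
  monetary_min    NUMBER(5,2),
  monetary_max    NUMBER(11,2),
  conversion_rate NUMBER(7,6),
  count_rate      NUMBER(5,4)
);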

The algorithm is being defined in code in the application, tested and is expected to be stable for at least a couple of years. If the algorithm changes, it will be a code change, retest and release. If a new parameter is added as part of this, it would need to be added to the relevant table and the code changed to accommodate it.

I had initially thought that a more generic approach to the parameters would be better, with the parameters defined as rows in a generic table instead of explicitly defined as individual columns, eg:

PARAMETERS
id              number(4)
parameter_name  varchar2(100)
parameter_type  varchar2(20) -- eg INTEGER, MONEY, FLOAT?
parameter_value ???

where parameter_value could be:

(1) string representation of the number

parameter_value   varchar2(20) -- "0.000032" or "1000000000" or "1500.00"

or

(2) all-encompassing number definition

parameter_value   number(24,6) -- 0.000032 or 1000000000.000000 or 1500.000000

or

(3) three columns, one per parameter_type; each parameter row populates exactly one of them and leaves the other two null (sketched below).

parameter_value_int   number(18)   -- 1000000000
parameter_value_money number(11,2) -- 1500.00
parameter_value_float number(7,6)  -- 0.000032
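
As a sketch of what (3) might look like in full (assuming Oracle; the check constraint is my own addition to enforce that exactly one value column is populated, not part of the current design):

CREATE TABLE parameters (
  id                    NUMBER(4)     PRIMARY KEY,
  parameter_name        VARCHAR2(100) NOT NULL,
  parameter_type        VARCHAR2(20)  NOT NULL, -- e.g. INTEGER, MONEY, FLOAT
  parameter_value_int   NUMBER(18),
  parameter_value_money NUMBER(11,2),
  parameter_value_float NUMBER(7,6),
  -- exactly one of the three value columns must be populated
  CONSTRAINT parameters_one_value_chk CHECK (
       (parameter_value_int IS NOT NULL AND parameter_value_money IS NULL     AND parameter_value_float IS NULL)
    OR (parameter_value_int IS NULL     AND parameter_value_money IS NOT NULL AND parameter_value_float IS NULL)
    OR (parameter_value_int IS NULL     AND parameter_value_money IS NULL     AND parameter_value_float IS NOT NULL)
  )
);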

None of these seems to be the right way to go. (1) stores numbers as strings, which is risky and requires interpreting the value based on parameter_type. (2) is overkill for many of the numbers. (3) is a bit better, but still needs interpretation based on parameter_type.

In future who knows, maybe date parameters or character parameters will be added too?

Given this scenario, what would be the best way to model this?

  • There's no such thing as "better"/"best" in engineering unless you define it. Unfortunately, all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you can demonstrate that a design and all the alternatives you can think of have problems (whatever that means at the time), then ask a very specific question, which should also define "better"/"best". See also: Strategy for "Which is better" questions.

1 Answer


For example, some parameters are very large integers, some are small integers, some are monetary, some are high precision decimals.

More to the point, the very large integers will be joined/compared to very large integer columns in the data model, small integers to small integers, monetary to monetary, etc.

So if you held those parameters in some sort of generic format, you'd then need a generic way to compare a generic parameter to a generic (unknown) column in a generic (unknown) table amongst your population statistics.

SQL does not do generics. You'd need some ghastly logic (in a high-level language) to figure out which generic parameter is being compared to which table.column, and to generate the SQL as strings. There's huge potential for mismatches, and it needs a dramatic amount of testing.
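
To illustrate, the interpretation code ends up looking something like the sketch below. This is a sketch only, assuming the generic table from option (1) with string values; population_stats, household_income, member_count and conversion_rate are invented names, not from your model:

DECLARE
  v_type  VARCHAR2(20);
  v_value VARCHAR2(20);
  v_sql   VARCHAR2(4000);
  v_count NUMBER;
BEGIN
  -- fetch a generic parameter; what its value means depends on parameter_type
  SELECT parameter_type, parameter_value
    INTO v_type, v_value
    FROM parameters
   WHERE parameter_name = 'monetary_max';

  -- decide at run time which statistics column it applies to,
  -- then assemble the comparison as a string
  v_sql := 'SELECT COUNT(*) FROM population_stats WHERE '
        || CASE v_type
             WHEN 'MONEY'   THEN 'household_income <= TO_NUMBER(:val)'
             WHEN 'INTEGER' THEN 'member_count     <= TO_NUMBER(:val)'
             WHEN 'FLOAT'   THEN 'conversion_rate  <= TO_NUMBER(:val)'
           END;

  EXECUTE IMMEDIATE v_sql INTO v_count USING v_value;
END;
/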

If a new parameter is added as part of this, it would need to be added to the relevant table and the code changed to accommodate it.

Yes, I'd expect that if a new parameter is added, it's there to join/compare to a specific new column in the data model. Then name it for that column; don't make it generic.

If you have one monetary column in the statistics for household income and another for property value, put two monetary columns in the parameters, named household_income and property_value. Who knows, the numeric precision of those might change in the future.
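
Compare that with the explicit version, where each parameter column is named for the statistic it constrains and the SQL stays static and type-checked. A sketch only, reusing your volatile_parameters table; population_stats and member_id are invented names:

ALTER TABLE volatile_parameters ADD (
  household_income NUMBER(11,2),
  property_value   NUMBER(11,2)
);

-- plain, static SQL: each parameter compares to the column it was named for
SELECT s.member_id
  FROM population_stats s
 CROSS JOIN volatile_parameters p
 WHERE s.household_income <= p.household_income
   AND s.property_value   <= p.property_value;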

I have seen something like what you're asking for in enterprise-wide packages, for user-defined attributes. There's no way the package vendor can know in advance what extra info the user wants to record. But there's no generic way to query this: the vendor's approach was to feed the data out to a spreadsheet and make it the users' problem.
