
I'm saving dynamic objects (objects of which I do not know the type upfront) using the following 2 tables in Postgres:

CREATE TABLE IF NOT EXISTS objects(
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,

    name TEXT NOT NULL,

    PRIMARY KEY(id)
);

CREATE TABLE IF NOT EXISTS object_values(
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    event_id UUID NOT NULL,

    param TEXT NOT NULL,
    value TEXT NOT NULL,

    PRIMARY KEY(id)
);

So for instance, if I have the following objects:

dog = [
  { breed: "poodle", age: 15, ...},
  { breed: "husky", age: 9, ...},
]
monitors = [
  { manufacturer: "dell", ...},
]

These will live in the DB as follows:

-- objects
| id | user_id | name    |
|----|---------|---------|
| 1  | 1       | dog     |
| 2  | 2       | dog     |
| 3  | 1       | monitor |

-- object_values
| id | event_id | param        | value  |
|----|----------|--------------|--------|
| 1  | 1        | breed        | poodle |
| 2  | 1        | age          | 15     |
| 3  | 2        | breed        | husky  |
| 4  | 2        | age          | 9      |
| 5  | 3        | manufacturer | dell   |

Note: these tables are big (hundreds of millions of rows) and are generally optimised for writing. What would be a good way of querying/filtering objects based on multiple object params? For instance: Select the number of all husky dogs above the age of 10 per unique user.

I also wonder whether it would have been better to denormalise the tables and collapse the params onto a JSONB column (and use GIN indexes).
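
To make that concrete, this is roughly the shape I have in mind; the table and index names below are purely illustrative, nothing that exists today:

CREATE TABLE IF NOT EXISTS objects_jsonb(
    id      UUID NOT NULL DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    name    TEXT NOT NULL,
    params  JSONB NOT NULL DEFAULT '{}',

    PRIMARY KEY(id)
);

-- jsonb_path_ops only supports the @> containment operator, but the index is smaller
CREATE INDEX IF NOT EXISTS objects_jsonb_params_gin
    ON objects_jsonb USING GIN (params jsonb_path_ops);

-- e.g. husky dogs above the age of 10, per user: the containment check can use
-- the GIN index, the numeric comparison is applied on top of it
SELECT user_id, COUNT(*) AS num_husky_dogs_older_than_10
FROM objects_jsonb
WHERE name = 'dog'
  AND params @> '{"breed": "husky"}'
  AND (params->>'age')::integer > 10
GROUP BY user_id;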

Are there any standards I can use here?

6 Comments
  • "What would be a good way of querying/filtering objects based on multiple object params?" - use separate tables that are optimised for reading. Do not fear to create tables dynamically. Commented Jan 22, 2023 at 21:13
  • Hey @Bergi do you mind expanding a bit on your answer. Do you mean different read/write db replicas? For now, unfortunately, that's not an option. Also, there's no cap on how many types of objects I'll get. How would I make generic db queries if I start making tables dynamically? Commented Jan 22, 2023 at 21:18
  • 1
    I only meant different tables in a denormalised schema (which you kinda have already anyway, using EAV, so let's introduce some duplication :P), not database replicas. And yes, an unlimited number of dynamic tables will be hard to manage, but it would allow you to put indices on them where you care about performance and it would allow you to write standard relational queries (SELECT user_id, count(*) FROM husky WHERE age > 10 GROUP BY user_id). Commented Jan 22, 2023 at 21:34
  • Ahh, I understand @Bergi! Yes, having different tables is still an option we're considering, unfortunately, not a change we can do now. Would you say JSONB (along with 1 gin index) would be a better option than using this EAV approach? Commented Jan 22, 2023 at 21:44
  • Unless you have a requirement for writing multiple attributes for the same object independently (at different times) and optimise for writing speed, JSONB probably makes it more efficient to retrieve the whole document and to match multiple conditions than using EAV. But more importantly, JSONB also makes writing queries simpler :-) (Disclaimer: performance needs to be measured, I'm only guessing). Commented Jan 22, 2023 at 21:49

1 Answer


"Select the number of all husky dogs above the age of 10 per unique user" - The following query would do it.

SELECT user_id, COUNT(DISTINCT event_id) AS num_husky_dogs_older_than_10
FROM       objects       o
INNER JOIN object_values ov
        ON o.id = ov.event_id
       AND o.name = 'dog'
GROUP BY o.user_id
HAVING MAX(CASE WHEN ov.param = 'age'
                 AND ov.value::integer > 10 THEN 1 END) = 1
   AND MAX(CASE WHEN ov.param = 'breed'
                 AND ov.value = 'husky'     THEN 1 END) = 1;
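
If you need both conditions to hold for the same dog (rather than just somewhere among a user's rows), a variant that filters per object first should do it; this is a sketch, not benchmarked against your data volumes:

SELECT user_id, COUNT(*) AS num_husky_dogs_older_than_10
FROM (
    SELECT o.user_id, o.id
    FROM       objects       o
    INNER JOIN object_values ov
            ON ov.event_id = o.id
    WHERE o.name = 'dog'
    GROUP BY o.user_id, o.id
    -- the CASE guards the integer cast so it only runs on "age" rows
    HAVING bool_or(CASE WHEN ov.param = 'age' THEN ov.value::integer > 10 END)
       AND bool_or(ov.param = 'breed' AND ov.value = 'husky')
) AS matching_dogs
GROUP BY user_id;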

Since your queries will most likely always perform the same JOIN between these two tables on the same fields, it would be good to have indices on the following (see the sketch after the list):

  • the fields you join on ("objects.id", "object_values.event_id")
  • the fields you filter on ("objects.name", "object_values.param", "object_values.value")
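
As a rough sketch, that could look like the following (index names are arbitrary; "objects.id" is already covered by the primary key):

CREATE INDEX IF NOT EXISTS objects_name_idx              ON objects (name);
CREATE INDEX IF NOT EXISTS object_values_event_id_idx    ON object_values (event_id);
CREATE INDEX IF NOT EXISTS object_values_param_value_idx ON object_values (param, value);

Keep in mind that every extra index adds write overhead, which matters for a workload that is optimised for writing.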

Check the demo here.


9 Comments

Thanks! Unfortunately, I think this will include all dogs, not just huskies.
Right, I must have missed the "husky" condition, check the updated query. I've included a demo in PostgreSQL too.
Hmm, I thought the fork link that I gave in my comment above had 1 husky. In any case, it's returning 2 instead of the expected 1. It seems to count the number of parameters of the matched object instead of the number of huskies.
You'll get as many rows as the number of existing rows per "event_id". This is due to the CASE expressions that will generate 1 when the corresponding condition is true, NULL otherwise (see here). Then we compute the MAX of these fields, grouping by user_id, and if we don't get 1 on both these fields, the value is not returned. If you had three properties for that event_id, we would have had 3 rows instead of 2.
Note that there's a further typo in the query: COUNT(DISTINCT user_id) will only tell you whether a user has any dog matching those conditions, while COUNT(DISTINCT event_id) will count how many such dogs each user has. Reading your description again, it looks like you need the second one. I have updated the answer and the demo accordingly.
