Elasticsearch - how to store data, nested or array?

Question

What is the best document schema for data like this? I have the number of available units of each product in each city:

product1, [city = city1, available = 0], [city = city2, available = 2], [city = city3, available = 1], ... ...
product100, [city = city1, available = 1], [city = city2, available = 1], [city = city3, available = 1], ...

How should this data be stored for each product, given that there can be 1000 products and 100 cities, so that searching by city and availability works?

What queries and aggregations do you need, and how do you plan to display the results in your app? That will determine the best way to store the data. Can you specify these?
– aclowkay, Commented Sep 7, 2017 at 6:25

Krrish Raj · Accepted Answer · 2017-09-07 07:10:17Z

4

It depends entirely on how you want to query the data. When we store data as an array of objects, we lose the correlation between the fields of each object.
So if you store your data like this:

{
  "prodId": "id",
  "availability": [
    { "city": "city1", "available": true },
    { "city": "city2", "available": false }
  ]
}

Elasticsearch will flatten the objects internally while indexing, and the document will be indexed as:

availability.city= [city1,city2]
availability.available= [true,false]

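Now, if you search for products that are available in city2, this document will match, even though its available value for city2 is false: the query only checks that city2 and true each appear somewhere in the flattened arrays.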

If you want to maintain the correlation, you should go with nested objects. Each nested object is indexed as a separate hidden document that Elasticsearch manages internally. The joins are also performed internally, so you don't have to worry about them, and you can run aggregations over nested fields. On the downside, nested queries are slower than queries on plain fields, because the hidden documents must be joined to their parent at query time.
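
The difference can be illustrated without a cluster. The following is a minimal Python sketch that only imitates the two matching behaviours described above; `flatten`, `matches_flattened`, and `matches_nested` are hypothetical helpers written for this illustration and are not part of Elasticsearch or any client library.

```python
def flatten(doc, field):
    """Imitate how an array of objects is indexed: one multi-valued list per leaf field."""
    flat = {}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_flattened(doc, field, conditions):
    """Each condition is checked against the whole array independently."""
    flat = flatten(doc, field)
    return all(
        value in flat.get(f"{field}.{key}", [])
        for key, value in conditions.items()
    )


def matches_nested(doc, field, conditions):
    """All conditions must hold inside the same object, as with a nested field."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in doc[field]
    )


doc = {
    "prodId": "id",
    "availability": [
        {"city": "city1", "available": True},
        {"city": "city2", "available": False},
    ],
}

# "Is the product available in city2?"
wanted = {"city": "city2", "available": True}

print(flatten(doc, "availability"))
print(matches_flattened(doc, "availability", wanted))  # True: a false positive
print(matches_nested(doc, "availability", wanted))     # False: correlation is kept
```

The flattened check succeeds because "city2" and true each occur somewhere in the document, while the nested-style check requires both values in the same object.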


1 Comment

Andrey Koftun Over a year ago

Thank you very much for the ideas. For me, this is useful information. I still study a lot of documentation. I will test different solutions and learn.

dshockley · Accepted Answer · 2017-09-07 07:41:41Z

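Your dataset (1000 products/100 cities) is very small. If you do not expect it to grow much larger, you can probably use a nested data structure (which is the most obvious solution here). Your mapping would look something like this (the outer "product" key is a mapping type name; Elasticsearch 7.0 and later no longer use mapping types, so omit that wrapper there):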

{
  "product": {
    "properties": {
      "product": {"type": "keyword"},
      "cities": {
        "type": "nested",
        "properties": {
          "name": {"type": "keyword"},
          "available": {"type": "integer"}
        }
      }
    }
  }
}

Then you would index documents that look like this:

{
  "product": "product1",
  "cities": [
    {
      "name": "city1",
      "available": 0
    },
    {
      "name": "city2",
      "available": 1
    }
  ]
}
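
To search such documents by city and availability together, the conditions have to be wrapped in a nested query on the `cities` path. Below is a sketch in Python that only builds the request body as a plain dictionary; the helper name `available_in_city` is hypothetical, and the body is assumed to be sent to the `_search` endpoint of whatever index holds these documents.

```python
import json


def available_in_city(city, minimum=1):
    """Build a nested query: both conditions must match the same `cities` entry."""
    return {
        "query": {
            "nested": {
                "path": "cities",
                "query": {
                    "bool": {
                        "filter": [
                            # Exact match on the keyword field from the mapping above.
                            {"term": {"cities.name": city}},
                            # At least `minimum` units available in that same city.
                            {"range": {"cities.available": {"gte": minimum}}},
                        ]
                    }
                },
            }
        }
    }


body = available_in_city("city2")
print(json.dumps(body, indent=2))
```

With the example document above, this query would match product1, because the city2 entry has available set to 1; asking for city1 would not match, because its entry has 0.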

However, nested queries and aggregations are expensive/slow, so if you expect your dataset to grow substantially, you may want to consider denormalizing your data. In your case, I can see a few possible ideas for this, which will depend on how you want to query your data.

Simple flattening (one doc per city/product combo):

Doc 1:
{
  "product": "product1",
  "city": "city1",
  "available": 0
}
Doc 2:
{
  "product": "product1",
  "city": "city2",
  "available": 1
}

The downside here is that you can't easily search by product, since each product is spread across many documents. You may be able to work around that by keeping a separate index of products to use when you need to query by product.
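
As an illustration of this flattening, here is a small Python sketch that turns one nested-style product document into one flat document per city. Both helper names are hypothetical, and the "product:city" document ID is an assumption made here so that re-indexing the same pair would overwrite rather than duplicate; it is not something the data requires.

```python
def flatten_product(doc):
    """Turn one nested product document into one flat document per city."""
    return [
        {
            "product": doc["product"],
            "city": city["name"],
            "available": city["available"],
        }
        for city in doc["cities"]
    ]


def doc_id(flat_doc):
    """Hypothetical deterministic ID built from the product/city pair."""
    return f'{flat_doc["product"]}:{flat_doc["city"]}'


nested_doc = {
    "product": "product1",
    "cities": [
        {"name": "city1", "available": 0},
        {"name": "city2", "available": 1},
    ],
}

flat_docs = flatten_product(nested_doc)
for flat_doc in flat_docs:
    print(doc_id(flat_doc), flat_doc)
```

Queries on the flat documents need no nested wrapper: a term filter on city combined with a range filter on available applies to a single document, so the correlation is preserved by construction.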

If you never expect to have more than 100 (or 1000) cities, you could have one field per city, like this:

{
  "product": "product1",
  "city1": 0,
  "city2": 1,
  ...
}

Note that if you do this, you don't actually need to have all the cities in each source document; missing keys are fine. The downside is that you need to know in advance the names of the cities you're interested in, in order to query. This is probably not the right solution for you, but it is useful in some use cases.
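
A sketch of the field-per-city layout in Python, assuming the per-city counts arrive as a dictionary. The helpers `to_city_fields` and `in_stock_query` are hypothetical names, and the query body is just a dictionary that would be sent to the `_search` endpoint.

```python
def to_city_fields(product, counts):
    """Build a document with one numeric field per city; unknown cities are simply omitted."""
    if "product" in counts:
        # A city literally named "product" would overwrite the product field.
        raise ValueError('city name "product" collides with the product field')
    doc = {"product": product}
    doc.update(counts)
    return doc


def in_stock_query(city, minimum=1):
    """Range query on the city's own field; the city name must be known in advance."""
    return {"query": {"range": {city: {"gte": minimum}}}}


doc = to_city_fields("product1", {"city1": 0, "city2": 1})
print(doc)
print(in_stock_query("city2"))
```

The query illustrates the limitation mentioned above: the field name is the city, so there is no way to ask "in which cities is this product available?" without listing candidate city names.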

If your available numbers are always low, and you expect this to remain the case (for example, you never expect more than 10 available), you could do something like this:

{
  "product": "product1",
  "available": {
    "0": ["city1", "city2"],
    "1": ["city2"],
    "2": [],
    ...
  }
}

So if you want to see whether city1 carries the product (regardless of whether any units are available), you can query available.0; if you want to see whether it has at least 1 available, you can query available.1, and so on. If you want to list the cities where product1 has at least 1 available, you can do a terms aggregation on available.1. If you use this kind of data structure, you will probably want to add another field containing the exact number for each city (not nested, so not very useful for querying, but convenient once you've retrieved the data).
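
The threshold layout can be generated mechanically from per-city counts. The Python sketch below is one way to do it; `bucket_by_threshold` and `at_least_query` are hypothetical helpers, the `counts` field name for the exact numbers is an assumption (the text above only says "another field"), and the term query assumes the city names are indexed as keywords.

```python
def bucket_by_threshold(product, counts, max_available):
    """For each threshold t in 0..max_available, list the cities with at least t available."""
    available = {
        str(t): sorted(city for city, n in counts.items() if n >= t)
        for t in range(max_available + 1)
    }
    # `counts` keeps the exact numbers for convenience after retrieval.
    return {"product": product, "available": available, "counts": dict(counts)}


def at_least_query(city, threshold):
    """Match products with at least `threshold` units available in `city`."""
    return {"query": {"term": {f"available.{threshold}": city}}}


doc = bucket_by_threshold("product1", {"city1": 0, "city2": 1}, max_available=2)
print(doc["available"])
print(at_least_query("city2", 1))
```

For the two-city example this reproduces the buckets shown above: both cities under "0", only city2 under "1", and an empty list under "2".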

Juvenik · Accepted Answer · 2017-09-07 05:56:35Z

1

I would store them as follows:

{
  "product" : "product1",
  "city-avail" : [
      {
        "city" : "city1",
        "available" : 0
      },
      {
        "city" : "city2",
        "available" : 1
      }
    ]
}
{
  "product" : "product2",
  "city-avail" : [
      {
        "city" : "city3",
        "available" : 1
      },
      {
        "city" : "city2",
        "available" : 0
      }
    ]
}


1 Comment

spottedmahn Over a year ago

fair enough but why? 🤔

Carlos · Accepted Answer · 2017-09-07 06:13:36Z

1

For complex data (like key-value pairs) I would use a nested field type. For simple data, like a list of numbers or strings, I use a plain array (Elasticsearch has no dedicated array type; any field can hold multiple values).

So in your case, since you are going to associate each city with its number of available items, I would use a nested field. Then you can search and aggregate by the nested fields.
