
I'm looking at two ways of storing data in Elasticsearch.

[
    {
        "first": "dave",
        "last": "jones",
        "age": 43,
        "height": "6ft"
    },
    {
        "first": "james",
        "last": "smith",
        "age": 43,
        "height": "6ft"
    },
    {
        "first": "bill",
        "last": "baker",
        "age": 43,
        "height": "6ft"
    }
]

or

[
    {
        "first": ["dave", "james", "bill"],
        "last": ["jones", "smith", "baker"],
        "age": 43,
        "height": "6ft"
    }
]

(The names are 30+ character hashes. Nesting would not go deeper than shown above.)

My goals are:

  1. Query speed
  2. Disk space

We are talking about the difference between 300 GB and 1 TB.

My question is: can Elasticsearch search nested data just as quickly as flattened data?

Nested requires more time for updates and indexing. Query performance will be "similar", but be careful: it depends on the use case. For example, if you use scripts to loop over an array, nested would be faster. Commented Nov 25, 2019 at 11:46

1 Answer

Elasticsearch will flatten your arrays of objects by default, exactly as you demonstrated in your example:

Arrays of inner object fields do not work the way you may expect. Lucene has no concept of inner objects, so Elasticsearch flattens object hierarchies into a simple list of field names and values.
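To make the flattening concrete, here is a small plain-Python simulation (not Elasticsearch code; the helpers `flatten_objects` and `matches_all` are hypothetical) of how an array of inner objects collapses into one value list per field, and why a query for two field values can then match values that came from different objects.

```python
from collections import defaultdict


def flatten_objects(objects):
    """Collapse a list of inner objects into one value list per field,
    mimicking how an array of (non-nested) objects ends up in the index."""
    flat = defaultdict(list)
    for obj in objects:
        for field, value in obj.items():
            flat[field].append(value)
    return dict(flat)


def matches_all(flat, **criteria):
    """True if each field contains the requested value somewhere in its list.
    Which first name belonged to which last name is no longer known."""
    return all(value in flat.get(field, []) for field, value in criteria.items())


people = [
    {"first": "dave", "last": "jones"},
    {"first": "james", "last": "smith"},
    {"first": "bill", "last": "baker"},
]

flat = flatten_objects(people)
print(flat)
# {'first': ['dave', 'james', 'bill'], 'last': ['jones', 'smith', 'baker']}

# "dave smith" is not a person in the data, but the flattened form matches anyway.
print(matches_all(flat, first="dave", last="smith"))  # True
```

This cross-object matching is the reason the nested type exists.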

So from the point of view of querying, nothing will change. (However, if you need to query individual items of the inner arrays, for example to find "dave jones" as a single person, you may want to explicitly map the field with the nested data type, which will have poorer performance.)
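As a sketch of what that looks like, the Python dicts below are request bodies for a nested mapping and a nested query. The field name `people` and the `keyword`/`integer` types are assumptions for illustration. Only the dicts are built here; any Elasticsearch client or curl could send them.

```python
import json

# Hypothetical field "people". Each element of a nested field is indexed as
# its own hidden Lucene document, so first/last pairs stay together.
# That extra per-element document is also where the indexing cost comes from.
mapping = {
    "mappings": {
        "properties": {
            "people": {
                "type": "nested",
                "properties": {
                    "first": {"type": "keyword"},
                    "last": {"type": "keyword"},
                },
            },
            "age": {"type": "integer"},
            "height": {"type": "keyword"},
        }
    }
}

# Nested query: both term clauses must match the SAME inner object.
query = {
    "query": {
        "nested": {
            "path": "people",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"people.first": "dave"}},
                        {"term": {"people.last": "jones"}},
                    ]
                }
            },
        }
    }
}

# Bodies for index creation (PUT /<index>) and search (GET /<index>/_search).
print(json.dumps(mapping, indent=2))
print(json.dumps(query, indent=2))
```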

As for size on disk, compression is enabled by default. Keep in mind that Elasticsearch stores your original documents in two ways at once: the original JSON as source, and implicitly in the inverted indexes (which are what is actually used for the super-fast searching).

If you want to read more about tuning for disk usage, the "Tune for disk usage" page in the Elasticsearch documentation is a good place to start. For instance, you could enable even more aggressive compression for the source, or not store the source on disk at all (although that is not advised).
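A minimal sketch of both options as request bodies, again as Python dicts. `index.codec: best_compression` and `_source.enabled: false` are real Elasticsearch settings. Note that `index.codec` is a static setting, so it has to be set when the index is created (or while it is closed). The exact savings depend on your data and are not measured here.

```python
import json

# Option 1: trade some stored-field read speed for a higher compression
# ratio on stored fields, which include _source.
settings_best_compression = {
    "settings": {
        "index": {
            "codec": "best_compression"
        }
    }
}

# Option 2: don't store _source at all. Saves the most space, but you lose
# features that need the original JSON, such as the update API, reindex,
# and getting the document back in search hits.
mapping_no_source = {
    "mappings": {
        "_source": {"enabled": False}
    }
}

print(json.dumps(settings_best_compression, indent=2))
print(json.dumps(mapping_no_source, indent=2))
```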

Hope that helps!
