Python elasticsearch DSL aggregation/metric of nested values per document

Question

I'm trying to find the minimum (smallest) value in a 2-level nesting (separate minimum value per document).

So far I'm able to make an aggregation which counts the min value from all the nested values in my search results but without separation per document.

My example schema:

class MyExample(DocType):
    myexample_id = Integer()
    nested1 = Nested(
        properties={
            'timestamp': Date(),
            'foo': Nested(
                properties={
                    'bar': Float(),
                }
            )
        }
    )
    nested2 = Nested(
        multi=False,
        properties={
            'x': String(),
            'y': String(),
        }
    )

And this is how I'm searching and aggregating:

from elasticsearch_dsl import Search, Q

search = Search().filter(
    'nested', path='nested1', inner_hits={},
    query=Q(
        'range', **{
            'nested1.timestamp': {
                'gte': exampleDate1,
                'lte': exampleDate2
            }
        }
    )
).filter(
    'nested', path='nested2', inner_hits={'name': 'x'},
    query=Q(
        'term', **{
            'nested2.x': x
        }
    )
).filter(
    'nested', path='nested2', inner_hits={'name': 'y'},
    query=Q(
        'term', **{
            'nested2.y': y
        }
    )
)

search.aggs.bucket(
    'nested1', 'nested', path='nested1'
).bucket(
    'nested_foo', 'nested', path='nested1.foo'
).metric(
    'min_bar', 'min', field='nested1.foo.bar'
)

Basically what I need to do is to get the min value for all the nested nested1.foo.bar values for each unique MyExample (they have unique myexample_id field)

Honza Král · Accepted Answer · 2017-01-12 09:05:33Z

2

If you want minimum value per document then put all the nested buckets within a bucket terms aggregation over myexample_id field:

search.aggs..bucket(
  'docs', 'terms', field='myexample_id'
).bucket(
  'nested1', 'nested', path='nested1'
).bucket(
  'nested_foo', 'nested', path='nested1.foo'
).metric(
  'min_bar', 'min', field='nested1.foo.bar'
)

Note that this aggregation might be extremely expensive to calculate since it has to create a bucket for each document. For a use case like this it might be easier to compute the minimum on a per document basis as a script_field or in the app.

answered Jan 12, 2017 at 9:05

Honza Král

3,03216 silver badges11 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Val Over a year ago

I was even going to suggest that this minimum value should be figured out at indexing time and stored at the root level of the document. It would make the whole thing a lot more performant than several levels of aggs on nested documents or using scripting.

Collectives™ on Stack Overflow

Python elasticsearch DSL aggregation/metric of nested values per document

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related