
Elasticsearch lets inner_hits take 'from' and 'size' parameters, just as the outer request body of a search does.

As an example, assume my index contains 25 books, each with fewer than 50 chapters. The snippet below returns all chapters across all books, because a 'size' of 100 books covers all 25 books and a 'size' of 50 chapters covers all of "fewer than 50 chapters":

        "index": 'books',
        "type": 'book',
        "body": {
          "from" : 0, "size" : 100, // outer hits, or books
          "query": {
              "filtered": {
                "filter": {
                  "nested": {
                    "inner_hits": {
                      "size": 50 // inner hits, or chapters
                    },
                    "path": "chapter",
                    "query": { "match_all": { } }, 
                  }
                }
               }
            },
            .
            .
            .

Now, I'd like to implement paging with a scenario like this. My question is, how?

In this case, do I have to return the above maximum of 100 * 50 = 5000 documents from the search query and implement paging at the application level by displaying only the slice I am interested in? Or is there a way to specify the total number of hits to return in the search query itself, independent of the inner/outer size?

I am reading the response as follows, and this is the data I would like to paginate:

        response.hits.hits.forEach(function(book) {
           var chapters = book.inner_hits.chapters.hits.hits; // declared locally to avoid an implicit global

           chapters.forEach(function(chapter) {
               // ... this is one displayed result ...
           });
        });
  • Does it work to add size=5000 to your GET URL, per elastic.co/guide/en/elasticsearch/guide/current/pagination.html? Commented Sep 23, 2015 at 20:18
  • I'm using the 'search' method in the Node.js client (enabling tracing shows that it is doing a POST) rather than constructing the URL directly. I am already using 'size' twice, where indicated by the 'outer hits' and 'inner hits' comments in my snippet. 'size' works to limit the parent documents being examined and the number of nested inner results returned within each of those parents, but I essentially want to specify 'size' and 'from' across that entire set. Commented Sep 23, 2015 at 20:31
  • Can you add size: 5000 at the same level as index: "books"? That is the way I limit the total results in the Ruby client. Commented Sep 23, 2015 at 20:34

2 Answers

I don't think this is possible with Elasticsearch and nested fields. Your reading of the results is correct: ES paginates and returns books, and it does not paginate across the nested inner_hits of different books. That is simply not how it works, so you need to handle the pagination manually in your code.
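A minimal sketch of that manual pagination, assuming the response has the shape shown in the question (inner hits exposed under `inner_hits.chapters`). The helper name `paginateChapters` and the returned object shape are made up for illustration:

```javascript
// Flatten every book's inner hits into one list, then slice out one page.
// Assumption: inner hits live under `inner_hits.chapters`, as in the question.
function paginateChapters(response, from, size) {
  var all = [];
  response.hits.hits.forEach(function (book) {
    var inner = book.inner_hits && book.inner_hits.chapters;
    var chapters = inner ? inner.hits.hits : [];
    chapters.forEach(function (chapter) {
      // Keep the parent id so each displayed result can still name its book.
      all.push({ bookId: book._id, chapter: chapter });
    });
  });
  return { total: all.length, items: all.slice(from, from + size) };
}
```

This still pulls every matching chapter over the wire on each request, so it only makes sense while the full result set stays small.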

There is another option, but you need a parent/child relationship instead of nested.

You can then query the children (the chapters) directly and paginate those results. With inner_hits you can also return the parent (the book itself) alongside each chapter.

PUT /library
{
  "mappings": {
    "book": {
      "properties": {
        "name": {
          "type": "string"
        }
      }
    },
    "chapter": {
      "_parent": {
        "type": "book"
      },
      "properties": {
        "title": {
          "type": "string"
        }
      }
    }
  }
}

The query:

GET /library/chapter/_search
{
  "size": 5, 
  "query": {
    "has_parent": {
      "type": "book",
      "query": {
        "match_all": {}
      },
      "inner_hits" : {}
    }
  }
}
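Since the question uses the Node.js client, the same query with paging could be built roughly like this. It is a sketch only: `buildChapterPageRequest` is a hypothetical helper, and the zero-based page number is my own convention. The `from` and `size` values now apply to chapters, because chapters are the top-level hits:

```javascript
// Build search parameters for one page of chapters (hypothetical helper).
// Index and type names follow the mapping in this answer.
function buildChapterPageRequest(page, pageSize) {
  return {
    index: 'library',
    type: 'chapter',
    body: {
      from: page * pageSize, // zero-based page number
      size: pageSize,
      query: {
        has_parent: {
          type: 'book',
          query: { match_all: {} },
          inner_hits: {} // also return the parent book with each chapter
        }
      }
    }
  };
}

// Usage, assuming `client` is an already configured client instance:
// client.search(buildChapterPageRequest(2, 5)).then(function (response) { /* render */ });
```

With every hit scoring the same, you would probably also want an explicit sort so that pages stay stable between requests.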

And a sample output (trimmed):

  "hits": [
     {
        "_index": "library",
        "_type": "chapter",
        "_id": "1",
        "_score": 1,
        "_source": {
           "title": "chap1"
        },
        "inner_hits": {
           "book": {
              "hits": {
                 "total": 1,
                 "max_score": 1,
                 "hits": [
                    {
                       "_index": "library",
                       "_type": "book",
                       "_id": "book1",
                       "_score": 1,
                       "_source": {
                          "name": "book1"
                       }
                    }
                 ]
              }
           }
        }
     },
     {
        "_index": "library",
        "_type": "chapter",
        "_id": "2",
        "_score": 1,
        "_source": {
           "title": "chap2"
        },
        "inner_hits": {
           "book": {
              "hits": {
                 "total": 1,
                 "max_score": 1,
                 "hits": [
                    {
                       "_index": "library",
                       "_type": "book",
                       "_id": "book1",
                       "_score": 1,
                       "_source": {
                          "name": "book1"
                       }
                    }
                 ]
              }
           }
        }
     }

1 Comment

I've suspected from the start that I might be better off indexing each nested chapter as its own document instead of using nesting within the same document, but was continuing on the path of trying to make things work with nesting. Separate documents simplifies a few other cases, so this is one more reason to do that, and with parent/child relationships I will be able to relate them. Thanks for the answer. If I cannot page inside inner_hits like I want to, I will change how I am indexing the data.
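For that re-indexing step, here is a rough sketch of splitting one nested book into separate parent and child documents. The helper name and the input shape (`id`, `name`, and a `chapter` array of objects with a `title`) are assumptions for illustration; each returned object is meant to be passed to the client's `index` method, whose `parent` parameter ties a chapter to its book:

```javascript
// Turn one nested book document into index() parameter objects (hypothetical helper).
// Assumed input shape: { id, name, chapter: [{ title }, ...] }.
function toParentChildDocs(book) {
  var docs = [
    { index: 'library', type: 'book', id: book.id, body: { name: book.name } }
  ];
  (book.chapter || []).forEach(function (chapter) {
    docs.push({
      index: 'library',
      type: 'chapter',
      parent: book.id, // id of the parent book document
      body: { title: chapter.title }
    });
  });
  return docs;
}

// Usage, assuming `client` is an already configured client instance:
// toParentChildDocs(book).forEach(function (params) { client.index(params); });
```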

The search API accepts certain standard parameters, listed in the docs at: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference-2-0.html#api-search-2-0

According to the doc:

size Number — Number of hits to return (default: 10)

Which would make your request something like:

    "size": 5000,
    "index": 'books',
    "type": 'book',
    "body": {

3 Comments

This is what I expected should work. It does not produce an error, but it also does not affect the number of results returned. I updated my question to show what part of the response I am looking at, since with nesting and inner_hits it might be a bit unusual.
Ah, I think I understand better now. As I understand it, the size I specified limits the outer records only, so whether each outer hit has 1 inner result or 50, you get the same set of outer hits back. Maybe you can script a field that calculates the number of inner results and then work your way backwards from there?
Yes, that is the problem. I want to limit the size of the inner hits across all of the outer hits. Not just the outer hits, and not just the inner hits within each outer hit. I am new to ElasticSearch and haven't used script fields yet, but was hoping I wouldn't have to and that there might be some easier solution that I was missing. :)
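To illustrate the "work backwards" idea from the comments: if the number of chapters per book were known (how to obtain those counts is not shown here), a global page could be mapped to per-book slices with plain arithmetic. The function `locatePage` and its return shape are hypothetical. Note also that, as far as I can tell, a single inner_hits definition applies the same 'from' and 'size' to every outer hit, so each slice would still need its own request or client-side slicing:

```javascript
// Map a global (from, size) window onto per-book chapter ranges.
// `chapterCounts[i]` is the number of chapters in the i-th book, in result order.
function locatePage(chapterCounts, from, size) {
  var slices = [];
  var offset = 0; // global index of the first chapter of the current book
  for (var i = 0; i < chapterCounts.length && size > 0; i++) {
    var count = chapterCounts[i];
    if (from < offset + count) {
      var innerFrom = from - offset;
      var innerSize = Math.min(size, count - innerFrom);
      slices.push({ bookIndex: i, from: innerFrom, size: innerSize });
      from += innerSize; // the window now starts at the next book
      size -= innerSize;
    }
    offset += count;
  }
  return slices;
}
```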
