I'm trying to do nested sorting in Elasticsearch, but so far I haven't succeeded.

My data structure:

{ "_id" : 1,
"authorList" : [
  {"lastName":"hawking", "firstName":"stephan"},
  {"lastName":"frey", "firstName":"richard"}
]
}

{ "_id" : 2,
"authorList" : [
  {"lastName":"roger", "firstName":"christina"},
  {"lastName":"freud", "firstName":"damian"}
]
}

I want to sort the documents by the last name of the first author in each document.

Used mapping:

"authorList" : { "type" : "nested", "properties" : {"lastName":{"type":"keyword"}}}

Sort using SearchRequestBuilder (JAVA):

    searchRequestBuilder.addSort(
        SortBuilders.fieldSort("authorList.lastName")
            .order(SortOrder.ASC)
            .sortMode(SortMode.MIN)
            .setNestedPath("authorList")
    );

This runs, but it doesn't give the result I want (e.g. "hawking" first, then "roger").

Did I miss something? Is there a way to tell Elasticsearch to access index 0 of the authorList array? Is there any mapping / normalizer that indexes the first entry of the array separately?

1 Answer

Nested documents are not saved as a simple array or list. They are managed internally by Elasticsearch:

Elasticsearch is still fundamentally flat, but it manages the nested relation internally to give the appearance of nested hierarchy. When you create a nested document, Elasticsearch actually indexes two separate documents (root object and nested object), then relates the two internally. (more here)

I think you need to give Elasticsearch some additional information that indicates which author is the "primary/first" one. It is enough to add this extra field to only one author in each nested list (your mapping can stay as before), something like this:

{
    "authorList" : [
      {"lastName":"roger", "firstName":"christina", "authorOrder": 1},
      {"lastName":"freud", "firstName":"damian"}
    ]
},
{
    "authorList" : [
      {"lastName":"hawking", "firstName":"stephan", "authorOrder": 1},
      {"lastName":"adams", "firstName":"mark"},
      {"lastName":"frey", "firstName":"richard"}
    ]
},
{
    "authorList" : [
      {"lastName":"adams", "firstName":"monica", "authorOrder": 1},
      {"lastName":"adams", "firstName":"richard"}
    ]
}

Then the query could be:

{
  "query" : {
    "nested" : {
      "query" : {
        "bool" : {
          "must" : [
            {
              "match" : {
                "authorList.authorOrder" : 1
              }
            }
          ]
        }
      },
      "path" : "authorList"
    }
  },
  "sort" : [
    {
      "authorList.lastName" : {
        "order" : "asc",
        "nested_filter" : {
          "bool" : {
            "must" : [
              {
                "match" : {
                  "authorList.authorOrder" : 1
                }
              }
            ]
          }
        },
        "nested_path" : "authorList"
      }
    }
  ]
}

And with the Java API:

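import org.apache.lucene.search.join.ScoreMode;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.sort.SortBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;

// Matches only the nested author that is marked as the first one.
QueryBuilder matchFirst = QueryBuilders.boolQuery()
        .must(QueryBuilders.matchQuery("authorList.authorOrder", 1));
QueryBuilder mainQuery = QueryBuilders.nestedQuery("authorList", matchFirst, ScoreMode.None);

// Sort on lastName, but only consider nested authors that pass matchFirst.
SortBuilder sb = SortBuilders.fieldSort("authorList.lastName")
    .order(SortOrder.ASC)
    .setNestedPath("authorList")
    .setNestedFilter(matchFirst);

SearchRequestBuilder builder = client.prepareSearch("test")
        .setSize(50)
        .setQuery(mainQuery)
        .addSort(sb);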

Note that the SortBuilder has .setNestedFilter(matchFirst), which means that sorting is based on the authorList.lastName field, but only of your "primary/first" nested elements. Without it, Elasticsearch would take the smallest lastName among all nested documents of each parent (min is the default sort mode for ascending order) and sort the parent documents by that value. The document whose first author is "hawking" could then come first, because it also contains an author with the last name "adams".
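To make the difference concrete, here is a minimal plain-Java sketch that imitates how the sort key is picked with and without the nested filter. It does not call Elasticsearch at all; the class NestedSortSketch, the Author type and the method names are hypothetical and exist only for this illustration, and missing sort values are not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class NestedSortSketch {

    // Hypothetical in-memory stand-in for one nested author object.
    static final class Author {
        final String lastName;
        final Integer authorOrder; // null when the field is absent

        Author(String lastName, Integer authorOrder) {
            this.lastName = lastName;
            this.authorOrder = authorOrder;
        }
    }

    // Smallest lastName of a document; with primaryOnly set, only authors
    // with authorOrder == 1 are considered (imitating the nested filter).
    static String sortKey(List<Author> authors, boolean primaryOnly) {
        return authors.stream()
                .filter(a -> !primaryOnly || Integer.valueOf(1).equals(a.authorOrder))
                .map(a -> a.lastName)
                .min(Comparator.naturalOrder())
                .orElse(""); // simplification: missing values are not modelled
    }

    // Sorts the three sample documents from the answer and returns the
    // last name of each document's first author, in result order.
    static List<String> firstAuthorsInOrder(boolean primaryOnly) {
        List<List<Author>> docs = new ArrayList<>();
        docs.add(Arrays.asList(new Author("roger", 1), new Author("freud", null)));
        docs.add(Arrays.asList(new Author("hawking", 1), new Author("adams", null),
                new Author("frey", null)));
        docs.add(Arrays.asList(new Author("adams", 1), new Author("adams", null)));
        docs.sort(Comparator.comparing((List<Author> d) -> sortKey(d, primaryOnly)));
        return docs.stream().map(d -> d.get(0).lastName).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstAuthorsInOrder(false)); // no filter
        System.out.println(firstAuthorsInOrder(true));  // authorOrder == 1 only
    }
}
```

Without the filter, the sketch prints [hawking, adams, roger]: the "hawking" document and the "adams" document both get the key "adams", and the tie is broken here only by insertion order (Elasticsearch makes no such promise). With the filter it prints [adams, hawking, roger].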

Final result is:

{
    "authorList" : [
      {"lastName":"adams", "firstName":"monica", "authorOrder": 1},
      {"lastName":"adams", "firstName":"richard"}
    ]
},
{
    "authorList" : [
      {"lastName":"hawking", "firstName":"stephan", "authorOrder": 1},
      {"lastName":"adams", "firstName":"mark"},
      {"lastName":"frey", "firstName":"richard"}
    ]
},
{
    "authorList" : [
      {"lastName":"roger", "firstName":"christina", "authorOrder": 1},
      {"lastName":"freud", "firstName":"damian"}
    ]
}

2 Comments

Ok, that would solve the problem. But if I have to introduce a new field, wouldn't it be easier just to create a field "firstAuthorLastName" instead, copying the value of the first array index? This would also simplify the query/sorting part.
Yes, if you can rearrange your model that way, it would definitely be easier to query your data. If a document could have e.g. id, firstAuthorLastName and a nested list of otherAuthors, then sorting on the top-level field firstAuthorLastName (instead of a nested one) would also be faster.
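
A rough sketch of that idea, assuming the document source is available as a Map before it is indexed: copy the last name of the first array entry into a top-level field. The class FirstAuthorField and its helper methods are hypothetical, and the snippet only prepares the source; it does not talk to Elasticsearch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstAuthorField {

    // Hypothetical helper that builds one author object.
    static Map<String, Object> author(String lastName, String firstName) {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("lastName", lastName);
        a.put("firstName", firstName);
        return a;
    }

    // Returns a copy of the source with a top-level "firstAuthorLastName"
    // taken from index 0 of "authorList"; the copy is left without that
    // field when there is no first author or no lastName.
    static Map<String, Object> withFirstAuthor(Map<String, Object> source) {
        Map<String, Object> copy = new LinkedHashMap<>(source);
        Object authors = source.get("authorList");
        if (authors instanceof List && !((List<?>) authors).isEmpty()) {
            Object first = ((List<?>) authors).get(0);
            if (first instanceof Map) {
                Object lastName = ((Map<?, ?>) first).get("lastName");
                if (lastName != null) {
                    copy.put("firstAuthorLastName", lastName);
                }
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("authorList", Arrays.asList(
                author("hawking", "stephan"),
                author("frey", "richard")));
        System.out.println(withFirstAuthor(doc).get("firstAuthorLastName"));
    }
}
```

If firstAuthorLastName is mapped as a keyword, the sort should reduce to a plain SortBuilders.fieldSort("firstAuthorLastName") with no nested path or filter.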
