I am PUTting the following document into Elasticsearch:

{
    "_rootId": "327d3aba-4f7c-4abb-9ff3-b1608c354c7c",
    "_docId": "ID_3",
    "_ver": 0,
    "val_labels": [
        "x1",
        "x1",
        "x1"
    ]
}

Then I run the following search, which uses a Painless script for sorting:

{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "_rootId": "77394e08-32be-4611-bbf7-818dfe4bc853"
                    }
                }
            ]
        }
    },
    "sort": [
        {
            "_script": {
                "order": "desc",
                "type": "string",
                "script": {
                    "lang": "painless",
                    "source": "return doc['val_labels'].toString()"
                }
            }
        }
    ]
}

And this is the response that I receive:

{
    "took": 30,
    "timed_out": false,
    "_shards": {
        "total": 12,
        "successful": 12,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": null,
        "hits": [
            {
                "_index": "my-index",
                "_type": "views",
                "_id": "77394e08-32be-4611-bbf7-818dfe4bc853.ID_3",
                "_score": null,
                "_source": {
                    "_rootId": "77394e08-32be-4611-bbf7-818dfe4bc853",
                    "_docId": "ID_3",
                    "_ver": 0,
                    "val_labels": [
                        "x1",
                        "x1",
                        "x1"
                    ]
                },
                "sort": [
                    "[x1]"
                ]
            }
        ]
    }
}

The strange thing is that the val_labels field in _source shows ["x1", "x1", "x1"] (as expected, matching the inserted object), whereas the sort value shows only a single x1.

Is there any explanation for this?

1 Answer

The _source field in the result is the original, unmodified document, whereas the sort script reads doc values through doc['val_labels'], which are a processed copy of the field built at index time. You can verify this by fetching docvalue_fields explicitly:

{
    "query": {
        "match_all": {}
    },
    "docvalue_fields": [
      "val_labels"
    ]
}

This yields the following hit (I indexed only a single document):

{
  "hits": [
    {
      "_index": "test",
      "_type": "_doc",
      "_id": "ID_3",
      "_score": 1,
      "_source": {
        "val_labels": [
          "x1",
          "x1",
          "x1"
        ]
      },
      "fields": {
        "val_labels": [
          "x1"
        ]
      }
    }
  ]
}

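Note that the values under fields are deduplicated. Assuming val_labels is mapped as a keyword field, its doc values hold each distinct value only once per document, so the repeated x1 entries collapse into one. In the index, the repetition is recorded only as a higher term frequency, which the term vectors show: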

GET /test/_doc/ID_3/_termvectors?fields=val_labels
{
  "term_vectors": {
    "val_labels": {
      "field_statistics": {
        "sum_doc_freq": 1,
        "doc_count": 1,
        "sum_ttf": -1
      },
      "terms": {
        "x1": {
          "term_freq": 3,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 2
            },
            {
              "position": 1,
              "start_offset": 3,
              "end_offset": 5
            },
            {
              "position": 2,
              "start_offset": 6,
              "end_offset": 8
            }
          ]
        }
      }
    }
  }
}
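
To make the three views of the same field concrete, here is a small Python sketch. It is only a toy model of the behaviour described above, not Elasticsearch code: the function names index_field and script_sort_key are hypothetical, and it assumes val_labels is a keyword field whose doc values behave like a sorted set of unique values.

```python
from collections import Counter


def index_field(values):
    """Toy model of how one multi-valued field is kept for a single document."""
    return {
        # _source keeps the JSON array exactly as it was sent.
        "source": list(values),
        # Doc values (assumed sorted-set style): unique values, sorted.
        "doc_values": sorted(set(values)),
        # Inverted index: repetition survives only as a term frequency.
        "term_freq": dict(Counter(values)),
    }


def script_sort_key(stored):
    """Mimics doc['val_labels'].toString() by rendering the doc values."""
    return "[" + ", ".join(stored["doc_values"]) + "]"


stored = index_field(["x1", "x1", "x1"])
print(stored["source"])         # ['x1', 'x1', 'x1']
print(stored["doc_values"])     # ['x1']
print(stored["term_freq"])      # {'x1': 3}
print(script_sort_key(stored))  # [x1]
```

The last line matches the "[x1]" sort value in the question. A sort script that needs the duplicates cannot get them from doc values, because they are no longer there. It would have to read the original array instead, which is typically slower.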