
I am trying to sort the following items with Elasticsearch:

[
    {name: 'Company 1'},
    {name: 'Company 2'},
    {name: 'aa 01'},
    {name: 'aabb'}
]

When I sort on name, I get this order (the value after -> is the sort value returned by ES):

aa 01 -> 01
Company 1 -> 1
Company 2 -> 2
aabb -> aabb
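Those sort values come from sorting on an analyzed text field: each name is split into several terms, and in ascending order Elasticsearch sorts a multi-valued field on its smallest term. The sketch below models that in plain Python. The whitespace-split-plus-lowercase function is an assumed stand-in for the standard analyzer (it produces the same tokens for these four names), not Elasticsearch's actual implementation.

```python
# Simplified model of sorting on an analyzed text field with fielddata.
# ASSUMPTION: whitespace split + lowercase stands in for the standard
# analyzer; it yields the same tokens for these four sample names.

names = ["Company 1", "Company 2", "aa 01", "aabb"]


def simple_analyze(text: str) -> list[str]:
    """Split on whitespace and lowercase each token (rough standard-analyzer stand-in)."""
    return [token.lower() for token in text.split()]


def sort_value(text: str) -> str:
    """In ascending order, a multi-valued field sorts on its smallest term."""
    return min(simple_analyze(text))


ranked = sorted(names, key=sort_value)

for name in ranked:
    print(f"{name} -> {sort_value(name)}")
```

Because "Company 1" is sorted on its smallest term "1" rather than on the whole name, the digits decide the order, which reproduces the output above.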

I would like to get:

aa 01
aabb
Company 1
Company 2

I tried changing the mapping to type: 'keyword', which gives (again, the value after -> is the sort value returned by ES):

Company 1 -> Company 1
Company 2 -> Company 2
aa 01 -> aa 01
aabb -> aabb

I looked for other alternatives, but they seem to target older ES versions (for example, Elastic search alphabetical sorting based on first character): the index_analyzer and index options they rely on no longer work.

  • What's the problem here? Note that uppercase characters have lower ASCII codes than lowercase ones, so the Elasticsearch result you mention is already sorted. If you want to ignore case, you can use a custom analyzer with a 'lowercase' filter. Commented Oct 16, 2018 at 7:30
  • @AmirMasudZareBidaki Correct (y) Commented Oct 16, 2018 at 10:17

1 Answer


You're getting results in lexicographical (byte) order, which is perfectly fine for a computer but does not make much sense to humans, who expect case-insensitive alphabetical order.

The bytes used to represent uppercase letters have lower values than the bytes used to represent lowercase letters (see any ASCII table), and the names are sorted with the lowest bytes first, so names starting with "C" come before names starting with "a".
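This byte ordering is easy to reproduce outside Elasticsearch. Python's default string comparison goes by code point, which for plain ASCII text matches the byte-by-byte order a keyword field uses; the snippet is only an illustration of that ordering, not of Elasticsearch internals.

```python
# Python compares strings by code point; for ASCII text this is the same
# as the byte-by-byte order used when sorting a keyword field.

names = ["aa 01", "aabb", "Company 1", "Company 2"]

# Uppercase letters have lower codes than lowercase ones in ASCII.
print(ord("C"), ord("a"))  # 67 97

byte_order = sorted(names)
print(byte_order)
```

The result is the same order the keyword mapping produced in the question: both "Company" names come first because 67 < 97.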

To get the order you want, you need to index each name so that its byte order matches the sort order you want. In other words, you need an analyzer that emits a single lowercase token for the whole value.
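The effect of such an analyzer (keyword tokenizer followed by a lowercase filter) can be approximated in plain Python. This is a sketch for ASCII input only: str.lower() is assumed to behave like the lowercase token filter here, which holds for these names but may differ for some non-ASCII characters.

```python
# Approximation of a "keyword tokenizer + lowercase filter" analyzer:
# the whole value becomes one token, which is then lowercased.
# Plain-Python model for ASCII input, not Elasticsearch code.

names = ["Company 01", "Company 02", "aa 01", "aabb"]


def keyword_lowercase(text: str) -> list[str]:
    """Keyword tokenizer (one token for the whole input) + lowercase filter."""
    return [text.lower()]


def sort_value(text: str) -> str:
    # Exactly one token per value, so the sort value is that token.
    (token,) = keyword_lowercase(text)
    return token


ranked = sorted(names, key=sort_value)
print(ranked)
```

Because the whole string stays a single token, the sort value is the full lowercased name rather than its smallest word, and "aa 01" sorts before "aabb" because a space (32) is lower than "b" (98).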

Create a custom keyword analyzer (keyword tokenizer plus lowercase filter) and apply it to a sub-field of the field you want to sort on:

PUT /my_index
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "custom_keyword_analyzer" : {
          "tokenizer" : "keyword",
          "filter" : ["lowercase"]
        }
      }
    }
  },
  "mappings" : {
    "_doc" : {
      "properties" : {
        "name" : {
          "type" : "text",
          "fields" : {
            "raw" : {
              "type" : "text",
              "analyzer" : "custom_keyword_analyzer",
              "fielddata": true
            }
          }
        }
      }
    }
  }
}

Index your data:

POST my_index/_doc/1
{
  "name" : "Company 01"
}

POST my_index/_doc/2
{
  "name" : "Company 02"
}

POST my_index/_doc/3
{
  "name" : "aa 01"
}

POST my_index/_doc/4
{
  "name" : "aabb"
}

Perform sort:

POST /my_index/_doc/_search
{
  "sort": "name.raw"
}

Response (hits only):

[
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "3",
        "_score": null,
        "_source": {
            "name": "aa 01"
        },
        "sort": [
            "aa 01"
        ]
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "4",
        "_score": null,
        "_source": {
            "name": "aabb"
        },
        "sort": [
            "aabb"
        ]
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": null,
        "_source": {
            "name": "Company 01"
        },
        "sort": [
            "company 01"
        ]
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "2",
        "_score": null,
        "_source": {
            "name": "Company 02"
        },
        "sort": [
            "company 02"
        ]
    }
] 

Reference: Sorting and Collations
