I have data in an Elasticsearch index that looks like this:

{
  "title": "cubilia",
  "people": [
    "Ling Deponte",
    "Dana Madin",
    "Shameka Woodard",
    "Bennie Craddock",
    "Sandie Bakker"
  ]
}

Is there a way to search for all the people whose name starts with "ling" (case-insensitively) and get the distinct terms back properly cased, i.e. "Ling Deponte" rather than "ling deponte"? I am fine with changing the mappings on the index in any way.

Edit: this does what I want, but it is a really bad query:

{
  "size": 0,
  "aggs": {
    "person": {
      "filter": {
        "bool": {
          "should": [
            {
              "regexp": {
                "people.raw": "(.* )?[lL][iI][nN][gG].*"
              }
            }
          ]
        }
      },
      "aggs": {
        "top-colors": {
          "terms": {
            "size": 10,
            "field": "people.raw",
            "include": {
              "pattern": ["(.* )?[lL][iI][nN][gG].*"]
            }
          }
        }
      }
    }
  }
}

people.raw is not_analyzed
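
For context, a mapping along the following lines would give the two fields used above: an analyzed people field plus a people.raw sub-field that keeps each name as a single, original-case term. This is only a sketch. The type name doc is hypothetical, and the string / not_analyzed syntax assumes an older (pre-5.x) Elasticsearch version, which is what not_analyzed implies.

```json
{
  "mappings": {
    "doc": {
      "properties": {
        "title": { "type": "string" },
        "people": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}
```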

2 Answers

Yes, and you can do it without a regular expression by taking advantage of Elasticsearch's full text capabilities.

GET /test/_search
{
  "query": {
    "match_phrase": {
      "people": "Ling"
    }
  }
}

Note: This could also be match or match_phrase_prefix in this case. The match_phrase* queries imply an order of the values in the text. match simply looks for any of the values. Since you only have one value, it's pretty much irrelevant.
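
As a rough sketch of the match_phrase_prefix variant mentioned in the note: match_phrase only matches the whole analyzed token ling, whereas match_phrase_prefix treats the last term as a prefix, so a shorter input such as "lin" would still find "Ling Deponte". The request body below would be sent to the same GET /test/_search endpoint. The max_expansions value is shown only to make the knob visible and is an illustrative choice.

```json
{
  "query": {
    "match_phrase_prefix": {
      "people": {
        "query": "lin",
        "max_expansions": 50
      }
    }
  }
}
```

Because the people field is analyzed, the match is case-insensitive either way; the casing of the returned names still has to come from the stored source or from people.raw.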

The problem is that you cannot limit the document responses to just that name because the search API returns documents. With that said, you can use nested documents and get the desired behavior via inner_hits.

Avoid wildcard prefixing whenever possible because it simply does not work at scale. To put it in SQL terms, that's like doing a full table scan; you effectively lose the benefit of the inverted index because it has to walk it entirely to find the actual start.

Combining the two should work pretty well, though. Here, I use the query to whittle the results down to the documents you are interested in, then I use your inner aggregation to include only the matching values.

{
  "size": 0,
  "query": {
    "match_phrase": {
      "people": "Ling"
    }
  },
  "aggs": {
    "person": {
      "terms": {
        "size":10,
        "field": "people.raw",
        "include": {
          "pattern": ["(.* )?[lL][iI][nN][gG].*"]
        }
      }
    }
  }
}

2 Comments

Thank you. At first I didn't use a multi-field, and after adding it I forgot about the analyzed field. Do you think there is a way to get rid of "include": { "pattern": ["(.* )?[lL][iI][nN][gG].*"] } as well? I had an idea about making people nested objects with a name field, but I don't know about performance. Can I make plain strings nested, like objects?
Nested performance, particularly at a single level, is very good. You'd have to change it to "people": [{"name":"Ling Deponte"},{"name":"abc"},...], but then it would work by using inner hits with it. Inner hits is literally performing a second query, but against a small subset it should be okay. That would also completely avoid the aggregation.
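
A minimal sketch of the nested approach described in the comment above, assuming people has been remapped with "type": "nested" and an analyzed name sub-field (neither is part of the original index). The nested query with inner_hits returns, for each matching document, only the people entries that matched, with their original casing taken from the source.

```json
{
  "_source": false,
  "query": {
    "nested": {
      "path": "people",
      "query": {
        "match_phrase_prefix": {
          "people.name": "ling"
        }
      },
      "inner_hits": {}
    }
  }
}
```

Inner hits are reported per document, so a name that appears in several documents comes back once per document; any de-duplication across documents would still have to happen on the client or through an aggregation.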

Hi, please find the query below; it may help with your request.

GET skills/skill/_search
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "wildcard": {
                "skillNames.raw": "jav*"
              }
            }
          ]
        }
      }
    }
  }
}

My intention is to find documents whose skillNames values start with "jav".

1 Comment

Thanks for the response, but this won't return the distinct items of the array that start with "jav"; it will return all the terms in the document.
