Is it possible to write an Elasticsearch query that returns only documents that have multiple values in a given field? I don't care what the values are, only that the field holds more than one.

I'd like the query to match:

{
  "color": ["red", "blue"]
},
{
  "color": ["green", "yellow", "orange"]
}

but not

{
  "color": "red"
}

Ideally, I'd prefer to avoid using scripts in my query; they are disabled on my cluster.

1 Answer

I'm not aware of an approach that works without a script, but you can choose when the script runs:

Preparation: Index some sample documents

POST my_index/_bulk
{"index": {"_id": 1}}
{"color": ["red", "blue"]}
{"index": {"_id": 2}}
{"color": ["green", "yellow", "orange"]}
{"index": {"_id": 3}}
{"color": ["grey"]}

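Option 1: Using a script at query time ("expensive")

The script reads doc values, so it has to target a field that has them. With the default dynamic mapping created by the bulk request above, that is the color.keyword sub-field; if color is explicitly mapped as keyword, use doc['color'] instead. Keyword doc values are deduplicated, so ["red", "red"] counts as one value.

GET my_index/_search
{
  "query": {
    "script": {
      "script": "doc['color.keyword'].size() > 1"
    }
  }
}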

Option 2: Using a script at indexing time ("cheap")

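(preferred approach, as the script only gets executed once per document write. The instanceof check lets the pipeline also handle documents where color is a single value, as in the question, or is missing.)

PUT _ingest/pipeline/set_number_of_colors
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.number_of_colors = ctx.color instanceof List ? ctx.color.size() : (ctx.color == null ? 0 : 1)"
      }
    }
  ]
}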
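
Before running the pipeline over the whole index, it may be worth a dry run with the simulate API. This is only a sketch: the documents below are made up for illustration, and the last one uses a single string, the shape from the question, to check how the script copes with it.

```json
POST _ingest/pipeline/set_number_of_colors/_simulate
{
  "docs": [
    {"_source": {"color": ["red", "blue"]}},
    {"_source": {"color": ["grey"]}},
    {"_source": {"color": "red"}}
  ]
}
```

The response lists, per document, either the transformed source or the error the processor raised, without writing anything to the index.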

POST my_index/_update_by_query?pipeline=set_number_of_colors

GET my_index/_search
{
  "query": {
    "range": {
      "number_of_colors": {"gt": 1}
    }
  }
}

You can also configure the pipeline as the default pipeline for your index (the index.default_pipeline setting), so every write goes through it and you don't need to change anything in your indexing application logic.
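
A minimal sketch of that setting, assuming the index and pipeline names used above:

```json
PUT my_index/_settings
{
  "index.default_pipeline": "set_number_of_colors"
}
```

An indexing request that passes its own pipeline parameter overrides the default.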

1 Comment

I wouldn't say that I'm "happy" with it, since I cannot use scripts in my cluster, but it does appear to be correct. :P
