Let's say I have data in this format:

{
  "id": "doc0",
  "tags": ["a", "b", "c"]
}

{
  "id": "doc1",
  "tags": ["a", "b", "c", "d"]
}

{
  "id": "doc2",
  "tags": ["a", "b"]
}

I need to form an ES query that fetches only the documents whose tags contain both "a" and "b" and nothing else.

If I write a terms query, it matches all the documents, because every document has both "a" and "b", but only one document has nothing else apart from "a" and "b".

What is the best way to form this query? I don't have the list of the other values, so I can't add a "not_contains" clause.

2 Answers

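There are two ways to achieve this:

  1. You can combine a bool query (with must and filter clauses) and a script query to retrieve only those documents whose tags are exactly "a" and "b". The two term clauses require both values to be present, and the script rejects any document that has a tag outside the input list.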

Index Data:

POST testidx/_doc/1
{
  "id": "doc0",
  "tags": ["a", "b", "c"]
}

POST testidx/_doc/2
{
  "id": "doc1",
  "tags": ["a", "b", "c", "d"]
}

POST testidx/_doc/3
{
  "id": "doc2",
  "tags": ["a", "b"]
}

Search Query:

POST testidx/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "tags": "a"
              }
            },
            {
              "term": {
                "tags": "b"
              }
            },
            {
              "script": {
                "script": {
                  "source": "return params.input.containsAll(doc['tags.keyword']);",
                  "lang": "painless",
                  "params": {
                    "input": [
                      "a",
                      "b"
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}

Search Result:

"hits" : [
      {
        "_index" : "testidx",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.0,
        "_source" : {
          "id" : "doc2",
          "tags" : [
            "a",
            "b"
          ]
        }
      }
    ]

  2. You can use the minimum_should_match_script parameter with a terms set query. Compared to a script query, a terms set query will be faster.


POST testidx/_search
{
  "query": {
    "bool": {
      "filter": {
        "terms_set": {
          "tags": {
            "terms": [
              "a",
              "b"
            ],
            "minimum_should_match_script": {
              "source": "doc['tags.keyword'].size()"
            }
          }
        }
      }
    }
  }
}
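
If the tag list changes per request, the terms set request above could be generated from code. The following is a minimal sketch in Python, not part of the original answer: the function name `exact_tags_query` and its parameters are hypothetical, and it assumes the default dynamic mapping where `tags` has a `tags.keyword` subfield. It also swaps in a slightly stricter script than the one shown above: `Math.max(params.num_terms, ...)` uses the `num_terms` value that terms set scripts receive, so that a document holding only a subset of the requested tags (for example just "a") is not returned.

```python
import json


def exact_tags_query(tags, field="tags", keyword_field="tags.keyword"):
    """Build a search body matching documents whose tags are exactly `tags`.

    Hypothetical helper: the field names are assumptions based on the
    default dynamic mapping (text field plus a .keyword subfield).
    """
    # De-duplicate and sort so the request is deterministic.
    terms = sorted(set(tags))
    # Required matches = max(number of query terms, number of tags on the doc).
    # A doc with extra tags needs more matches than there are query terms;
    # a doc with fewer tags cannot reach num_terms matches.
    source = f"Math.max(params.num_terms, doc['{keyword_field}'].size())"
    return {
        "query": {
            "bool": {
                "filter": {
                    "terms_set": {
                        field: {
                            "terms": terms,
                            "minimum_should_match_script": {"source": source},
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the body so it can be pasted after POST testidx/_search.
    print(json.dumps(exact_tags_query(["a", "b"]), indent=2))
```

The returned dictionary can be sent as the request body with any HTTP or Elasticsearch client.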

1 Comment

"script": { "script": { "source": " return doc['k1.keyword'].size() == 2;", "lang": "painless" } }

You can use a Terms Set query.

Before using the terms set query, you need to update each indexed document so that one field stores the number of elements in tags.

PUT sample1/_doc/1
{
  "id": "doc0",
  "tags": ["a", "b", "c"],
  "required_matches": 3
}
PUT sample1/_doc/2
{
  "id": "doc1",
  "tags": ["a","b","c","d"],
  "required_matches": 4
}
PUT sample1/_doc/3
{
  "id": "doc2",
  "tags": ["a","b"],
  "required_matches": 2
}
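
Keeping `required_matches` in sync by hand is error-prone, so the count could be derived at index time. A minimal sketch in Python, assuming documents are plain dictionaries before they are sent to Elasticsearch; the helper name `with_required_matches` is hypothetical, and counting distinct tags (rather than raw list length) is an assumption made so that a repeated tag does not inflate the count.

```python
def with_required_matches(doc, tags_key="tags", count_key="required_matches"):
    """Return a copy of `doc` with the number of distinct tags stored in `count_key`.

    Hypothetical helper; the key names follow the example documents above.
    """
    enriched = dict(doc)  # shallow copy so the caller's dict is not mutated
    enriched[count_key] = len(set(doc.get(tags_key, [])))
    return enriched


docs = [
    {"id": "doc0", "tags": ["a", "b", "c"]},
    {"id": "doc1", "tags": ["a", "b", "c", "d"]},
    {"id": "doc2", "tags": ["a", "b"]},
]
enriched_docs = [with_required_matches(d) for d in docs]
```

Each enriched dictionary can then be indexed in place of the original document.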

Query:

POST sample1/_search
{
  "query": {
    "terms_set": {
      "tags": {
        "terms": [ "a", "b"],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

Result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.17161848,
    "hits" : [
      {
        "_index" : "sample1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.17161848,
        "_source" : {
          "id" : "doc2",
          "tags" : [
            "a",
            "b"
          ],
          "required_matches" : 2
        }
      }
    ]
  }
}
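
To see why only doc2 comes back, the matching rule can be modelled outside Elasticsearch. This is a simplified sketch in Python of the terms set rule as described in this answer (a document matches when the number of query terms it contains is at least the value in its `required_matches` field); it is not Elasticsearch's implementation, and the function name `terms_set_matches` is hypothetical.

```python
def terms_set_matches(doc_tags, query_terms, required_matches):
    """Simplified model: count query terms present in the doc, compare to the per-doc minimum."""
    matched = len(set(doc_tags) & set(query_terms))
    return matched >= required_matches


query_terms = ["a", "b"]
sample = {
    "doc0": (["a", "b", "c"], 3),       # 2 matched < 3 required
    "doc1": (["a", "b", "c", "d"], 4),  # 2 matched < 4 required
    "doc2": (["a", "b"], 2),            # 2 matched >= 2 required
}
hits = [
    doc_id
    for doc_id, (tags, required) in sample.items()
    if terms_set_matches(tags, query_terms, required)
]
```

Under this model, a document with more tags than query terms can never reach its own minimum, which is why doc0 and doc1 are excluded. One caveat follows from the same rule: a document tagged only "a" with `required_matches` set to 1 would match, so if such documents can exist, an additional condition on the tag count is needed.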

2 Comments

This is a minimum should match, so it can also match documents with a greater count, right?
No, it will not match, because the value stored in required_matches is always the number of elements in that document. In your example, doc0 and doc1 have a greater count than the number of terms sent in the query, so they do not appear in the result. Don't confuse minimum_should_match_field with minimum_should_match.
