ElasticSearch - script_score with nested within filters does not affect the scores - why?

Question

Used properties:

{
  "mappings": {
    "properties": {
      "attribute_must_1": {
        "type": "nested"
      },
      "attribute_1": {
        "type": "nested"
      },
      "attribute_2": {
        "type": "nested"
      }
    }
  }
}

Input documents for testing:

POST _bulk
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":1},"attribute_1":{"id":9},"attribute_2":{"id":3}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":1},"attribute_1":{"id":9},"attribute_2":{"id":3}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":1},"attribute_1":{"id":8},"attribute_2":{"id":3}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":1},"attribute_1":{"id":7},"attribute_2":{"id":3}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":1},"attribute_1":{"id":11},"attribute_2":{"id":3}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":1},"attribute_1":{"id":5},"attribute_2":{"id":3}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":1},"attribute_1":{"id":10},"attribute_2":{"id":3}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":1},"attribute_1":{"id":6},"attribute_2":{"id":3}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":1},"attribute_1":{"id":7},"attribute_2":{"id":3}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":1},"attribute_1":{"id":7},"attribute_2":{"id":3}}

Actual Query:

q = {
    "size": 10,
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "filter": [],
                    "must": [
                        {
                            "nested": {
                                "path": "attribute_must_1",
                                "query": {
                                    "term": {
                                        "attribute_must_1.id": "1"
                                    }
                                }
                            }
                        }
                    ]
                }
            },
            "boost": 1,
            "functions": [
                {
                    "filter": {
                        "nested": {
                            "path": "attribute_1",
                            "query": {
                                "script_score": {
                                    "query": {
                                        "match_all": {}
                                    },
                                    "script": {
                                        "source": "decayNumericLinear(params.origin, params.scale, params.offset, params.decay, doc['attribute_1.id'].value)",
                                        "params": {
                                            "origin": 10,
                                            "scale": 5,
                                            "decay": 2,
                                            "offset": 0
                                        }
                                    }
                                }
                            }
                        }
                    },
                    "weight": 30
                },
                {
                    "filter": {"nested": {"path": "attribute_2", "query": {"term": {"attribute_2.id": "3"}}}},
                    "weight": 70
                }
            ],
            "score_mode": "sum",
            "boost_mode": "replace"
        }
    },
    "sort": [
        "_score",
        {
            "date_deposit": {
                "order": "desc"
            }
        }
    ]
}

I am trying to add a new filter on the nested field "attribute_1" that scores each document by the distance between the input value and the document's value, but I can see no effect on the scores:

For these attribute_1 values of the documents found:

documents = [9, 9, 9, 10, 9, 9, 4, 9, 3, 9]

I get (sum of 30% and 70% weights from 2 attributes):

scores = [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]

So the score seems binary, while it should be a linear function. What I want is something like this:

For found document values [10, 9, 8, 3, 10] and an input value of 10, I would like to have:

scores (let's say in percentage): [100%, 90%, 80%, 30%, 100%]

I would like to have a simple score as an output ranging from 0-100% but including partial scores from multiple attributes (attribute_1, attribute_2, ...) in a way that:

score from attribute_1 is a linear score based on the distance (i.e. any value from 0% to 30%)
score from attribute_2 is either 0% or 70% (term query)
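
The target scoring described above can be written down as plain arithmetic. The following is a minimal Python sketch, not Elasticsearch code. The function names, the clamping to [0, 1], and the reading of "distance" as 1 - |value - origin| / origin are assumptions, chosen so that the example values [10, 9, 8, 3, 10] with input 10 map to 100%, 90%, 80%, 30%, 100% of the attribute_1 weight:

```python
def linear_closeness(value, origin):
    """Assumed distance measure: 1.0 at the origin, falling linearly to 0.0."""
    closeness = 1.0 - abs(value - origin) / origin
    return max(0.0, min(1.0, closeness))


def combined_score(attr1_value, attr2_matches, origin=10, w1=30, w2=70):
    """Sum a continuous part (0..w1) and an all-or-nothing part (0 or w2)."""
    partial = w1 * linear_closeness(attr1_value, origin)
    binary = w2 if attr2_matches else 0
    return partial + binary


if __name__ == "__main__":
    # attribute_2 is assumed to match for every sample value here.
    for value in [10, 9, 8, 3, 10]:
        print(value, round(combined_score(value, attr2_matches=True), 2))
```

With attribute_2 matching, the total ranges from 70 to 100; without it, from 0 to 30.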

I have tried different variations, but nothing works - what is the correct way of doing that? I have the impression that the filter query can't do script_scores somehow ...

I hope somebody can help me with this. Huge thanks!

Alex Baidan · Accepted Answer · 2020-08-07 09:09:54Z

I have tried different variations, but nothing works - what is the correct way of doing that? I have the impression that the filter query can't do script_scores somehow ...

Yes, you are right. As mentioned in the documentation: "In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data, e.g."

I recommend not using filter in queries that need to be scored.
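
To see why the scores in the question all come out as 100, it helps to model what a `functions` entry with a `filter` and a `weight` contributes. The sketch below is a toy Python model of that behaviour, not Elasticsearch's implementation; `function_score_sum`, the flattened sample documents, and the lambda filters are hypothetical. Each filter only answers yes or no, so each function adds either its full weight or nothing:

```python
def function_score_sum(doc, functions):
    """Toy model of score_mode "sum" with boost_mode "replace":
    every function whose filter matches adds its full weight."""
    return sum(weight for matches, weight in functions if matches(doc))


functions = [
    # A filter can only say whether attribute_1 matches, not how close it is.
    (lambda doc: "attribute_1" in doc, 30),
    (lambda doc: doc.get("attribute_2") == 3, 70),
]

docs = [
    {"attribute_1": 9, "attribute_2": 3},
    {"attribute_1": 4, "attribute_2": 3},
    {"attribute_1": 10, "attribute_2": 3},
]

scores = [function_score_sum(doc, functions) for doc in docs]
print(scores)  # every document gets 30 + 70, regardless of attribute_1's value
```

In this model the script inside the filter never gets a chance to change the result, which matches the flat list of 100s reported in the question.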

2 Comments

Piotr L. Over a year ago

The only working version I was able to build within "functions" used "script_score": {"script_score":{ "script": { "source" : "decayNumericLinear(params.origin, params.scale, params.offset, params.decay, doc['attribute_1'].value)", "params": { "origin": 10, "scale": 5, "decay": 0.5, "offset" : 0 } but in this case, attribute_1 can't be nested...

Piotr L. Over a year ago

I was wondering how I should implement the same function for a nested attribute, so the main question still remains.

Jozef - Spatialized.io · Accepted Answer · 2020-08-05 15:11:28Z

I'm not sure what the difference between attribute_must_1 and attribute_1 is in your example. But taking a step back, a rudimentary pivoted percentage calculation can be achieved much more simply:

Set up a nested mapping:

PUT scores
{
  "mappings": {
    "properties": {
      "attribute_must_1": {
        "type": "nested"
      }
    }
  }
}

Sync the sample docs ([9, 9, 9, 10, 9, 9, 4, 9, 3, 9]):

POST _bulk
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":9}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":9}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":9}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":10}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":9}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":9}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":4}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":9}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":3}}
{"index":{"_index":"scores","_type":"_doc"}}
{"attribute_must_1":{"id":9}}

Use a function_score query with a script_score that scores each doc as a percentage of the origin:

GET scores/_search
{
  "query": {
    "nested": {
      "path": "attribute_must_1",
      "query": {
        "function_score": {
          "query": {
            "match_all": {}
          },
          "script_score": {
            "script": {
              "source": "((float)doc['attribute_must_1.id'].value / params.origin) * 100",
              "params": {
                "origin": 10.0
              }
            }
          },
          "boost_mode": "replace"
        }
      }
    }
  }
}

Check the scores:

[
  {
    "_score":100.0,
    "_source":{
      "attribute_must_1":{
        "id":10
      }
    }
  },
  {
    "_score":90.0,
    "_source":{
      "attribute_must_1":{
        "id":9
      }
    }
  },
  ...
  {
    "_score":40.0,
    "_source":{
      "attribute_must_1":{
        "id":4
      }
    }
  },
  {
    "_score":30.0,
    "_source":{
      "attribute_must_1":{
        "id":3
      }
    }
  }
]

5 Comments

Piotr L. Over a year ago

In my case, I need to combine attributes of two types: (must) attributes need to match (bool) and define the scope of the search; (should match) attributes should match and have predefined weights. As output I need a score of 0-100% that combines several (should match) attributes and their weights, i.e.:

Piotr L. Over a year ago

If I have two (should match) attributes, attribute_1 and attribute_2, with corresponding weights weight_1=3 and weight_2=7, I would like the output score to work like this: attribute_1 is a distance function (as in your query) contributing to the total score in the range 0-30%, and attribute_2 has a constant contribution of 0% or 70%. I need a way of combining several of these attributes, some with just a bool predefined weight (0% or weight) and some as a distance function (continuous values 0-weight).

Piotr L. Over a year ago

In order to achieve this, I needed to use a list of functions with filters (as in my original code). Do you have any idea how to integrate your solution into this case? Thank you so much for your help!

Jozef - Spatialized.io Over a year ago

Thanks for the explanation. Can you please edit the original question and share a few of your actual docs -- not just abstract lists of integers? And also the difference between attribute_must_1 and attribute_1 -- it's still not clear.

Piotr L. Over a year ago

The difference between attribute_must_1 and attribute_1 is that the first one defines the scope of the search for the query and is simply a must-match argument without any influence on the score. attribute_1 and attribute_2 are the actual matching criteria that participate in the output score calculation.
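
One way the ideas in this thread might be combined is to move the scoring attributes out of `functions` filters and into `should` clauses, keeping the scope attribute in `filter` so that it does not affect the score. The Python dict below is an untested sketch under those assumptions. It reuses the nested `function_score` pattern and the ratio script from the answer above for attribute_1, and uses `constant_score` with a boost for attribute_2. The variable names and the `params.weight` parameter are illustrative and do not come from the original posts:

```python
import json

# Scope only: filter context, so it contributes nothing to the score.
scope = {
    "nested": {
        "path": "attribute_must_1",
        "query": {"term": {"attribute_must_1.id": 1}},
    }
}

# Continuous part: ratio script from the answer, scaled by an assumed weight.
attribute_1_clause = {
    "nested": {
        "path": "attribute_1",
        "score_mode": "max",
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "script_score": {
                    "script": {
                        "source": "((float)doc['attribute_1.id'].value / params.origin) * params.weight",
                        "params": {"origin": 10.0, "weight": 30.0},
                    }
                },
                "boost_mode": "replace",
            }
        },
    }
}

# All-or-nothing part: a constant score of 70 when the term matches.
attribute_2_clause = {
    "nested": {
        "path": "attribute_2",
        "query": {
            "constant_score": {
                "filter": {"term": {"attribute_2.id": 3}},
                "boost": 70,
            }
        },
    }
}

q = {
    "size": 10,
    "query": {
        "bool": {
            "filter": [scope],
            "should": [attribute_1_clause, attribute_2_clause],
        }
    },
}

print(json.dumps(q, indent=2))
```

The intent is that the `bool` query sums the scores of the matching `should` clauses, giving a continuous share from attribute_1 plus 0 or 70 from attribute_2. Note that the plain ratio exceeds 30 for values above the origin (for example 11), so the script would need a cap or a different distance formula to stay within 0-100.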
