I am trying to use nested values in a script score, but I cannot make it work because I am unable to iterate over the field by accessing it through doc. Also, when I query it in Kibana with _type:images AND _exists_:colors, no documents match, even though the field is clearly present in all my documents when I view them individually. I am able to access it using params._source, but I have read that this can be slow and is not really recommended.

I know that this issue is all due to the way we have created this nested field, so if I cannot come up with something better than this, I will have to reindex our 2m+ documents and see if I can find another way around the problem, but I would like to avoid that, and also just get a better understanding of how Elastic works behind the scenes, and why it acts the way it does here.

The example I provide here is not my real-life issue, but it describes the issue just as well. Imagine we have a document that describes an image. This document has a field containing values for how much red, green, and blue the image contains.

Requests to create index and documents with nested field that contains arrays of colors with a 100 point split between them:

PUT images
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_doc": {
      "properties": {
        "id" : { "type" : "integer" },
        "title" : { "type" : "text" },
        "description" : { "type" : "text" },
        "colors": {
          "type": "nested",
          "properties": {
            "red": {
              "type": "double"
            },
            "green": {
              "type": "double"
            },
            "blue": {
              "type": "double"
            }
          }
        }
      }
    }
  }
}

PUT images/_doc/1
{
    "id" : 1,
    "title" : "Red Image",
    "description" : "Description of Red Image",
    "colors": [
      {
        "red": 100
      },
      {
        "green": 0
      },
      {
        "blue": 0
      }
    ]
}

PUT images/_doc/2
{
    "id" : 2,
    "title" : "Green Image",
    "description" : "Description of Green Image",
    "colors": [
      {
        "red": 0
      },
      {
        "green": 100
      },
      {
        "blue": 0
      }
    ]
}

PUT images/_doc/3
{
    "id" : 3,
    "title" : "Blue Image",
    "description" : "Description of Blue Image",
    "colors": [
      {
        "red": 0
      },
      {
        "green": 0
      },
      {
        "blue": 100
      }
    ]
}

Now, if I run this query, using doc:

GET images/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "script_score": {
            "script": {
              "source": """
                boolean debug = true;
                for(color in doc["colors"]) {
                  if (debug === true) {
                    throw new Exception(color["red"].toString());
                  }
                }
              """
            }
          }
        }
      ]
    }
  }
}

I will get exception No field found for [colors] in mapping with types [], but if I use params._source instead, like so:

GET images/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "script_score": {
            "script": {
              "source": """
                boolean debug = true;
                for(color in params._source["colors"]) {
                  if (debug === true) {
                    throw new Exception(color["red"].toString());
                  }
                }
              """
            }
          }
        }
      ]
    }
  }
}

I am able to output "caused_by": {"type": "exception", "reason": "100"}, so I know that it worked since the first document is red and has a value of 100.

I am not even sure this qualifies as a question; it is more of a cry for help. If someone can explain why this behaves the way it does, and suggest the best way to get around the issue, I would really appreciate it.

(Also, some tips for debugging in Painless would also be lovely!!!)

3 Answers

1

In an Elasticsearch scoring script ("script_score": {"script": {"source": "..." }}) you can access nested values through the params._source object.

For example, if you have a documents index with documents like this one:

{
  "title": "Yankees Potential Free Agent Target: Max Scherzer",
  "body": "...",
  "labels": {
    "genres": "news",
    "topics": ["sports", "celebrities"],
    "publisher": "CNN"
  }
}

the following query returns 100 documents in randomized order, giving preference to documents with the sports topic:

GET documents/_search
{
  "size": 100,
  "sort": [
    "_score"
  ],
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "random_score": {}
        },
        {
          "script_score": {
            "script": {
              "source": """
                double boost = 1.0;
                if (params._source['labels'] != null && params._source['labels']['topics'] != null && params._source['labels']['topics'].contains('sports')) {
                    boost += 2.0;
                }
                return boost;
              """
            }
          }
        }
      ],
      "score_mode": "multiply",
      "boost_mode": "replace"
    }
  }
}

1

Don't worry about the slowness of params._source -- it's your only choice here because iterating the doc's nested context only allows a single nested color to be accessed.

Try this:

GET images/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "image"
          }
        },
        {
          "function_score": {
            "functions": [
              {
                "script_score": {
                  "script": {
                    "source": """
                        def score = 0;
                        for (color in params._source["colors"]) {
                          // Debug.explain(color);
                          if (color.containsKey('red')) {
                            score += color['red'] ;
                          }
                        }
                        return score;
                    """
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

The Painless score context is described in the Elasticsearch Painless documentation.

Secondly, you were pretty close w/ throwing an exception manually -- there's a cleaner way to do it though. Uncomment Debug.explain(color); and you're good to go.

One more thing, I purposefully added a match query to increase the scores but, more importantly, to illustrate how a query is built in the background -- when you rerun the above under GET images/_validate/query?explain, you'll see for yourself.
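
On the Kibana part of the question: nested objects are indexed as separate hidden documents, so a top-level query string clause such as _exists_:colors never sees them. The field has to be reached through a nested query instead. A minimal sketch, assuming the images index and mapping from the question (it has not been run against your real data):

```
GET images/_search
{
  "query": {
    "nested": {
      "path": "colors",
      "query": {
        "bool": {
          "should": [
            { "exists": { "field": "colors.red" } },
            { "exists": { "field": "colors.green" } },
            { "exists": { "field": "colors.blue" } }
          ]
        }
      }
    }
  }
}
```

Because the bool query has only should clauses, a parent document matches as soon as one of its nested colors objects has any of the three fields, so this should return all three sample images.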

0

I do not know exactly what you want to implement, but I think you can use a nested query with script_score, as in the following example:

GET images/_search
{
    "query": {
        "nested": {
            "path": "colors",
            "query": {
                "bool": {
                    "must": [{
                        "exists": {
                            "field": "colors.red"
                        }
                    }, {
                        "function_score": {
                            "script_score": {
                                "script": "doc['colors.red'].value"
                            }
                        }
                    }]
                }
            }
        }
    }
}
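
A slightly more defensive variant of the same idea, offered as a sketch rather than a tested solution: it assumes the images index from the question, moves the exists clause into the function_score query, guards the doc access with size(), and sets score_mode to sum so that every matching nested object contributes to the parent's score (the default is avg).

```
GET images/_search
{
  "query": {
    "nested": {
      "path": "colors",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "query": {
            "exists": { "field": "colors.red" }
          },
          "script_score": {
            "script": {
              "source": "doc['colors.red'].size() == 0 ? 0 : doc['colors.red'].value"
            }
          },
          "boost_mode": "replace"
        }
      }
    }
  }
}
```

Inside the nested query the script runs once per nested object, which is why doc['colors.red'] resolves here but doc["colors"] does not at the top level. With the sample data each image has exactly one object carrying red, so sum and avg give the same result; the difference only matters when several nested objects match.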
