Could anyone advise me on how to do custom scoring in ElasticSearch when matching an array of search keywords against an array of keywords stored in each document?

For example, let's say there is an array of keywords in each document, like so:

{ // doc 1
    keywords : [ 
            red : {
                    weight : 1
                }, 
            green : {
                    weight : 2.0
                },
            blue : {
                    weight: 3.0
                },
            yellow : {
                    weight: 4.3
                }
        ]
},
{ // doc 2
    keywords : [ 
            red : {
                    weight : 1.9
                }, 
            pink : {
                    weight : 7.2
                },
            white : {
                    weight: 3.1
                },
        ]
},
...

And I want to get a score for each document based on a search that matches keywords against this array:

{
    keywords : [
            red : {
                    weight : 2.2
                }, 
            blue : {
                    weight : 3.3
                },
        ]
}

But instead of just determining whether they match, I want to use a very specific scoring algorithm:

[Image: scoring formula]

Scoring a single field is easy enough, but I don't know how to manage it with arrays. Any thoughts?

  • Hi @Aleksi Asikainen, did you find a solution to this (using Elasticsearch)? Commented May 5, 2015 at 12:45
  • Afraid not, but nowadays ElasticSearch has better function scoring support, which I think might be good enough to achieve this: elastic.co/guide/en/elasticsearch/reference/0.90/… Commented May 7, 2015 at 18:41

1 Answer

Ah, an interesting question! (And one I think we can solve with some communication.)

Firstly, have you looked at custom script scoring? I'm pretty sure you can do this with that, albeit slowly. If you go that route, I would consider a rescore phase, so that the score is only calculated once a doc is known to be a hit.

However, I think you can do this with Elasticsearch machinery. As far as I can work out, you are doing a dot product between docs (where the weights are actually halfway between what you are specifying and 1).

So, my first suggestion: remove the x/2n term from your "custom scoring" (dot product) and put your weights halfway between 1 and the custom weight (e.g. 1.9 => 1.45).
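To make the arithmetic of that suggestion concrete, here is a minimal Python sketch. It does not reproduce the formula from the question; it only illustrates a dot product over shared keywords with each weight moved halfway towards 1. The function names `halfway` and `dot_product_score` are made up for illustration, and applying the halfway adjustment to both the document and the query weights is an assumption.

```python
def halfway(weight):
    """Move a weight halfway towards 1, e.g. 1.9 -> 1.45."""
    return (1.0 + weight) / 2.0


def dot_product_score(doc_keywords, query_keywords):
    """Sum the products of adjusted weights over keywords present in both.

    Assumption: the adjustment is applied to both sides of the product.
    """
    shared = doc_keywords.keys() & query_keywords.keys()
    return sum(
        halfway(doc_keywords[k]) * halfway(query_keywords[k]) for k in shared
    )


# Documents and query from the question, flattened to keyword -> weight.
doc1 = {"red": 1.0, "green": 2.0, "blue": 3.0, "yellow": 4.3}
doc2 = {"red": 1.9, "pink": 7.2, "white": 3.1}
query = {"red": 2.2, "blue": 3.3}

for name, doc in (("doc 1", doc1), ("doc 2", doc2)):
    print(name, dot_product_score(doc, query))
```

Under these assumptions doc 1 outranks doc 2, because it shares both "red" and "blue" with the query while doc 2 only shares "red".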

... I'm sorry, I will have to come back and edit this answer. I was thinking about using nested docs with a field-defined boost level, but alas, the _boost mapping parameter is only available for the root doc.

P.S. Just had a thought: you could have fields with defined boost levels and store the terms there. Then you can do this easily, but you lose precision. A doc would then look like:

{
  "boost_1": ["aquamarine"],
  "boost_2": null, //don't need to send this, just showing for clarity
  ...
  "boost_5": ["burgundy", "fuchsia"]
  ...
}

You could then define these boosts in your mapping. One thing to note is that a field's boost value carries over to the _all field, so you would now have a bag of weighted terms in your _all field. You could then construct a bool/should query containing many term queries, each with a different boost (for the weights of the second doc, i.e. the search keywords).
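As a rough illustration of this idea, the Python sketch below buckets weighted keywords into `boost_N` fields and builds the body of a bool/should query with one boosted term query per search keyword. The helper names `to_boost_fields` and `build_should_query` are hypothetical, rounding each weight to the nearest integer level is an assumption (and is exactly where the precision is lost), and the `_all` field only exists in older Elasticsearch versions, so the target field is left as a parameter.

```python
import json


def to_boost_fields(weighted_keywords):
    """Group keywords into boost_N fields by rounding each weight.

    Assumption: levels are whole numbers, with a minimum level of 1.
    """
    fields = {}
    for keyword, weight in weighted_keywords.items():
        level = max(1, round(weight))
        fields.setdefault(f"boost_{level}", []).append(keyword)
    return fields


def build_should_query(query_keywords, field="_all"):
    """Build a bool/should query body with one boosted term query per keyword."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"term": {field: {"value": keyword, "boost": weight}}}
                    for keyword, weight in query_keywords.items()
                ]
            }
        }
    }


# Doc 2 and the search keywords from the question.
doc2 = {"red": 1.9, "pink": 7.2, "white": 3.1}
search = {"red": 2.2, "blue": 3.3}

print(json.dumps(to_boost_fields(doc2), indent=2))
print(json.dumps(build_should_query(search), indent=2))
```

The first printed structure is what would be indexed for doc 2; the second is a request body that could be sent to the search endpoint, provided the mapping assigns the matching boost to each `boost_N` field.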

Let me know what you think! A very, very interesting question.

1 Comment

Thanks for the long answer. I think you're right that the scoring would have to happen via a rescoring phase, if anything. Unfortunately, there is very little information available on how to do rescoring with arrays, hence the question... At the moment I've elected to search using ElasticSearch and then carry out scoring of the results in PHP. That is extremely wasteful though, so I would rather move the scoring process completely into ElasticSearch.
