String Comparison with Elasticsearch Groovy Dynamic Script

Question

I have an Elasticsearch index that contains various member documents. Each member document contains a membership object with various fields describing that individual's membership. For example:

{"membership": {"join_date": "2015-01-01", "status": "A"}}

Membership status can be 'A' (active) or 'I' (inactive); both are Unicode string values. I'd like to slightly boost the score of documents that have an active membership status.

In my groovy script, along with other custom boosters on various numeric fields, I have added the following:

String status = doc['membership.status'].value; 
float status_boost = 0.0;

if (status=='A') {status_boost = 2.0} else {status_boost=0.0}; 

return _score + status_boost

For some reason, presumably related to how Groovy handles strings, the check (status=='A') never matches. I've attempted (status.toString()=='A'), (status.toString()=="A"), (status.equals('A')), plus a number of other variations.

How should I go about troubleshooting this productively and efficiently? I don't have a stand-alone installation of Groovy, but when I pull the response data in Python, the status is always either a Unicode 'A' or 'I' with no additional whitespace or characters.

Are you sure the value is 'A'? Because of analysis, it might have been changed to lowercase 'a'. Can you try comparing against that?
– Vineeth Mohan, Commented Aug 17, 2015 at 18:25

pickypg · Accepted Answer · 2015-08-17 18:51:38Z

@VineethMohan is most likely right about the value being 'a' rather than 'A'.

You can check how the values are indexed by spitting them back out as script fields:

$ curl -XGET localhost:9200/test/_search -d '
{
  "script_fields": {
    "status": {
      "script": "doc[\"membership.status\"].values"
    }
  }
}
'

The output should show what you're actually working with. Judging by the field's name and how you use it, you will most likely want to reindex (recreate) your data so that membership.status is mapped as a not_analyzed string. Once that's done, you won't need to worry about lowercasing at all.
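As a rough sketch of the mapping described above, the following Python only builds and prints a request body for creating a new index; it makes no call to a cluster. The type name `member` is an assumption (the question does not give one), and `"index": "not_analyzed"` is the string-mapping option from the Elasticsearch versions current when this was written, so check it against the version you run.

```python
import json

# Hypothetical mapping type name; the question does not say what it is.
TYPE_NAME = "member"


def build_status_mapping(type_name=TYPE_NAME):
    """Return an index-creation body that maps membership.status as not_analyzed."""
    return {
        "mappings": {
            type_name: {
                "properties": {
                    "membership": {
                        "properties": {
                            # Index the value as a single, unmodified term.
                            "status": {"type": "string", "index": "not_analyzed"}
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into a curl -d '...' body.
    print(json.dumps(build_status_mapping(), indent=2))
```

With a mapping along these lines, the indexed term should stay 'A' after reindexing, so a comparison against 'A' in the script would be expected to match.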

In the meantime, you can probably get by with:

return _score + (doc['membership.status'].value == 'a' ? 2 : 0)

As a big aside, you should not be using dynamic scripting. Use stored scripts in production to avoid security issues.


2 Comments

datasci Over a year ago

By checking the value via script_fields instead of the value stored in the _source document, I found that not only was the 'A' lowercased, but it was also wrapped in brackets. The former makes perfect sense, but I find the latter a bit strange (given that the value is mapped as a string). Setting s = doc['membership.status'].value[0], then checking s against "a" did the trick.

pickypg Over a year ago

The bracketed nature of it is largely to do with the fact that I used values instead of value in the output, as well as just how script_fields displays content. Don't worry too much about that unless it was also returning more than one value, which analyzed strings are prone to do.
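To make the lowercasing and the "more than one value" point concrete, here is a small Python simulation. It is not Elasticsearch's analyzer: `approx_analyze` is a hypothetical helper that only approximates the default behaviour for simple ASCII text by splitting on non-alphanumeric characters and lowercasing. It illustrates why the `_source` value and the indexed terms seen by a script can differ.

```python
import re


def approx_analyze(text):
    """Rough stand-in for an analyzed string field (assumption, ASCII only):
    split on non-alphanumeric characters and lowercase each token."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


if __name__ == "__main__":
    source_value = "A"  # what _source (and the Python client) shows
    indexed_terms = approx_analyze(source_value)  # roughly what the script sees

    print(indexed_terms)
    print(indexed_terms[0] == "A")  # the comparison from the question
    print(indexed_terms[0] == "a")  # the comparison that matched
    # A longer analyzed string can produce several terms for one field.
    print(approx_analyze("Active Member"))
```

The list returned by the helper plays the role of `values`, and its first element plays the role of `value`; a field mapped as not_analyzed would skip this step and keep the original string as a single term.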
