I need to sort on a value computed by a script from two parts of each document. For each document, the script should calculate the minimum distance from a given location to either the headquarters or any of the offices, and return it for sorting. Since a sort script can return only one value, I need to combine the script that calculates the distance between the HQ and the given location with the one that does the same for the multiple offices.

I tried to combine them, but offices is a nested property and headquarters is a non-nested property. If I use "NestedPath", I cannot access the headquarters property. Without "NestedPath", I cannot use the offices property. Here is the mapping:

         "offices" : {
            "type" : "nested",
            "properties" : {
              "coordinates" : {
                "type" : "geo_point",
                "fields" : {
                  "raw" : {
                    "type" : "text",
                    "index" : false
                  }
                },
                "ignore_malformed" : true
              },
              "state" : {
                "type" : "text"
              }
            }
          },
        "headquarters" : {
            "properties" : {
              "coordinates" : {
                "type" : "geo_point",
                "fields" : {
                  "raw" : {
                    "type" : "text",
                    "index" : false
                  }
                },
                "ignore_malformed" : true
              },
              "state" : {
                "type" : "text"
              }
            }
          }

And here is the script that I tried:

 "sort": [
    {
      "_script": {
        "nested" : {
          "path" : "offices"
        },
        "order": "asc",
        "script": {
          "lang": "painless",
          "params": {
            "lat": 28.9672,
            "lon": -98.4786
          },
          "source": "def hqDistance = 1000000;if (!doc['headquarters.coordinates'].empty){hqDistance = doc['headquarters.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371;} def officeDistance= doc['offices.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371; if (hqDistance < officeDistance) { return hqDistance; } return officeDistance;"
        },
        "type": "Number"
      }
    }
  ],

When I run the script, the headquarters logic does not seem to be executed at all; I get results based only on the office distances.

1 Answer

Nested fields operate in a separate context and their content cannot be accessed from the outer level, nor vice versa.

You can, however, access a document's raw _source.

But there's a catch:

  • See, when iterating under the offices nested path, you were able to call .arcDistance because the coordinates are of type ScriptDocValues.GeoPoint.
  • But once you access the raw _source, you'll be dealing with an unoptimized set of java.util.ArrayLists and java.util.HashMaps.

This means that even though you can iterate an array list:

...
for (def office : params._source['offices']) {
   // office.coordinates is raw source data here (a String like "39.9,-74.92" or a HashMap of {lat, lon}), not a GeoPoint!
}

calculating geo distances won't be directly possible…

…unless you write your own geoDistance function -- which is perfectly fine with Painless, but it'll need to be defined at the top of a script.

No need to reinvent the wheel though: see the Stack Overflow question "Calculating distance between two points, using latitude longitude?"
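A minimal sketch of such a hand-written distance function, in plain Java so it can be run outside Elasticsearch. Note that it uses the haversine formula, not the spherical law of cosines from the linked answer, because haversine tends to behave better numerically for points that are close together. The class name GeoSketch, the method name haversineMiles and the Earth radius constant are all assumptions made for illustration. Painless syntax is close to Java, so the method body should port with little change, but that port is untested here.

```java
// Hypothetical helper class; none of these names come from the original answer.
final class GeoSketch {
    // Mean Earth radius in miles (assumed value; adjust to taste).
    static final double EARTH_RADIUS_MILES = 3958.8;

    // Haversine great-circle distance between two points given in decimal degrees.
    static double haversineMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Guard against rounding pushing a marginally above 1, which would make sqrt(1 - a) NaN.
        a = Math.min(1.0, a);
        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return EARTH_RADIUS_MILES * c;
    }
}
```

Calling GeoSketch.haversineMiles(40.7128, -74.006, 39.9, -74.92) would give the distance in miles between the sample headquarters and the sample office used further down.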

A sample implementation

Assuming your documents look like this:

POST my-index/_doc
{
  "offices": [
    {
      "coordinates": "39.9,-74.92",
      "state": "New Jersey"
    }
  ],
  "headquarters": {
    "coordinates": {
      "lat": 40.7128,
      "lon": -74.006
    },
    "state": "NYC"
  }
}

your sorting script could look like this:

GET my-index/_search
{
   "sort": [
    {
      "_script": {
        "order": "asc",
        "script": {
          "lang": "painless",
          "params": {
            "lat": 28.9672,
            "lon": -98.4786
          },
          "source": """
            // We can declare functions at the beginning of a Painless script
            // https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-functions.html#painless-functions
            
            double deg2rad(double deg) {
              return (deg * Math.PI / 180.0);
            }
            
            double rad2deg(double rad) {
              return (rad * 180.0 / Math.PI);
            }
            
            // https://stackoverflow.com/a/3694410/8160318
            double geoDistanceInMiles(def lat1, def lon1, def lat2, def lon2) {
              double theta = lon1 - lon2;
              double dist = Math.sin(deg2rad(lat1)) * Math.sin(deg2rad(lat2)) + Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.cos(deg2rad(theta));
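              // clamp to [-1, 1] so rounding errors (e.g. identical points) cannot make acos return NaN
              dist = Math.acos(Math.min(1.0, Math.max(-1.0, dist)));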
              dist = rad2deg(dist);
              return dist * 60 * 1.1515;
            }

            // start off arbitrarily high            
            def hqDistance = 1000000;

            if (!doc['headquarters.coordinates'].empty) {
              hqDistance = doc['headquarters.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371;
            }
            
            // assume office distance as large as hq distance
            def officeDistance = hqDistance;
            
            // iterate each office (if the document has any) and compare it to the currently lowest officeDistance
            def offices = params._source['offices'] != null ? params._source['offices'] : [];
            for (def office : offices) {
              // the coordinates are formatted as "lat,lon" so let's split...
              def latLong = Arrays.asList(office.coordinates.splitOnToken(","));
              // ...and parse them before passing onwards
              def tmpOfficeDistance = geoDistanceInMiles(Double.parseDouble(latLong[0]),
                                                         Double.parseDouble(latLong[1]),
                                                         params.lat,
                                                         params.lon);
              // we're interested in the nearest office...
              if (tmpOfficeDistance < officeDistance) {
                officeDistance = tmpOfficeDistance;
              }
            }
            
            if (hqDistance < officeDistance) {
              return hqDistance;
            }
            
            return officeDistance;
          """
        },
        "type": "Number"
      }
    }
  ]
}
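The final selection step of the script (take the smallest of the HQ distance and all office distances, falling back to a large value when a document has no usable location) can be sketched in isolation. This is plain Java, not Painless, and the names NearestSketch, nearestMiles and the fallback parameter are assumptions for illustration. The fallback plays the role of the 1000000 starting value in the script above.

```java
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
final class NearestSketch {
    // hqMiles is null when the document has no headquarters coordinates;
    // officeMiles may be null or empty when the document has no offices.
    static double nearestMiles(Double hqMiles, List<Double> officeMiles, double fallback) {
        double best = (hqMiles != null) ? hqMiles : fallback;
        if (officeMiles != null) {
            for (Double d : officeMiles) {
                // Skip missing entries; a NaN never compares as smaller, so it is ignored too.
                if (d != null && d < best) {
                    best = d;
                }
            }
        }
        return best;
    }
}
```

Passing the fallback explicitly makes it obvious which value a document without any location sorts by, so such documents end up last in an ascending sort as long as the fallback exceeds every real distance.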

Shameless plug: I dive deep into Elasticsearch scripting in a dedicated chapter of my ES Handbook.


4 Comments

Thank you so much for your detailed answer. However, my offices data is in this format: "offices" : [ { "coordinates" : "39.9,-74.92", "state" : "New Jersey" } ], so passing the coordinates as office.coordinates.lat and office.coordinates.lon to the function gives me an error, as expected. I tried passing lat and lon as office.coordinates[0] and office.coordinates[1], and I get this error: "Attempting to address a non-array-like type [java.lang.String] as an array". I appreciate your help here!
No prob. You'll need to parse your string array of coords... I updated my answer.
With splitOnToken, I am getting this error: "dynamic method [java.lang.String, splitOnToken/1] not found". I think it's because my Elasticsearch version is old (v6.7). Is there any workaround for this?
The indexOf and substring methods worked for me. Your solution should work for anyone who has a similar problem. Thanks a lot! def latLong = office.coordinates; def delPosition = latLong.indexOf(','); def latitude = latLong.substring(0, delPosition); def longitude = latLong.substring(delPosition + 1);
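The indexOf/substring workaround from the last comment can be sketched with some defensive checks. Since the mapping sets ignore_malformed on the coordinates, the raw _source may plausibly contain strings that are not valid "lat,lon" pairs. This is plain Java, not Painless, and the names LatLonSketch and parseLatLon are assumptions. It relies only on String.indexOf, String.substring, String.trim and Double.parseDouble, which are assumed here to be available in Painless on the Elasticsearch version in use.

```java
// Hypothetical helper class; the names are illustrative only.
final class LatLonSketch {
    // Splits "lat,lon" at the first comma.
    // Returns {lat, lon}, or null when the input is missing or malformed.
    static double[] parseLatLon(String raw) {
        if (raw == null) {
            return null;
        }
        int comma = raw.indexOf(',');
        if (comma < 0) {
            return null;
        }
        try {
            double lat = Double.parseDouble(raw.substring(0, comma).trim());
            double lon = Double.parseDouble(raw.substring(comma + 1).trim());
            return new double[] { lat, lon };
        } catch (NumberFormatException e) {
            // e.g. "abc,def" or "1,2,3"
            return null;
        }
    }
}
```

A caller would skip any office for which parseLatLon returns null, so one bad string does not fail the whole sort script.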
