How to sort by minimum distance between multiple fields

I would like to sort results based on the minimum distance found from a few different geo_point fields.

Here is a basic example:

Mapping:

{
  "companies" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "location" : {
            "properties" : {
              "lat" : {
                "type" : "float"
              },
              "lon" : {
                "type" : "float"
              }
            }
          },
          "service_cities" : {
            "properties" : {
              "location" : {
                "properties" : {
                  "lat" : {
                    "type" : "float"
                  },
                  "lon" : {
                    "type" : "float"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

What I would like to do is query the index with a lat/lon pair and get results sorted by the minimum geo_distance from location or any of the service_cities.location values.

For example, if my index looked something like this:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "companies",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          // New York City, NY
          "location" : {
            "lat" : 40.73061,
            "lon" : -73.935242
          },
          "service_cities" : [
            {
              // Denver, CO
              "location" : {
                "lat" : 39.742043,
                "lon" : -104.991531
              }
            }
          ]
        }
      },
      {
        "_index" : "companies",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {

          // Phoenix, AZ
          "location" : {
            "lat" : 33.448376,
            "lon" : -112.074036
          },
          "service_cities" : [
            {
              // San Francisco, CA
              "location" : {
                "lat" : 37.773972,
                "lon" : -122.431297
              }
            },
            {
              // Las Vegas, NV
              "location" : {
                "lat" : 36.114647,
                "lon" : -115.172813
              }
            }
          ]
        }
      }
    ]
  }
}

If I then searched with the coordinates of Denver, CO, I would expect the results to be doc 2, doc 1, since document 2 has the service city nearest to Denver.

However, if document 2 did not have that service city, I would expect the order to be doc 1, doc 2, since document 1 would then have the location or service city closest to Denver.
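To make the expected ordering concrete, here is a minimal sketch in plain Python that does not use Elasticsearch at all. It takes the two example documents, computes the haversine distance from a search point to every location in each document, and sorts by the smallest one. The helper names (`haversine_km`, `min_distance_km`, `sort_by_min_distance`) and the flattened document shape are assumptions made for illustration only.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for ranking


def haversine_km(a, b):
    """Great-circle distance in km between two {"lat", "lon"} dicts."""
    lat1, lon1 = radians(a["lat"]), radians(a["lon"])
    lat2, lon2 = radians(b["lat"]), radians(b["lon"])
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def min_distance_km(doc, origin):
    """Smallest distance from origin to the doc's location or any service city."""
    points = [doc["location"]]
    points += [city["location"] for city in doc.get("service_cities", [])]
    return min(haversine_km(origin, p) for p in points)


def sort_by_min_distance(docs, origin):
    """Order docs by their closest point to origin, nearest first."""
    return sorted(docs, key=lambda d: min_distance_km(d, origin))


DENVER = {"lat": 39.742043, "lon": -104.991531}

DOCS = [
    {   # Phoenix, with service cities San Francisco and Las Vegas
        "id": "1",
        "location": {"lat": 33.448376, "lon": -112.074036},
        "service_cities": [
            {"location": {"lat": 37.773972, "lon": -122.431297}},
            {"location": {"lat": 36.114647, "lon": -115.172813}},
        ],
    },
    {   # New York City, with service city Denver
        "id": "2",
        "location": {"lat": 40.73061, "lon": -73.935242},
        "service_cities": [
            {"location": {"lat": 39.742043, "lon": -104.991531}},
        ],
    },
]

if __name__ == "__main__":
    for doc in sort_by_min_distance(DOCS, DENVER):
        print(doc["id"], round(min_distance_km(doc, DENVER), 1), "km")
```

This only illustrates the ordering I am after; the goal is to have Elasticsearch produce the same ranking at query time.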

From what I have read and tried, the best way to do this seems to be a script in the sort, since using multiple _geo_distance entries in sort clearly does not work.

Any advice would be appreciated!

As it's within the same document, can't you compute that at index time so sorting is then obvious and very fast?

@dadoonet thanks for your quick response!

If I understand correctly, you are suggesting that I calculate the distances from location and service_cities.location when indexing the document, store the value, and sort on that value at query time.

I am not sure how this would work, though. I will be searching from a variety of locations across the US, so at index time I will not know the distance between the searched location and the locations stored in my documents.

But in the example you mentioned, you have a location and service_cities.location. I thought you wanted to compute that distance...

Sorry for the confusion, I will try to clarify.

Basically, I will be searching my index with a user's lat/lon. When I search my index, I want results to be sorted based on the location OR service_cities.location that is closest to the user's lat/lon.

Does that make more sense?

It does.

Is there any priority you want to give to location vs service_cities.location, or do you just want whichever is closer, whatever the field is?

No priority on location vs. service_cities.location, just whichever is closer.

I wonder if something like this would work for you?

DELETE test 
PUT test
{
  "mappings": {
    "properties": {
      "loc4sorting": {
        "type": "geo_point"
      },
      "location": {
        "type": "geo_point"
      },
      "service_cities": {
        "properties": {
          "location": {
            "type": "geo_point"
          }
        }
      }
    }
  }
}
PUT test/_doc/1
{
  "loc4sorting": [
    {
      "lat": 40.73061,
      "lon": -73.935242
    },
    {
      "lat": 39.742043,
      "lon": -104.991531
    }
  ],
  "location": {
    "lat": 40.73061,
    "lon": -73.935242
  },
  "service_cities": [
    {
      "location": {
        "lat": 39.742043,
        "lon": -104.991531
      }
    }
  ]
}
GET test/_search
{
  "sort": [
    {
      "_geo_distance": {
        "loc4sorting": [
          -70,
          40
        ],
        "unit": "km"
      }
    }
  ]
}

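Basically, copy all the values you need to sort by into one dedicated field. When that field holds several points, an ascending _geo_distance sort uses the shortest distance by default.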
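If the documents are built in application code before indexing, the copy step could look roughly like the sketch below. It is plain Python that only builds dictionaries and calls no client library. The function names (`collect_sort_points`, `with_sort_field`, `nearest_first_query`) are hypothetical; the field name `loc4sorting` comes from the mapping above. `"order": "asc"` and `"mode": "min"` are spelled out only for readability, since that is already the default behaviour for an ascending geo distance sort.

```python
def collect_sort_points(doc):
    """Gather the main location and every service city location into one list."""
    points = [doc["location"]]
    points += [city["location"] for city in doc.get("service_cities", [])]
    return points


def with_sort_field(doc, field="loc4sorting"):
    """Return a copy of doc with all of its locations copied into `field`."""
    return {**doc, field: collect_sort_points(doc)}


def nearest_first_query(lat, lon, field="loc4sorting"):
    """Build a search body that sorts by the closest point stored in `field`."""
    return {
        "sort": [
            {
                "_geo_distance": {
                    field: {"lat": lat, "lon": lon},
                    "order": "asc",
                    "mode": "min",  # closest of the multiple points wins
                    "unit": "km",
                }
            }
        ]
    }
```

The output of `with_sort_field(doc)` would be what gets sent to `PUT test/_doc/<id>`, and the output of `nearest_first_query(user_lat, user_lon)` would be the body of `GET test/_search`.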

Ah yes, that did end up working for me. Thanks for all your help!
