How to do a distance sort over entities which each have multiple locations

Your product is simple-minded when it comes to distance searches.

We have organizations that each have multiple locations. When a user runs a search, we MUST use each org's closest location to that user. Your product only uses the last location in the list!!

Kimchy's response on "Geo point distance with multiple locations in array - #2 by sakaraut" is hopelessly naive. This is a very important use case. In fact, I would say it's the normal real-world use case.

I have tried several approaches in both Python space and ES space, but nothing works.

How do we search over the optimal (nearest) set of locations when each entity has multiple locations?

It would probably help if you could provide an example of what your data looks like and how it is mapped together with a detailed example of what you are looking to achieve. If you also can outline what you have tried so far and why that did not work that may also help clarify the problem and the requirements.

Which version of Elasticsearch are you using?

Thanks for the reply! Here's a better description.

To find the org nearest our user, we need to search over the set of locations made up of each org's nearest location. If a candidate org has offices 5 miles, 25 miles and 15 miles away, I should only be concerned with the one 5 miles away. distance_feature does not support that case. It only takes into account one location for each org: the one that is stored last in the index. If they are stored in the order I listed them above, this would be the 15-mile location.

Distance is dependent on each user’s location and on each org location, so we can never order the org location storage by distance. The distance will change for every user at runtime.

Put another way, if we have orgs stored in the order below with distances from a given user:

A: 5 mi, 25 mi, 15 mi
B: 10 mi
C: 2 mi, 30 mi
D: 14 mi, 16 mi, 11 mi

We should order them FOR THIS USER like this
C: 2, A: 5, B: 10, D: 11

With distance_feature query, we order them like this

B: 10, D: 11, A: 15, C: 30

Our front end then picks the closest distance for each provider (a painless script calculates every distance and returns them with the results), so we display them like this

B: 10, D: 11, A: 5, C: 2
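The two orderings above can be reproduced with a few lines of plain Python. This is only a sketch of the arithmetic, using the distances from the example; the `order_by` helper is a hypothetical name, and nothing here talks to the search engine.

```python
# Distances in miles from one user, copied from the example above.
distances = {
    "A": [5, 25, 15],
    "B": [10],
    "C": [2, 30],
    "D": [14, 16, 11],
}


def order_by(pick):
    """Sort org names by the distance that `pick` selects from each list."""
    return sorted(distances, key=lambda org: pick(distances[org]))


# What we want: rank each org by its nearest location.
wanted = order_by(min)

# What we observe: rank each org by the location stored last.
observed = order_by(lambda d: d[-1])

print(wanted)
print(observed)
```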

Note that the documentation never addresses the case of having a list of geo_points for each item.

I have tried painless script_fields, but they cannot be used in the search query; only indexed fields can. They also cannot be used in another painless script, so I can't solve the problem piecewise.

I would try runtime_mappings (the feature is also called runtime fields in some mentions), but unfortunately my management moved us from ES (Elasticsearch) to OS (OpenSearch), and it doesn't have that.

I have tried sorting in Python after the fact, but then that clobbers some of the other sorting we do inside the query.

Thanks for your time.

It sounds like you have one document per organisation and each document has a geo point field with an array of values. Is that correct?

One common approach when working with Elasticsearch is to index data multiple ways to optimise for different types of queries. Have you considered or tried indexing locations in a separate index? As each location would only be associated with one geo point it should sort just fine. If you need the full organisation document you could get that with 2 simple queries.

Another option may be to use a nested mapping as each document behind the scenes would have only one geo point. This would require you to use a nested query but could possibly solve the problem.
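As a rough sketch of the nested option, the request bodies could look something like the following, written as Python dicts. The field names (`name`, `location`, `location.position`) are assumptions based on what is mentioned in this thread, and the bodies have not been tested against any particular Elasticsearch or OpenSearch version. The sort uses `_geo_distance` with `mode` set to `min` and a `nested` path, so each organisation would be ranked by its closest location.

```python
def nested_mapping():
    """Index body that maps `location` as nested, one geo_point per entry.

    Field names are assumptions, not taken from a real index.
    """
    return {
        "mappings": {
            "properties": {
                "name": {"type": "keyword"},
                "location": {
                    "type": "nested",
                    "properties": {"position": {"type": "geo_point"}},
                },
            }
        }
    }


def nearest_location_sort(lat, lon):
    """Search body that sorts orgs by the distance to their nearest location."""
    return {
        "query": {"match_all": {}},
        "sort": [
            {
                "_geo_distance": {
                    "location.position": {"lat": lat, "lon": lon},
                    "order": "asc",
                    "unit": "mi",
                    # Use the smallest distance among the nested locations.
                    "mode": "min",
                    "nested": {"path": "location"},
                }
            }
        ],
    }
```

The `match_all` query is a placeholder; any existing filters would go in its place, with the sort clause left unchanged.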

I do not use geo queries a lot, but someone else may know whether it could be solved in a different way - the post you linked to is after all over 12 years old and a lot has happened since.

Not exactly. We have multiple location documents per organization.

Each location has one geo point (we call it position).

In distance_feature, the field argument is just
"location.position"

I'd like that to mean "location[i*].position", where i* is the nearest location.
distance_feature silently, and without the documentation saying so, interprets that as location[-1].position.

If I could chain painless scripts, I could find i* in one and then find
location[i*].position in the next.
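To make i* concrete: outside the search engine, it is just the index with the smallest great-circle distance to the user. Below is a minimal sketch in plain Python using the haversine formula; the function names and the `(lat, lon)` tuple layout are assumptions for illustration, and this is not how distance_feature itself computes anything.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def nearest_index(user, positions):
    """Return i*, the index of the position closest to `user`.

    `user` and each entry of `positions` are (lat, lon) tuples.
    """
    return min(
        range(len(positions)),
        key=lambda i: haversine_mi(user[0], user[1], positions[i][0], positions[i][1]),
    )


# Hypothetical coordinates: the second position is the closest one.
user = (40.0, -75.0)
positions = [(41.0, -75.0), (40.1, -75.0), (45.0, -75.0)]
i_star = nearest_index(user, positions)
```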

The user initiates the query with her position (geo_point).

We want to find the closest org to the user.

Closest org really means closest org given the optimal (closest) location to this particular user.

We cannot do this at index time.

The user position is unknown until query time, so distance sorting HAS to happen in real time.

The nested mapping might work.

This is why I initially asked for a sample document and mappings. I have seen a lot of users simplify their use case and leave out important assumptions and constraints, only for these to surface when the suggested solutions turn out not to meet the requirements. That wastes a lot of time...

How many positions can a location have? How often are these added, deleted or updated? If this relates to something that is moving around, does each position have an expiry time?

Sorry.

An organization can have an arbitrary number of locations. In practice I've seen over 30. Some have just one, but most have 2-5.

The locations do not move around; the user does.

The position we are finding a distance against is the position of a user. That can be anywhere, and is unknown until run time.

So the organization.locations are not changing, but the distance from the user changes every time and is unknown. That's why I say it has to be solved in the query, not when we load the data into the index.

Also sorting after the fact in our Python DSL might break other sorting and filtering we do. So it really needs to be done by the search engine at query time.

Do you have one document per organisation and each organisation document has an array of locations (where each location has a single position)?

Yes. But the distance varies for each user at query time.

In that case I would either try nested mappings for locations or simply index each location as a separate document (with organisation data denormalised onto each location) in a separate index.
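A rough sketch of the separate-index variant, in Python: each organisation document is flattened into one document per location, and the search sorts those by distance while collapsing on the organisation id, so that only the nearest location of each organisation is returned. The field names (`id`, `name`, `org_id`, `org_name`, `position`) are hypothetical, `org_id` is assumed to be mapped as a keyword or numeric field, and the request body is untested against any specific Elasticsearch or OpenSearch version.

```python
def location_docs(org):
    """Denormalise one organisation document into one document per location."""
    return [
        {
            "org_id": org["id"],
            "org_name": org["name"],
            "position": loc["position"],
        }
        for loc in org["location"]
    ]


def nearest_org_search(lat, lon):
    """Search body: nearest location first, one hit per organisation."""
    return {
        "query": {"match_all": {}},
        # Field collapsing keeps the top-sorted hit for each org_id.
        "collapse": {"field": "org_id"},
        "sort": [
            {
                "_geo_distance": {
                    "position": {"lat": lat, "lon": lon},
                    "order": "asc",
                    "unit": "mi",
                }
            }
        ],
    }
```

Because every document in this index holds exactly one geo point, there is no ambiguity about which location a distance refers to; the collapse step is what turns "nearest locations" back into "nearest organisations".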

Thank you very much, Christian.

I still don't understand why separately indexing the Locations would make a difference.

I'm not sure, but it seems like the nesting might work.
