Filter by ID without providing ID (turn 2 queries into one)

My problem can be distilled and exemplified by these 4 documents:

{
    "_id": 1,
    "owner": "A",
    "distance": 1,
    "age": 2
},
{
    "_id": 2,
    "owner": "A",
    "distance": 2,
    "age": 1
},
{
    "_id": 3,
    "owner": "B",
    "distance": 3,
    "age": 1
},
{
    "_id": 4,
    "owner": "B",
    "distance": 4,
    "age": 2
}

The requirements are:

  1. Return MAX 1 document per owner.
  2. The returned document MUST be the one with the shortest distance for that owner.
  3. Sorting keys are owner, distance, age or any combination of those.

This means we only return documents 1 and 3, sorted according to whichever key we choose.
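To make the expected result concrete, here is a minimal in-memory sketch of the three requirements in Python. It is only a reference for the behaviour we want, not how the search engine does it; the helper names closest_per_owner and select_and_sort are made up for this illustration.

```python
from typing import Any, Dict, List

# The four example documents from above.
DOCS: List[Dict[str, Any]] = [
    {"_id": 1, "owner": "A", "distance": 1, "age": 2},
    {"_id": 2, "owner": "A", "distance": 2, "age": 1},
    {"_id": 3, "owner": "B", "distance": 3, "age": 1},
    {"_id": 4, "owner": "B", "distance": 4, "age": 2},
]


def closest_per_owner(docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep at most one document per owner: the one with the shortest distance."""
    best: Dict[str, Dict[str, Any]] = {}
    for doc in docs:
        current = best.get(doc["owner"])
        if current is None or doc["distance"] < current["distance"]:
            best[doc["owner"]] = doc
    return list(best.values())


def select_and_sort(docs: List[Dict[str, Any]], sort_keys: List[str]) -> List[Dict[str, Any]]:
    """Pick the closest document per owner, then sort by any combination of keys."""
    winners = closest_per_owner(docs)
    return sorted(winners, key=lambda d: tuple(d[k] for k in sort_keys))


if __name__ == "__main__":
    print([d["_id"] for d in select_and_sort(DOCS, ["owner"])])
    print([d["_id"] for d in select_and_sort(DOCS, ["age", "owner"])])
```

The important point is that the selection (shortest distance per owner) and the final ordering (any of owner, distance, age) are two separate steps, which is what makes a single request awkward.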

The way we solve it today is to split the request in 2:

  • First request: Prefetch the IDs of the documents with the shortest distance.
  • Second request: Filter by IDs and sort according to chosen key.
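For illustration, the two request bodies could look roughly like the sketch below, written as Python dicts. This is an assumption about the shape of the requests, not our exact code: it assumes the prefetch uses collapse on owner (which needs owner to be a keyword or numeric field), it sorts on the static distance field from the example documents rather than a real geo-distance sort, and the function names are hypothetical.

```python
import json
from typing import Any, Dict, List


def build_prefetch_request(max_owners: int) -> Dict[str, Any]:
    """First request: one hit per owner, closest first, IDs only.

    Assumption: `distance` is a plain indexed field, as in the example docs.
    """
    return {
        "size": max_owners,
        "_source": False,                 # only the hit IDs are needed
        "query": {"match_all": {}},
        "sort": [{"distance": "asc"}],    # collapse keeps the top-sorted hit per owner
        "collapse": {"field": "owner"},
    }


def build_sorted_request(ids: List[Any], sort_key: str, order: str = "asc") -> Dict[str, Any]:
    """Second request: fetch exactly those IDs, sorted by the chosen key."""
    return {
        "size": len(ids),
        "query": {"ids": {"values": [str(i) for i in ids]}},
        "sort": [{sort_key: order}],
    }


if __name__ == "__main__":
    print(json.dumps(build_prefetch_request(10), indent=2))
    # IDs 1 and 3 are what the first request would return for the example data.
    print(json.dumps(build_sorted_request([1, 3], "age"), indent=2))
```

The ID list in the second body grows with the number of owners, which is the data we would like to stop shipping back and forth.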

We would like to avoid the latency of a second round trip and moving a lot of data (IDs) back and forth. I'm racking my brain, looking into nested and parent/child mappings and other ways to denormalize the data, but I can't come up with a better solution.

NOTE: distance is dynamic in the real world. It is the distance between the user and the document's location.

One thing that might work is a match_all request sorted by distance, with size: 0 and collapse on owner, and then doing the final sorting in aggs.

The problem with this is that we lose built-in pagination, and the overhead of the aggregation would be repeated for each page.
