Pagination + Sorted Aggregations: Efficiently Retrieve Sorted List of Values?

openelasticsearch · July 26, 2023, 2:54pm

Hi, I'm looking for some advice on the best way to implement an aggregation query that supports pagination and sorting.

Quick Overview of My Documents & Desired Use Case:
I have indexes that contain documents with a number of relevant fields:

{ user_id: 5, timestamp: 123456789, ... other fields ... }
{ user_id: 6, timestamp: 123456791, ... other fields ... }
{ user_id: 8, timestamp: 123456811, ... other fields ... }
{ user_id: 5, timestamp: 1234567891, ... other fields ... }

I'm trying to implement an aggregation query that returns a list of user_ids sorted by each user's most recent timestamp, newest first. Ideally, this aggregation will support pagination, since user_id is a high-cardinality field and many documents may share the same user_id. I want to avoid pulling all of the results at once due to speed and memory concerns.

For example, say I want only 1 user_id per page of results, using the example documents above:
Page 1 -> user_id = 5
Page 2 -> user_id = 8
Page 3 -> user_id = 6
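
To pin down the expected output, the ordering above can be reproduced in plain Python from the four sample documents. This only illustrates the desired semantics and is not an Elasticsearch query; the helper name `pages_by_latest_timestamp` is made up for this sketch.

```python
from typing import Dict, List

# The four sample documents from the question (other fields omitted).
docs = [
    {"user_id": 5, "timestamp": 123456789},
    {"user_id": 6, "timestamp": 123456791},
    {"user_id": 8, "timestamp": 123456811},
    {"user_id": 5, "timestamp": 1234567891},
]


def pages_by_latest_timestamp(docs: List[Dict[str, int]], page_size: int) -> List[List[int]]:
    """Group by user_id, keep each user's max timestamp, sort newest first, chunk into pages."""
    latest: Dict[int, int] = {}
    for doc in docs:
        uid = doc["user_id"]
        latest[uid] = max(latest.get(uid, doc["timestamp"]), doc["timestamp"])
    ordered = sorted(latest, key=lambda uid: latest[uid], reverse=True)
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]


print(pages_by_latest_timestamp(docs, page_size=1))  # [[5], [8], [6]]
```

User 5 comes first because its second document carries the largest timestamp of the four.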

What I've Tried so Far

Term/s Aggregation
Composite Aggregation

Terms aggregations have good support for sorting the result buckets, and they avoid duplicate user_id results by grouping all documents containing the same user_id into a single bucket.

"users": {
    "terms": {
        "field": "user_id",
        "order": { "user_latest_timestamp.max": "desc" },
        "size": 200
    },
    "aggs": {
        "user_latest_timestamp": {
            "stats": { "field": "timestamp" }
        }
    }
}

The problem is that, as far as I can tell, terms aggregations do not support pagination. If the index I'm running this query against contains 5,000+ unique user_ids, then size must be set to some arbitrarily large value (e.g. 5,000) to cover them all. The outcome is a slow query that returns hundreds of KB of data.
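
One option that may reduce the response size is a `bucket_sort` pipeline aggregation nested under the terms aggregation, which sorts the parent's buckets and truncates them with `from` and `size`. This is a sketch under assumptions and has not been run against this index. Every bucket still has to be built, so the terms `size` must still cover all user_ids and only the response shrinks. The function name `terms_page_body` and the sub-aggregation name `page` are invented for illustration, and `max` is used in place of `stats` since only the maximum is needed.

```python
from typing import Any, Dict


def terms_page_body(page: int, page_size: int, max_users: int = 5000) -> Dict[str, Any]:
    """Build a search body that returns one page of user buckets, newest first."""
    return {
        "size": 0,  # no search hits, only aggregation buckets
        "aggs": {
            "users": {
                # max_users is an assumed upper bound on distinct user_ids.
                "terms": {"field": "user_id", "size": max_users},
                "aggs": {
                    "user_latest_timestamp": {"max": {"field": "timestamp"}},
                    "page": {
                        "bucket_sort": {
                            "sort": [{"user_latest_timestamp": {"order": "desc"}}],
                            "from": page * page_size,  # zero-based page index
                            "size": page_size,
                        }
                    },
                },
            }
        },
    }


body = terms_page_body(page=2, page_size=1)
```

This trims what is sent back, but it does not remove the cost of aggregating every user_id on each request.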

Composite aggregations are obviously the way to go for paginating aggregation responses, but I can't seem to get the sorting to work for my use case.

"users": {
    "composite": {
        "size": 200,
        "sources": [
            { "user_ids": { "terms": { "field": "user_id" } } }
        ]
    }
}

The first composite aggregation I tried only supports sorting the results by the values of sources (i.e. the user_id values).
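
For reference, paging through a composite aggregation works by feeding the `after_key` of one response into the `after` parameter of the next request. Here is a minimal sketch in Python that only builds the request body and makes no client call. The function name `composite_page_body` is hypothetical; the aggregation name `users` and source name `user_ids` come from the snippet above.

```python
from typing import Any, Dict, Optional


def composite_page_body(after_key: Optional[Dict[str, Any]] = None, size: int = 200) -> Dict[str, Any]:
    """Build a search body for one page of the user_id composite aggregation."""
    composite: Dict[str, Any] = {
        "size": size,
        "sources": [{"user_ids": {"terms": {"field": "user_id"}}}],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    # Top-level "size": 0 suppresses search hits; only buckets are returned.
    return {"size": 0, "aggs": {"users": {"composite": composite}}}


first_page = composite_page_body()
# Example after_key, shaped as it would appear under aggregations.users.after_key.
second_page = composite_page_body(after_key={"user_ids": 6})
```

The pages arrive in user_id order, which is exactly the limitation described above: there is no way to order them by a metric such as the maximum timestamp.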

"users": {
    "composite": {
        "size": 200,
        "sources": [
            { "time": { "terms": { "field": "timestamp", "order": "desc" } } },
            { "user_ids": { "terms": { "field": "user_id" } } }
        ]
    }
}

I've also attempted to include the timestamp in sources, as shown above, but I don't think this is appropriate either. This does produce a sorted list of results in each query, but my understanding is that, because composite aggregations do not look at every document, the results are only sorted locally (i.e. the results are sorted, but that doesn't mean the first bucket I receive for the first page actually contains the most recent timestamp in the entire index).

Additionally, I will receive "duplicate" user_id results. Each unique timestamp + user_id pair will appear in my composite buckets response, meaning I'd need to keep track of previously displayed user_ids in my application.
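
The duplicate problem can be seen with the sample documents. The sketch below does not call Elasticsearch; it simply enumerates the unique (timestamp, user_id) pairs in the order the two sources request (timestamp descending, then user_id ascending), with one pair per composite bucket. `composite_keys` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

docs = [
    {"user_id": 5, "timestamp": 123456789},
    {"user_id": 6, "timestamp": 123456791},
    {"user_id": 8, "timestamp": 123456811},
    {"user_id": 5, "timestamp": 1234567891},
]


def composite_keys(docs: List[Dict[str, int]]) -> List[Tuple[int, int]]:
    """One key per unique (timestamp, user_id) pair: timestamp desc, then user_id asc."""
    pairs = {(d["timestamp"], d["user_id"]) for d in docs}
    return sorted(pairs, key=lambda p: (-p[0], p[1]))


user_ids = [uid for _, uid in composite_keys(docs)]
print(user_ids)  # [5, 8, 6, 5]

# The client-side state the question wants to avoid: remember what was already shown.
seen = set()
deduped = []
for uid in user_ids:
    if uid not in seen:
        seen.add(uid)
        deduped.append(uid)
print(deduped)  # [5, 8, 6]
```

User 5 appears twice because it has two distinct timestamps, so the `seen` set would have to persist across pages.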

Is there any way to sort my aggregation results by timestamp and support pagination without maintaining some state in my application? I want to avoid querying every single user_id + their max timestamp and sorting in the application.

Any advice would be greatly appreciated.

Mark_Harwood1 · July 26, 2023, 3:15pm

Reasoning about the “last known state” of entities is a classic scenario for an entity-centric index maintained using the “transforms” API.
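
A rough sketch of what such an entity-centric index could look like, expressed as Python dicts for the transform definition and for a paginated search over its output. The index names `events` and `users_latest`, the output field name `latest_timestamp`, and both helper names are assumptions made for illustration. The pivot groups by `user_id` and keeps the `max` of `timestamp`, mirroring the aggregation in the question. A continuously updating transform would also need a `sync` section, which is omitted here.

```python
from typing import Any, Dict


def transform_body(source_index: str = "events", dest_index: str = "users_latest") -> Dict[str, Any]:
    """Body for PUT _transform/<transform_id>: one output document per user_id."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "pivot": {
            "group_by": {"user_id": {"terms": {"field": "user_id"}}},
            "aggregations": {"latest_timestamp": {"max": {"field": "timestamp"}}},
        },
    }


def page_query(page: int, page_size: int) -> Dict[str, Any]:
    """Plain search on the destination index, newest users first."""
    return {
        "from": page * page_size,  # zero-based page index
        "size": page_size,
        "sort": [{"latest_timestamp": {"order": "desc"}}],
    }


transform = transform_body()
second_page = page_query(page=1, page_size=200)
```

Because the destination index holds one document per user, sorting and paging become an ordinary search rather than an aggregation. With default settings `from` plus `size` cannot exceed 10,000, so deeper pages would need `search_after` instead.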

