Fastest way to retrieve all unique terms in batches

I’m looking for the fastest and safest way to retrieve all unique values of a field for a given time range, in batches.

My requirements are:

  • ✅ Retrieve 100% of unique terms (no missing buckets)

  • ✅ Support batching / pagination

  • ✅ Be as fast as possible

  • ✅ Avoid excessive heap usage

I’ve evaluated two approaches:


1. terms aggregation with include.partition

Example:

"terms": {
  "field": "someField.keyword",
  "size": 10000,
  "include": {
    "partition": 0,
    "num_partitions": 20
  }
}

I iterate partition from 0 to num_partitions - 1 (0..19 in this example), sending one request per partition.

I understand that:

  • Each unique term is deterministically hashed into a partition

  • Distribution may appear uneven for small cardinalities

  • Larger cardinalities should distribute more evenly

  • The same term always maps to the same partition across shards

However, this approach requires:

  • Choosing num_partitions upfront

  • Managing a hard size limit per partition (risk of missing terms)

  • Manual orchestration of partitions

  • No cursor/resume mechanism

  • Potentially higher heap usage due to in-memory bucket building
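
To make the orchestration cost concrete, here is a minimal Python sketch of the partition loop described above. It only builds the request bodies and checks a response for possible truncation; sending the requests is left to whatever client you use. The aggregation name `unique_terms` and the helper names are made up for illustration, and treating "bucket count reached size" as the truncation signal is a conservative heuristic, not something the API reports directly.

```python
def partition_request(field, partition, num_partitions, size=10000):
    """Build the search body for one partition of a partitioned terms aggregation."""
    return {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {
            "unique_terms": {  # hypothetical aggregation name
                "terms": {
                    "field": field,
                    "size": size,
                    "include": {
                        "partition": partition,
                        "num_partitions": num_partitions,
                    },
                }
            }
        },
    }


def all_partition_requests(field, num_partitions, size=10000):
    """One request body per partition, numbered 0 .. num_partitions - 1."""
    return [
        partition_request(field, p, num_partitions, size)
        for p in range(num_partitions)
    ]


def partition_may_be_truncated(response, size):
    """Heuristic: a partition that returned `size` buckets may have dropped terms."""
    buckets = response["aggregations"]["unique_terms"]["buckets"]
    return len(buckets) >= size
```

If any partition comes back full, the safe reaction is to increase num_partitions and rerun every partition, since changing the partition count changes which terms land where.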


2. Composite aggregation with after_key

This seems to offer:

  • Cursor-based pagination

  • No cap on the total number of buckets (only on the page size)

  • Natural batching

  • Lower memory pressure

  • Easy resumability
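
For comparison, a minimal Python sketch of the after_key loop. The `search` argument is a stand-in for whatever function sends a request body and returns the parsed JSON response (it is not a real client API), and the names `unique_terms` and `term` are arbitrary labels chosen for this sketch. The optional `query` argument is where a time range filter would go.

```python
def iter_unique_terms(search, field, page_size=1000, after=None, query=None):
    """Yield every unique value of `field`, one composite page at a time.

    `search` is a hypothetical callable: request body in, response dict out.
    Pass a previously saved after_key as `after` to resume an interrupted run.
    """
    while True:
        composite = {
            "size": page_size,
            "sources": [{"term": {"terms": {"field": field}}}],
        }
        if after is not None:
            composite["after"] = after  # continue after the last bucket seen

        body = {"size": 0, "aggs": {"unique_terms": {"composite": composite}}}
        if query is not None:
            body["query"] = query  # e.g. a range filter on the timestamp field

        agg = search(body)["aggregations"]["unique_terms"]
        buckets = agg["buckets"]
        if not buckets:
            return  # an empty page means everything has been read

        for bucket in buckets:
            yield bucket["key"]["term"]

        after = agg.get("after_key")
        if after is None:
            return
```

Persisting the latest after_key after each page is enough to resume later, which is the resumability the partition approach lacks.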


Question

For the general use case:

Retrieve all unique field values over a time range, at scale, with batching and maximum performance

Is composite aggregation the recommended production approach over terms + partition?

Are there scenarios where terms + partition is preferable?

My primary goal is:

👉 Fast, complete, resumable extraction of unique terms.

Thanks in advance for any guidance.


Yes, use composite.

A partitioned terms aggregation is generally only useful if you are sorting by criteria other than the terms themselves, e.g. getting a list of account IDs that have not been used for a long time (sorting by the max date for each term, in reverse order).
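
As a rough illustration of that case, here is a Python sketch that builds such a request. The field names `account_id` and `access_date` and the aggregation names are hypothetical, and it assumes "reverse order" means oldest last-use date first (ascending).

```python
def stale_accounts_request(partition, num_partitions, size=10000):
    """Partitioned terms aggregation ordered by a max-date sub-aggregation."""
    return {
        "size": 0,
        "aggs": {
            "stale_accounts": {
                "terms": {
                    "field": "account_id",  # hypothetical field name
                    "size": size,
                    "include": {
                        "partition": partition,
                        "num_partitions": num_partitions,
                    },
                    # Order buckets by the sub-aggregation below, oldest first.
                    "order": {"last_access": "asc"},
                },
                "aggs": {
                    # Most recent use per account; hypothetical date field.
                    "last_access": {"max": {"field": "access_date"}}
                },
            }
        },
    }
```

Composite buckets come back in the order of their source keys, so a per-bucket metric ordering like this is where the partitioned terms aggregation still has a role.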