Dynamically assign aggregation partitionings

Is there a way to dynamically set the number of partitions used for an aggregation?

For example, say I perform 10 aggs and each one has a different distinct count. Can I dynamically set the partitioning so that each request returns X distinct values?

The solution I thought of is performing 2 requests for each agg: first a cardinality request to determine how many distinct values there are, then a second request where the number of partitions is the cardinality result divided by X.
I'm looking for a way to avoid the cardinality request.

I'm aware that my solution is not perfect, as more docs can be added after I make the request, and I accept that flaw.
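For reference, the two-request idea could look roughly like this on the client side. This is only a sketch: the function names, the distinct count of 25,000 and the page size of 1,000 are made up for illustration, and the cardinality figure is assumed to have come from an earlier cardinality request. Since the cardinality aggregation is approximate and terms are assigned to partitions by hash, partitions will not be exactly equal in size, so the terms `size` is given some headroom here.

```python
import math


def partitions_for_page_size(distinct_count: int, page_size: int) -> int:
    """Return how many partitions give roughly page_size terms per partition."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Round up so no partition is expected to exceed page_size; never go below 1.
    return max(1, math.ceil(distinct_count / page_size))


def partitioned_terms_agg(field: str, partition: int, num_partitions: int, size: int) -> dict:
    """Build a terms aggregation restricted to one partition of the terms."""
    return {
        "terms": {
            "field": field,
            "size": size,
            "include": {"partition": partition, "num_partitions": num_partitions},
        }
    }


# Hypothetical numbers: 25,000 distinct values reported by the cardinality request.
num_partitions = partitions_for_page_size(distinct_count=25_000, page_size=1_000)

# size is set above the target page size because partitions are only roughly even.
first_page_agg = partitioned_terms_agg(
    "identifiers.identifierValue", partition=0, num_partitions=num_partitions, size=2_000
)
```

Each following page would reuse `num_partitions` and only increment `partition`.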

Thanks!

Bump

This is a phase you'd have to do in your client. It may be worth looking at the composite aggregation though.

That's exactly what I was looking for!
I'll test it right away.

One last thing.

My question came from a performance issue. I had a query with 10-ish aggregations, and at first I wanted to get all the results, so as a first solution I just put a size of 10K on each aggregation.
When the index got bigger, the request took some time to come back.

So I figured out a way to paginate the aggregation with partitioning.

From what I read in your composite link, I need to send the key of the last bucket returned by the current request in my "next page" query. That means Elasticsearch will run the same query but give me the results that come after the key I sent (at least to my understanding).
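The paging described above can be sketched as a client-side loop. This is a rough illustration, not a definitive client: `composite_body`, `iterate_buckets` and the aggregation name `page` are hypothetical, and `fake_search` is an in-memory stand-in for a real `_search` call so the loop can be run without a cluster. The `after` parameter and the `after_key` field in the response are the parts taken from the composite aggregation itself.

```python
from typing import Callable, Iterator, Optional


def composite_body(sources: list, size: int, after: Optional[dict] = None) -> dict:
    """Build a search body with one composite aggregation named 'page'."""
    composite = {"size": size, "sources": sources}
    if after is not None:
        # Resume after the last bucket key seen on the previous page.
        composite["after"] = after
    return {"size": 0, "aggs": {"page": {"composite": composite}}}


def iterate_buckets(search: Callable[[dict], dict], sources: list, size: int) -> Iterator[dict]:
    """Yield every bucket, requesting one page at a time until a page is empty."""
    after = None
    while True:
        page = search(composite_body(sources, size, after))["aggregations"]["page"]
        buckets = page["buckets"]
        if not buckets:
            return
        yield from buckets
        # Prefer the after_key from the response; fall back to the last bucket key.
        after = page.get("after_key", buckets[-1]["key"])


def fake_search(body: dict) -> dict:
    """In-memory stand-in for a real search call (assumption, for illustration only)."""
    values = ["a", "b", "c", "d", "e"]
    composite = body["aggs"]["page"]["composite"]
    after = composite.get("after")
    remaining = [v for v in values if after is None or v > after["value"]]
    page = remaining[: composite["size"]]
    result = {"buckets": [{"key": {"value": v}, "doc_count": 1} for v in page]}
    if page:
        result["after_key"] = {"value": page[-1]}
    return {"aggregations": {"page": result}}


sources = [{"value": {"terms": {"field": "identifierValue"}}}]
keys = [bucket["key"]["value"] for bucket in iterate_buckets(fake_search, sources, size=2)]
```

With a real client, `fake_search` would be replaced by whatever function sends the body to the index and returns the parsed response.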

What I'm trying to ask is: performance-wise, will making the cardinality request from my server, then partitioning the next query to get the page size I want, be quicker than the composite method?

I doubt it, but benchmarking will help confirm. In each case the query is streaming through the same set of matching docs and eliminating a certain number of them by either:

  1. retrieving a doc value, hashing it and then taking the hash modulo N to see if it is part of the current partition, or
  2. retrieving a doc value and comparing it to a high-watermark passed by the client to see if it should be added to the priority queue for the next set of results.
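The two filtering strategies above can be imitated on a plain list to see that both scan every value once. This is a toy simulation, not Elasticsearch's implementation: `zlib.crc32` is only a stand-in hash, and the function names and sample values are invented for the example.

```python
import heapq
import zlib


def in_partition(value: str, partition: int, num_partitions: int) -> bool:
    """Strategy 1: hash the value and keep it if hash mod N equals the partition."""
    return zlib.crc32(value.encode("utf-8")) % num_partitions == partition


def partition_page(values, partition: int, num_partitions: int) -> list:
    """Scan all values, keeping the distinct ones that fall in one partition."""
    return sorted({v for v in values if in_partition(v, partition, num_partitions)})


def watermark_page(values, after, size: int) -> list:
    """Strategy 2: scan all values, keep those above the watermark, take the smallest `size`."""
    candidates = {v for v in values if after is None or v > after}
    return heapq.nsmallest(size, candidates)


# Made-up doc values: 20 distinct identifiers, each appearing twice.
values = [f"id-{i:03d}" for i in range(20)] * 2

# Paging by partition: one pass over the values per partition.
partition_pages = [partition_page(values, p, 4) for p in range(4)]

# Paging by watermark: one pass over the values per page.
watermark_pages = []
after = None
while True:
    page = watermark_page(values, after, size=5)
    if not page:
        break
    watermark_pages.append(page)
    after = page[-1]
```

Both loops return every distinct value exactly once. The difference is that watermark pages have a fixed size and come back in sorted order, while partition sizes depend on how the hash spreads the values.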

I was trying to test the composite method.

But I couldn't find documentation saying that I can do a composite aggregation on a nested document. Is it possible?

This is the partitioned query I want to convert to a composite one:

#partitioning
GET new_mappings/shahar_relax/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "aggs": {
    "identifiers": {
      "nested": {
        "path": "identifiers"
      },
      "aggs": {
        "identifierType": {
          "terms": {
            "field": "identifiers.identifierType",
            "size": 10000,
            "include": {
              "partition": 0,
              "num_partitions": 10
            }
          },
          "aggs": {
            "identifierValue": {
              "terms": {
                "field": "identifiers.identifierValue",
                "size": 10000,
                "include": {
                  "partition": 0,
                  "num_partitions": 10
                }
              }
            },
            "identifierValueDistinctCount": {
              "cardinality": {
                "field": "identifiers.identifierValue",
                "precision_threshold": 40000
              }
            }
          }
        }
      }
    }
  }
}
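
One possible shape for the converted request is sketched below as a Python dict builder. It rests on an assumption that should be checked against the documentation for your Elasticsearch version: that a composite aggregation is accepted as a direct child of a nested aggregation. The function name `nested_composite_body`, the aggregation name `typeAndValue` and the page size are hypothetical. Note the differences from the query above: the composite returns flat (identifierType, identifierValue) pairs instead of nested buckets, the per-type cardinality sub-aggregation is left out, and `"size": 0` is added to skip the hits.

```python
import json
from typing import Optional


def nested_composite_body(page_size: int, after: Optional[dict] = None) -> dict:
    """Build a search body paging over (identifierType, identifierValue) pairs."""
    composite = {
        "size": page_size,
        "sources": [
            {"identifierType": {"terms": {"field": "identifiers.identifierType"}}},
            {"identifierValue": {"terms": {"field": "identifiers.identifierValue"}}},
        ],
    }
    if after is not None:
        # Key of the last bucket from the previous page.
        composite["after"] = after
    return {
        "size": 0,  # only aggregation results are needed
        "query": {"match_all": {}},
        "aggs": {
            "identifiers": {
                "nested": {"path": "identifiers"},
                # Assumption: composite is allowed directly under nested.
                "aggs": {"typeAndValue": {"composite": composite}},
            }
        },
    }


# First page, then a follow-up page using a made-up last bucket key.
first_page = nested_composite_body(page_size=1000)
next_page = nested_composite_body(
    page_size=1000,
    after={"identifierType": "exampleType", "identifierValue": "exampleValue"},
)
print(json.dumps(first_page, indent=2))
```

The `after` value for a real follow-up request would come from the previous response rather than being typed by hand.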
