Dynamically assign aggregation partitionings

sharongur · March 1, 2018, 2:01pm

Is there a way to dynamically set the number of partitions used for an aggregation?

For example, if I perform 10 aggs, each one has a different distinct count. Can I dynamically set the partitioning so that each request returns X distinct values?

The solution I thought of is performing 2 requests for each agg: first a cardinality request to determine how many distinct values there are, then a second request where the number of partitions is the cardinality result divided by X.
I'm looking for a way to avoid the cardinality request.

I'm aware that my solution is not perfect, as more docs can be added after I make the request, and I accept that flaw.
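
For reference, the two-request approach described above could be sketched roughly like this. It is only a sketch: the helper names and the `distinct` / `values` aggregation names are made up for illustration, and it only builds the request bodies rather than sending them to a cluster.

```python
import math


def num_partitions_for(cardinality: int, page_size: int) -> int:
    """Number of partitions needed so each one holds roughly page_size terms."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Always use at least one partition, even for an empty field.
    return max(1, math.ceil(cardinality / page_size))


def cardinality_body(field: str, precision_threshold: int = 40000) -> dict:
    """First request: approximate distinct count of the field."""
    return {
        "size": 0,
        "aggs": {
            "distinct": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


def partition_body(field: str, partition: int, num_partitions: int, page_size: int) -> dict:
    """Second request: one partition of the terms aggregation."""
    return {
        "size": 0,
        "aggs": {
            "values": {
                "terms": {
                    "field": field,
                    "size": page_size,
                    "include": {
                        "partition": partition,
                        "num_partitions": num_partitions,
                    },
                }
            }
        },
    }
```

Since the cardinality result is approximate and terms are spread over partitions by hash, partitions will only be roughly equal in size, so it is probably worth giving `size` some headroom above X.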

Thanks!

sharongur · March 13, 2018, 8:56am

Bump

Mark_Harwood · March 13, 2018, 11:40am

This is a phase you'd have to do in your client. It may be worth looking at the composite aggregation though.

sharongur · March 13, 2018, 12:37pm

That's exactly what I was looking for!
I'll test it right away.

One last thing.

My question originated from a performance issue. I had a query with 10 or so aggregations, and at first I wanted to get all the results, so as a first solution I just put a size of 10K on each aggregation.
When the index got bigger, it took some time to get the response back.

So I figured out a way to paginate the aggregation with partitioning.

From what I read in your composite link, I need to send the last document returned by the current request in my "next page" query, which means Elasticsearch will run the same query but give me the next results after the document I sent (at least to my understanding).
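
A rough sketch of that paging loop on the client side, for illustration only. Note that what gets passed back in `after` is the key of the last bucket returned, not a document. The function names, the `page` aggregation name and the `value` source name are assumptions of this sketch; newer versions return an `after_key` in the response, and the sketch falls back to the last bucket's key when it is absent.

```python
from typing import Optional


def composite_page_body(field: str, page_size: int, after_key: Optional[dict] = None) -> dict:
    """Build a composite aggregation request for one page of terms."""
    composite = {
        "size": page_size,
        "sources": [{"value": {"terms": {"field": field}}}],
    }
    if after_key is not None:
        # Resume right after the last bucket of the previous page.
        composite["after"] = after_key
    return {"size": 0, "aggs": {"page": {"composite": composite}}}


def next_after_key(response: dict, agg_name: str = "page") -> Optional[dict]:
    """Return the key to pass as `after` for the next page, or None when done."""
    agg = response["aggregations"][agg_name]
    buckets = agg["buckets"]
    if not buckets:
        return None
    # Prefer the explicit after_key; otherwise use the last bucket's key.
    return agg.get("after_key", buckets[-1]["key"])
```

The client keeps sending `composite_page_body(field, size, after_key)` with the key from the previous response until `next_after_key` returns `None`.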

What I'm trying to ask is: performance-wise, will making the cardinality request from my server side and then partitioning the next query to get the amount I want per page be quicker than the composite method?

Mark_Harwood · March 13, 2018, 1:08pm

I doubt it but benchmarking will help confirm. In each case the query is streaming through the same set of matching docs and eliminating certain numbers of them by either

retrieving a doc value, hashing it and then modulo N to see if it is part of the current partition or
retrieving a doc value and comparing it to a high-watermark passed by the client to see if it should be added to the priority queue for the next set of results.
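
The two elimination strategies above can be illustrated with a toy simulation over an in-memory list of values. This is not how Elasticsearch implements either one internally: `zlib.crc32` is only a stand-in hash, and the function names are made up for this sketch.

```python
import heapq
import zlib
from typing import Iterable, List, Optional


def in_partition(value: str, partition: int, num_partitions: int) -> bool:
    """Hash the value and take it modulo N to decide partition membership."""
    return zlib.crc32(value.encode("utf-8")) % num_partitions == partition


def partition_page(values: Iterable[str], partition: int, num_partitions: int) -> List[str]:
    """Strategy 1: keep only the distinct values that fall in this partition."""
    return sorted({v for v in values if in_partition(v, partition, num_partitions)})


def watermark_page(values: Iterable[str], after: Optional[str], page_size: int) -> List[str]:
    """Strategy 2: keep the page_size smallest distinct values above the watermark."""
    candidates = {v for v in values if after is None or v > after}
    # nsmallest plays the role of the bounded priority queue.
    return heapq.nsmallest(page_size, candidates)
```

In both cases every value is visited once per request; the difference is only in the cheap per-value test, which is why the two approaches are likely to cost about the same.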

sharongur · March 13, 2018, 1:27pm

I was trying to test the composite method.

But I couldn't find documentation saying that I can do a composite aggregation on a nested document. Is it possible?

This is the query with partitioning that I want to convert to a composite one:

# partitioning
GET new_mappings/shahar_relax/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "aggs": {
    "identifiers": {
      "nested": {
        "path": "identifiers"
      },
      "aggs": {
        "identifierType": {
          "terms": {
            "field": "identifiers.identifierType",
            "size": 10000,
            "include": {
              "partition": 0,
              "num_partitions": 10
            }
          },
          "aggs": {
            "identifierValue": {
              "terms": {
                "field": "identifiers.identifierValue",
                "size": 10000,
                "include": {
                  "partition": 0,
                  "num_partitions": 10
                }
              }
            },
            "identifierValueDistinctCount": {
              "cardinality": {
                "field": "identifiers.identifierValue",
                "precision_threshold": 40000
              }
            }
          }
        }
      }
    }
  }
}
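
One possible shape for the composite version of this query, as a hedged sketch: it assumes the Elasticsearch version in use accepts a composite aggregation as a sub-aggregation of `nested`, which this thread does not confirm and which should be checked against the documentation for that version. The `pairs` aggregation name and the helper function are hypothetical; the field names are taken from the query above.

```python
from typing import Optional


def nested_composite_body(page_size: int, after_key: Optional[dict] = None) -> dict:
    """Build a nested + composite request paging over (identifierType, identifierValue) pairs."""
    composite = {
        "size": page_size,
        "sources": [
            {"identifierType": {"terms": {"field": "identifiers.identifierType"}}},
            {"identifierValue": {"terms": {"field": "identifiers.identifierValue"}}},
        ],
    }
    if after_key is not None:
        # Key of the last bucket from the previous page.
        composite["after"] = after_key
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
            "identifiers": {
                "nested": {"path": "identifiers"},
                # Assumption: composite is allowed under a nested parent.
                "aggs": {"pairs": {"composite": composite}},
            }
        },
    }
```

Each bucket would then be one (identifierType, identifierValue) pair, so the two levels of terms aggregations collapse into a single paged list.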

system · April 10, 2018, 1:28pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
