Aggregation on terms and find min value of script results

Akihito_Kanbara · July 7, 2020, 7:51pm

Hi, I have a use case where I need to group by "user", do some calculation per group, and then find the minimum of the results.
What I am doing is:

GET test/_search
{
  "size": 0,
  "aggs": {
    "throughput": {
      "terms": {
        "field": "user.keyword"
      },
      "aggs": {
        "sum_len": {
          "sum": {
            "field": "processed_length"
          }
        },
        "sum_pt": {
          "sum": {
            "field": "processed_duration"
          }
        },
        "throu": {
          "bucket_script": {
            "buckets_path": {
              "len": "sum_len",
              "pt": "sum_pt"
            },
            "script": "params.len / params.pt"
          }
        }
      }
    },
    "min_throu": {
      "min_bucket": {
        "buckets_path": "throughput>throu"
      }
    }
  }
}

It works, but I have a few questions:
I have about 100,000 users, so the number of buckets will be large, and I need to set size to a large number in the terms aggregation. Is that safe, given that I really do need that many buckets? Is there an alternative way to do this?
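
One alternative to a single huge terms request is to page through the users with a composite aggregation and compute the ratio and the minimum on the client. The sketch below is only an illustration under assumptions: the names `composite_body`, `min_throughput`, `by_user` and the `search` callable are hypothetical, `search` stands in for whatever client call sends a request body and returns the parsed response, and skipping users with a zero total duration is a choice made here, not something the original query does.

```python
def composite_body(after_key=None, page_size=1000):
    """Build the request body for one page of users (hypothetical helper)."""
    composite = {
        "size": page_size,
        "sources": [{"user": {"terms": {"field": "user.keyword"}}}],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    return {
        "size": 0,
        "aggs": {
            "by_user": {
                "composite": composite,
                "aggs": {
                    "sum_len": {"sum": {"field": "processed_length"}},
                    "sum_pt": {"sum": {"field": "processed_duration"}},
                },
            }
        },
    }


def min_throughput(search, page_size=1000):
    """Page through all users and return the smallest sum_len / sum_pt.

    `search` is any callable that takes a request body and returns the
    parsed search response as a dict (an assumption of this sketch).
    """
    best = None
    after_key = None
    while True:
        response = search(composite_body(after_key, page_size))
        agg = response["aggregations"]["by_user"]
        for bucket in agg["buckets"]:
            duration = bucket["sum_pt"]["value"]
            if duration:  # skip users whose total duration is zero
                ratio = bucket["sum_len"]["value"] / duration
                best = ratio if best is None else min(best, ratio)
        after_key = agg.get("after_key")
        if not agg["buckets"] or after_key is None:
            return best
```

Each request then holds at most `page_size` buckets, at the cost of several round trips and of doing the division outside Elasticsearch.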

Another question is about the Java REST client. I know filter_path can filter the response, but I have not found a way to use it with the Java client. Since I only care about the min value, I don't want the response to carry redundant data that will slow it down. Is there any way to reduce the size of the response with the Java client?
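
For context, filter_path is an ordinary URL query parameter on the search request, so any client that lets you set request parameters can pass it. The sketch below only builds such a request with the Python standard library and does not send it; `build_search_request` and the base URL are hypothetical. With the Java low-level REST client, the equivalent would presumably be setting the same parameter on the request object (for example via `Request.addParameter`), which is worth checking against the client version in use.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, index, body, filter_path):
    """Build (but do not send) a _search request that trims the response."""
    query = urlencode({"filter_path": filter_path})
    return Request(
        url=f"{base_url}/{index}/_search?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Keep only the min_bucket result from the aggregation response.
request = build_search_request(
    "http://localhost:9200",  # assumed local cluster address
    "test",
    {"size": 0},  # placeholder; the full aggregation body would go here
    "aggregations.min_throu",
)
```

Note that filter_path only shrinks what is sent back over the wire. Elasticsearch still builds every bucket on the server, so it helps with response size but not with the cost of the aggregation itself.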

Akihito_Kanbara · July 8, 2020, 12:27am

Need help. Hope someone can share some ideas or hints.

nik9000 · July 8, 2020, 12:47am

I might try a scripted metric under the terms and sort the terms on that. I think that works. Scripted metric is always a bit fiddly and slow, but it can push the math to the shards and stop you from having to pull everything back to the coordinating node.
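
To make the "push the math to the shards" idea concrete, here is a plain-Python simulation. It is not Elasticsearch code and not necessarily what was meant above: `combine` and `reduce_min` are hypothetical names that only mimic the per-shard and coordinating-node steps a scripted metric would perform, and the shard contents are passed in as plain lists of dicts.

```python
from collections import defaultdict


def combine(shard_docs):
    """Per-shard step: fold documents into {user: [sum_len, sum_duration]}."""
    sums = defaultdict(lambda: [0.0, 0.0])
    for doc in shard_docs:
        entry = sums[doc["user"]]
        entry[0] += doc["processed_length"]
        entry[1] += doc["processed_duration"]
    return dict(sums)


def reduce_min(shard_states):
    """Coordinating step: merge per-shard sums, then take the minimum ratio."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for state in shard_states:
        for user, (length, duration) in state.items():
            totals[user][0] += length
            totals[user][1] += duration
    # Users with a zero total duration are skipped (a choice of this sketch).
    ratios = [length / duration for length, duration in totals.values() if duration]
    return min(ratios) if ratios else None
```

The point of the split is that each shard ships one compact pair of sums per user instead of raw documents, and the division happens only after the sums for a user have been merged across shards.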

Akihito_Kanbara · July 8, 2020, 12:54am

Thanks for the reply. Are you suggesting that I use a Scripted Metric Aggregation to do the calculation directly for each term (here, for each user), and then apply a Min Aggregation or sort based on that?
Could I get more details or some sample code?

system · August 5, 2020, 12:54am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
