How can I view ALL buckets in aggregations?

arisbanach · January 5, 2018, 4:00pm

For example, I have an aggregation of IP addresses and want to see the counts of them. However, I realized that it only shows as many buckets as the limit you set. I need a list of every IP address and the total counts of each, but can't seem to find how to get it to include every IP address bucket. Am I going about this wrong?

Mark_Harwood · January 5, 2018, 4:47pm

Two options for high-cardinality values like IP addresses that are too big to handle in one request:

1. Use the terms aggregation repeatedly for different partitions
2. Use the composite aggregation repeatedly with the after parameter
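
For readers following along, here is a minimal sketch of what the request bodies for the two options might look like, written as plain Python dictionaries. The field name `client_ip` and the aggregation name `ips` are made up for illustration; the `include` partition filter on the terms aggregation and the `after` parameter on the composite aggregation are the features being referred to.

```python
def partitioned_terms_body(field, partition, num_partitions, size):
    """Request body for one partition of a terms aggregation."""
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {
            "ips": {  # hypothetical aggregation name
                "terms": {
                    "field": field,
                    # Only terms that hash into this partition are returned.
                    "include": {
                        "partition": partition,
                        "num_partitions": num_partitions,
                    },
                    "size": size,
                }
            }
        },
    }


def composite_body(field, page_size, after_key=None):
    """Request body for one page of a composite aggregation."""
    composite = {
        "size": page_size,
        "sources": [{"ip": {"terms": {"field": field}}}],
    }
    if after_key is not None:
        # Resume after the last key seen on the previous page.
        composite["after"] = after_key
    return {"size": 0, "aggs": {"ips": {"composite": composite}}}
```

With the first option the client sends one request per partition number from 0 to `num_partitions - 1`. With the second it keeps sending requests, passing the last key of the previous page as `after_key`, until a page comes back empty.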

arisbanach · January 5, 2018, 5:18pm

Okay, I'm trying #1 and am wondering if there is a way for this to be set dynamically. We might get drastically varying numbers of IP addresses, so could the request just take the value of whatever the cardinality is and set the number of partitions and size based on that?

Mark_Harwood · January 5, 2018, 5:23pm

No - it's up to your client app to figure out the right number of partitions as per the linked doc using the cardinality agg.
The "right number of partitions" could vary wildly depending on the type of request e.g. if you were planning on nesting a date histogram under each IP to get a day-by-day summary of its activity.
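
A rough sketch of how a client app could derive the partition count from a cardinality estimate. The target of 1000 buckets per request and the 20% headroom are arbitrary assumptions, not recommended values, and the function names are hypothetical; a request with nested sub-aggregations would likely want a much smaller target.

```python
def cardinality_body(field):
    """Request body asking for the approximate number of distinct values."""
    return {
        "size": 0,
        "aggs": {"unique_ips": {"cardinality": {"field": field}}},
    }


def plan_partitions(estimated_cardinality, buckets_per_request=1000,
                    headroom_percent=20):
    """Pick num_partitions and a per-request size from a cardinality estimate.

    Terms are spread across partitions by hash, so partitions are only
    roughly equal. The headroom leaves slack for that unevenness and for
    the estimate itself being approximate.
    """
    # Ceiling division without floating point.
    num_partitions = max(1, -(-estimated_cardinality // buckets_per_request))
    per_partition = -(-estimated_cardinality // num_partitions)
    size = max(1, per_partition * (100 + headroom_percent) // 100 + 1)
    return num_partitions, size
```

For an estimate of 25,000 distinct IPs, `plan_partitions(25000)` gives 25 partitions with a `size` of 1201 for each request.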

arisbanach · January 5, 2018, 5:24pm

Also, from reading the docs on the cardinality agg:

Computing exact counts requires loading values into a hash set and returning its size. This doesn’t scale when working on high-cardinality sets and/or large values as the required memory usage and the need to communicate those per-shard sets between nodes would utilize too many resources of the cluster.

So that number won't be accurate for IP addresses since they're high cardinality?

Mark_Harwood · January 5, 2018, 5:26pm

More than likely good enough for figuring out how many partitions you'll need though.

arisbanach · January 5, 2018, 5:28pm

I'm just getting started learning scripted fields and Painless. Is there a way to programmatically do this using Painless in a query?

Mark_Harwood · January 5, 2018, 5:29pm

Sorry, no.
This is logic outside of Elasticsearch: your client will need to figure out how many queries to run.
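
As an illustration of that client-side logic, here is a sketch of a loop that runs one partitioned terms request per partition and merges the buckets. The `search` argument is a hypothetical callable that takes a request body and returns the parsed response; in practice it would be a thin wrapper around whatever Elasticsearch client you use. The aggregation name `ips` is also made up.

```python
def collect_all_buckets(search, field, num_partitions, size):
    """Run one terms request per partition and merge key -> doc_count."""
    counts = {}
    for partition in range(num_partitions):
        body = {
            "size": 0,
            "aggs": {
                "ips": {
                    "terms": {
                        "field": field,
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                        "size": size,
                    }
                }
            },
        }
        buckets = search(body)["aggregations"]["ips"]["buckets"]
        if len(buckets) >= size:
            # A full page suggests this partition may hold more terms
            # than were returned, so the result would be incomplete.
            raise ValueError(
                f"partition {partition} may be truncated; "
                "use more partitions or a larger size"
            )
        for bucket in buckets:
            counts[bucket["key"]] = bucket["doc_count"]
    return counts
```

Each term falls into exactly one partition, so the merged dictionary ends up with one entry per IP address. Raising on a full partition is a design choice for this sketch; a client could instead retry with a larger number of partitions.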

arisbanach · January 5, 2018, 5:30pm

Okay, thanks!

