For example, I have an aggregation of IP addresses and want to see the count for each one. However, I realized that it only returns as many buckets as the size setting allows. I need a list of every IP address and the total count of each, but can't seem to find how to get it to include every IP address bucket. Am I going about this wrong?
Two options for high-cardinality values like IP addresses that are too big to handle in one request:
- Use the terms aggregation repeatedly for different partitions
- Use the composite aggregation repeatedly with the `after` parameter
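A sketch of option 1: the terms aggregation's `include.partition` / `num_partitions` options split the buckets across several requests, one per partition. The field name `client_ip` is a hypothetical stand-in for your IP field; this only builds the request bodies, the actual search calls are up to your client.

```python
# Sketch of option 1: one terms-agg request body per partition.
# The field name ("client_ip") is hypothetical.

def partitioned_requests(field, num_partitions, size):
    """Build one search body per partition of the terms agg."""
    bodies = []
    for p in range(num_partitions):
        bodies.append({
            "size": 0,  # we only want the aggregation, not hits
            "aggs": {
                "ips": {
                    "terms": {
                        "field": field,
                        # Each request sees only one slice of the terms.
                        "include": {"partition": p, "num_partitions": num_partitions},
                        "size": size,
                    }
                }
            },
        })
    return bodies

bodies = partitioned_requests("client_ip", num_partitions=20, size=10000)
```

Running each body in turn and concatenating the buckets yields the full list of IPs with their counts.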
Okay, I'm trying #1 and am wondering if there is a way for this to be set dynamically. We might get drastically varying numbers of IP addresses, so could the request just take the value of whatever the cardinality is and set the number of partitions and size based on that?
No - it's up to your client app to figure out the right number of partitions, as per the linked doc, using the cardinality agg.
The "right number of partitions" could vary wildly depending on the type of request e.g. if you were planning on nesting a date histogram under each IP to get a day-by-day summary of its activity.
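One way the client might pick that number is a sketch like this: take the (approximate) result of a cardinality agg and divide by a per-request bucket budget. The budget itself is the judgment call the answer above describes; you would shrink it if each IP bucket carries a nested date histogram.

```python
import math

def num_partitions(cardinality, per_request_budget):
    """How many partitions are needed so each request returns at most
    per_request_budget buckets. `cardinality` comes from a cardinality
    agg, so it is approximate; leave yourself some headroom."""
    return max(1, math.ceil(cardinality / per_request_budget))

# e.g. roughly 1.2M distinct IPs, 10k buckets per request
parts = num_partitions(1_200_000, 10_000)
```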
Also, from reading the docs on the cardinality agg:
> Computing exact counts requires loading values into a hash set and returning its size. This doesn't scale when working on high-cardinality sets and/or large values as the required memory usage and the need to communicate those per-shard sets between nodes would utilize too many resources of the cluster.
So that number won't be accurate for IP addresses since they're high cardinality?
More than likely good enough for figuring out how many partitions you'll need, though.
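For comparison, option 2 from earlier in the thread pages through every bucket with the composite aggregation's `after` parameter. A sketch of the client-side plumbing, with the actual search call left out (the field name `client_ip` is hypothetical):

```python
# Sketch of option 2: composite agg paging via after_key.
# The field name ("client_ip") is hypothetical; the search call itself
# is omitted - this only builds bodies and reads responses.

def composite_body(field, size, after_key=None):
    """Build a composite-agg search body; pass the previous response's
    after_key to request the next page."""
    comp = {
        "size": size,
        "sources": [{"ip": {"terms": {"field": field}}}],
    }
    if after_key is not None:
        comp["after"] = after_key
    return {"size": 0, "aggs": {"ips": {"composite": comp}}}

def next_after_key(response):
    """Extract after_key from a search response, or None when the last
    page has been reached (no buckets returned)."""
    agg = response["aggregations"]["ips"]
    return agg.get("after_key") if agg["buckets"] else None
```

The client loop is then: send `composite_body(...)`, feed each response through `next_after_key`, and stop when it returns `None`.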
I'm just getting started learning scripted fields and Painless. Is there a way to programmatically do this using Painless in a query?
Sorry, no.
This logic lives outside of Elasticsearch; your client needs to figure out how many queries to run.
Okay, thanks!
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.