If I have many network session status documents that I would like to aggregate on the five-tuple (src_ip, src_port, dst_ip, dst_port, proto), how do I write the aggregation query?
Do I write an aggregation with 5 levels? Could someone point me to a similar example?
Potentially, yes. The usual procedure is to continue embedding aggregations inside each other to get each "layer" of the tuple. E.g. in this case, you'd probably use a bunch of terms aggregations.
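As a rough sketch of the embedded approach, the snippet below builds the request body for five terms aggregations nested inside each other. The field names come from the question; the `by_<field>` aggregation names, the `size` of 10, and the helper `nested_terms_aggs` are made up for illustration, and it assumes each field is mapped as a type the terms agg can bucket on (keyword, ip, or numeric).

```python
import json

# Field names taken from the question; adjust to your own mapping.
FIELDS = ["src_ip", "src_port", "dst_ip", "dst_port", "proto"]

def nested_terms_aggs(fields, size=10):
    """Build terms aggregations embedded inside each other, outermost first.

    Hypothetical helper: the "by_<field>" names and default size are arbitrary.
    """
    aggs = None
    # Walk the fields from innermost to outermost, wrapping as we go.
    for field in reversed(fields):
        level = {"terms": {"field": field, "size": size}}
        if aggs is not None:
            level["aggs"] = aggs
        aggs = {f"by_{field}": level}
    return aggs

# "size": 0 skips the search hits so only the aggregation buckets come back.
body = {"size": 0, "aggs": nested_terms_aggs(FIELDS)}
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request. Each level of buckets in the response carries its own doc count, and metric aggs can be added under any level's `aggs`.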
Or you could use a single terms agg with a script field to combine all the values into a single string.
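Something like the following could build that single scripted terms agg. Treat it as a sketch under assumptions: the `|` separator, the `five_tuple` aggregation name, and the helper `tuple_script` are invented here, and it assumes every document has a value for all five fields and that each `doc[...]` value concatenates cleanly into a string in Painless. Check the script against your own mapping before relying on it.

```python
import json

# Field names taken from the question; adjust to your own mapping.
FIELDS = ["src_ip", "src_port", "dst_ip", "dst_port", "proto"]

def tuple_script(fields, sep="|"):
    """Return a Painless expression joining the doc values with a separator.

    Hypothetical helper; assumes no field value contains the separator.
    """
    parts = [f"doc['{field}'].value" for field in fields]
    return f" + '{sep}' + ".join(parts)

body = {
    "size": 0,  # no search hits, only aggregation buckets
    "aggs": {
        "five_tuple": {
            "terms": {
                "script": {"lang": "painless", "source": tuple_script(FIELDS)},
                "size": 10,
            }
        }
    },
}
print(json.dumps(body, indent=2))
```

Each bucket key in the response would then be one concatenated string per tuple, which you can split on the separator client-side.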
It sorta depends on what you want as the end result. Embedded aggs will give you doc counts at each level, as well as any metrics you specify. A single agg over a script-concatenated tuple will give you one layer, with doc counts only for each complete tuple. It'll also incur the overhead of executing the script, although with Painless that should be pretty small.
Do be careful with high-cardinality tuples like that though (regardless of the approach you take), since the memory overhead of so many buckets could be expensive. You may want to use the partitions feature of the terms agg to "paginate" through the terms, so they don't all come back at once.
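To illustrate the partitioning idea, here is a minimal sketch that generates one request body per partition using the terms agg's `include` option with `partition` and `num_partitions`. It partitions only on `src_ip` to keep things short; the partition count of 20, the `size` of 1000, and the helper `partitioned_bodies` are arbitrary choices for the example, not recommendations.

```python
import json

def partitioned_bodies(field, num_partitions, size=1000):
    """Yield one search body per partition of the field's terms.

    Hypothetical helper: each request only returns terms that hash
    into the given partition, so the full set arrives in pieces.
    """
    for partition in range(num_partitions):
        yield {
            "size": 0,  # no search hits, only aggregation buckets
            "aggs": {
                f"by_{field}": {
                    "terms": {
                        "field": field,
                        "size": size,
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                    }
                }
            },
        }

bodies = list(partitioned_bodies("src_ip", num_partitions=20))
print(json.dumps(bodies[0], indent=2))
```

You would send these bodies one at a time and process each response before requesting the next. The remaining four fields could be embedded as sub-aggs under each partitioned `src_ip` bucket.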