Counting amount of buckets in aggregation

Zful · July 25, 2024, 1:28pm

Hi!

I have an index holding connection events. I want to group them by 3 of their fields: src.ip, dst.ip and port.

I'm using a composite aggregation for the grouping, but I also need the total number of buckets.

Currently I calculate it with a cardinality aggregation over a concatenation of the fields, but the request has recently started timing out (>30s) on larger datasets.

This is the query I'm using:

GET connections/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "timestamp": {
              "gt": "2024-07-24T00:00:00+00:00"
            }
          }
        },
        {
          "range": {
            "timestamp": {
              "lt": "2024-07-25T15:58:52+00:00"
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "composite": {
      "composite": {
        "size": 21,
        "sources": [
          {
            "src.ip": {
              "terms": {
                "field": "src.ip.keyword",
                "order": "asc"
              }
            }
          },
          {
            "dst.ip": {
              "terms": {
                "field": "dst.ip.keyword",
                "order": "asc"
              }
            }
          },
          {
            "port": {
              "terms": {
                "field": "port",
                "order": "asc"
              }
            }
          }
        ]
      }
    },
    "composite_count": {
      "cardinality": {
        "script": {
          "lang": "painless",
          "source": "doc['src.ip.keyword'].value + '|' + doc['dst.ip.keyword'].value + '|' + doc['port'].value"
        }
      }
    }
  }
}

Is there a better way to perform this type of operation? I know I can insert the documents with a new field that is the concatenation of src.ip, dst.ip and port, but I would like to avoid that, as the grouped-by fields may not be known in advance (and there may be a lot of different combinations).

Thank you!

Musab_Dogan · July 28, 2024, 11:10pm

Here are two recommendations for tuning the speed of queries that use composite aggregations.

1. Use an ingest pipeline to combine src.ip, dst.ip and port into a single field (as you mentioned).
2. Index sorting: For optimal performance the index sort should be set on the index so that it matches parts or fully the source order in the composite aggregation. (See: Composite aggregation | Elasticsearch Guide [8.14] | Elastic)
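
As a rough sketch of what the two recommendations could look like for the query in this thread: the pipeline name connection-key, the field name connection_key and the index name connections-sorted are made up for illustration, and the template assumes src and dst are stored as objects with an ip sub-field. The first request builds the combined key at ingest time, so the cardinality aggregation can read a plain field instead of running a script per document.

```console
# Hypothetical pipeline: writes "src.ip|dst.ip|port" into one field at index time.
PUT _ingest/pipeline/connection-key
{
  "description": "Concatenate src.ip, dst.ip and port into a single key",
  "processors": [
    {
      "set": {
        "field": "connection_key",
        "value": "{{{src.ip}}}|{{{dst.ip}}}|{{{port}}}"
      }
    }
  ]
}

# Count distinct keys without a script (assumes connection_key is mapped as keyword).
GET connections/_search
{
  "size": 0,
  "aggs": {
    "composite_count": {
      "cardinality": { "field": "connection_key" }
    }
  }
}
```

Index sorting can only be defined when an index is created, so the second sketch assumes a new index that existing data would be reindexed into. The mappings are a guess that mirrors the .keyword sub-fields used in the original query.

```console
# Hypothetical new index, sorted in the same order as the composite sources.
PUT connections-sorted
{
  "settings": {
    "index": {
      "sort.field": ["src.ip.keyword", "dst.ip.keyword", "port"],
      "sort.order": ["asc", "asc", "asc"]
    }
  },
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "src": { "properties": { "ip": { "type": "text", "fields": { "keyword": { "type": "keyword" } } } } },
      "dst": { "properties": { "ip": { "type": "text", "fields": { "keyword": { "type": "keyword" } } } } },
      "port": { "type": "long" }
    }
  }
}
```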

Another recommendation: do you know which aggregation takes the most time? You can check with the Search Profiler in Kibana. (See: Profile queries and aggregations | Kibana Guide [8.14] | Elastic)
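
The same timing information should also be available without Kibana by adding "profile": true to the search request. The sketch below reuses the query from the first post, with the two range filters merged into one clause for brevity.

```console
# Same search as in the question, with profiling switched on.
GET connections/_search
{
  "profile": true,
  "size": 0,
  "query": {
    "range": {
      "timestamp": {
        "gt": "2024-07-24T00:00:00+00:00",
        "lt": "2024-07-25T15:58:52+00:00"
      }
    }
  },
  "aggs": {
    "composite": {
      "composite": {
        "size": 21,
        "sources": [
          { "src.ip": { "terms": { "field": "src.ip.keyword", "order": "asc" } } },
          { "dst.ip": { "terms": { "field": "dst.ip.keyword", "order": "asc" } } },
          { "port": { "terms": { "field": "port", "order": "asc" } } }
        ]
      }
    },
    "composite_count": {
      "cardinality": {
        "script": {
          "lang": "painless",
          "source": "doc['src.ip.keyword'].value + '|' + doc['dst.ip.keyword'].value + '|' + doc['port'].value"
        }
      }
    }
  }
}
```

In the response, each shard entry under profile.shards lists its aggregations with a time_in_nanos value, so comparing the entries for composite and composite_count shows where the time goes.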

If the cardinality aggregation takes too much time, you should focus on the first recommendation. If the composite aggregation takes most of the response time, index sorting will make you happy.

Zful · August 1, 2024, 1:00pm

Thank you for your answer!

The main problem is that the fields to aggregate and sort on are not known in advance, so using an ingest pipeline would require me to insert all possible combinations. The same goes for index sorting: since I don't know which fields will be used for aggregating and sorting, I can't use this method.

Any other ideas, maybe?

Thank you
