Counting the number of buckets in an aggregation

Hi!

I have an index holding connection events. I want to group them by 3 of their fields: src.ip, dst.ip and port.

I'm using a composite aggregation for the grouping, but I also need the total number of buckets.

Currently I'm using a cardinality aggregation over a concatenation of the fields to calculate it, but lately the request has started timing out (>30s) on larger datasets.

This is the query I'm using:

GET connections/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "timestamp": {
              "gt": "2024-07-24T00:00:00+00:00"
            }
          }
        },
        {
          "range": {
            "timestamp": {
              "lt": "2024-07-25T15:58:52+00:00"
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "composite": {
      "composite": {
        "size": 21,
        "sources": [
          {
            "src.ip": {
              "terms": {
                "field": "src.ip.keyword",
                "order": "asc"
              }
            }
          },
          {
            "dst.ip": {
              "terms": {
                "field": "dst.ip.keyword",
                "order": "asc"
              }
            }
          },
          {
            "port": {
              "terms": {
                "field": "port",
                "order": "asc"
              }
            }
          }
        ]
      }
    },
    "composite_count": {
      "cardinality": {
        "script": {
          "lang": "painless",
          "source": "doc['src.ip.keyword'].value + '|' + doc['dst.ip.keyword'].value + '|' + doc['port'].value"
        }
      }
    }
  }
}

Is there a better way to perform this type of operation? I know I can insert the documents with a new field that is the concatenation of src.ip, dst.ip and port, but I would like to avoid that, as the grouped-by fields may not be known in advance (and there may be a lot of different combinations).

Thank you!

Here are two recommendations for tuning the query speed of composite aggregations.

  1. Use an ingest pipeline to combine src.ip, dst.ip and port into a single field at index time (as you mentioned), so the cardinality aggregation can run on that field instead of a script.
  2. Use index sorting. For optimal performance, the index sort should be set on the index so that it matches part or all of the source order in the composite aggregation. See: Composite aggregation | Elasticsearch Guide [8.14] | Elastic
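
To make the two recommendations concrete, here is a rough sketch in Console syntax. The pipeline name connection-key, the field conn_key, the index name connections-sorted and the mapping types are all assumptions for illustration, not taken from your setup. Index sort settings can only be set when an index is created, so in practice this means creating a new index and reindexing into it.

```json
# Assumed pipeline name and target field; adjust to your own naming.
PUT _ingest/pipeline/connection-key
{
  "description": "Concatenate src.ip, dst.ip and port into one field",
  "processors": [
    {
      "set": {
        "field": "conn_key",
        "value": "{{{src.ip}}}|{{{dst.ip}}}|{{{port}}}"
      }
    }
  ]
}

# Assumed index name and mapping types. The sort fields must be mapped
# at creation time and follow the same order as the composite sources.
PUT connections-sorted
{
  "settings": {
    "index": {
      "default_pipeline": "connection-key",
      "sort.field": ["src.ip.keyword", "dst.ip.keyword", "port"],
      "sort.order": ["asc", "asc", "asc"]
    }
  },
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "conn_key": { "type": "keyword" },
      "port": { "type": "integer" },
      "src": {
        "properties": {
          "ip": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword" } }
          }
        }
      },
      "dst": {
        "properties": {
          "ip": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword" } }
          }
        }
      }
    }
  }
}
```

With conn_key in place, the composite_count aggregation could become a plain "cardinality": { "field": "conn_key" } without the Painless script. Keep in mind that the cardinality aggregation returns an approximate count either way.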

Another recommendation: do you know which aggregation takes more time than the other? You can check with the Kibana Query Profiler. See: Profile queries and aggregations | Kibana Guide [8.14] | Elastic

If the cardinality aggregation takes most of the time, focus on the first recommendation. If the composite aggregation takes most of the response time, index sorting should help. :slight_smile:
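
If you prefer to check outside the Query Profiler UI, the same information is available by adding "profile": true to the search request. The sketch below reuses your query as posted, with the two range filters merged into one as a simplification on my side.

```json
# Same search as in the question, with profiling enabled.
GET connections/_search
{
  "profile": true,
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "timestamp": {
              "gt": "2024-07-24T00:00:00+00:00",
              "lt": "2024-07-25T15:58:52+00:00"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "composite": {
      "composite": {
        "size": 21,
        "sources": [
          { "src.ip": { "terms": { "field": "src.ip.keyword", "order": "asc" } } },
          { "dst.ip": { "terms": { "field": "dst.ip.keyword", "order": "asc" } } },
          { "port": { "terms": { "field": "port", "order": "asc" } } }
        ]
      }
    },
    "composite_count": {
      "cardinality": {
        "script": {
          "lang": "painless",
          "source": "doc['src.ip.keyword'].value + '|' + doc['dst.ip.keyword'].value + '|' + doc['port'].value"
        }
      }
    }
  }
}
```

In the response, each entry under profile.shards[].aggregations reports a time_in_nanos value per aggregation, so you can compare the composite and cardinality parts directly.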

Thank you for your answer!

The main problem is that the fields to aggregate and sort on are not known in advance, so using an ingest pipeline would require me to index every possible combination. The same goes for index sorting: since I don't know which fields will be used for aggregating and sorting, I can't use this method.

Any other ideas, maybe?

Thank you :slight_smile: