Hi!
I have an index holding connection events. I want to group them by three of their fields: src.ip, dst.ip and port.
I'm using a composite aggregation for the grouping, but I also need the total number of buckets. Currently I compute that with a cardinality aggregation over a scripted concatenation of the three fields, but recently the request has started timing out (>30s) on larger datasets.
This is the query I'm using:
GET connections/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "timestamp": {
              "gt": "2024-07-24T00:00:00+00:00"
            }
          }
        },
        {
          "range": {
            "timestamp": {
              "lt": "2024-07-25T15:58:52+00:00"
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "composite": {
      "composite": {
        "size": 21,
        "sources": [
          {
            "src.ip": {
              "terms": {
                "field": "src.ip.keyword",
                "order": "asc"
              }
            }
          },
          {
            "dst.ip": {
              "terms": {
                "field": "dst.ip.keyword",
                "order": "asc"
              }
            }
          },
          {
            "port": {
              "terms": {
                "field": "port",
                "order": "asc"
              }
            }
          }
        ]
      }
    },
    "composite_count": {
      "cardinality": {
        "script": {
          "lang": "painless",
          "source": "doc['src.ip.keyword'].value + '|' + doc['dst.ip.keyword'].value + '|' + doc['port'].value"
        }
      }
    }
  }
}
Is there a better way to perform this type of operation? I know I could index the documents with an extra field that concatenates src.ip, dst.ip and port, but I'd like to avoid that, since the grouped-by fields may not be known in advance (and there may be many different combinations).
Thank you!