Aggregations taking way too long?

Hello,

I would like to have your opinion on these aggregations. Currently, these can take up to 5 mins to run on my ES cluster (7.17.1) for ~ 2 years of data (~ 1 billion of documents / 100GB data spread over 2 data nodes).

So I would like to know if there are things to optimize on this side and only on this side at first

GET test/_search
{
    "aggregations": {
        "story_id_filters": {
            "terms": {
                "field": "story_id",
                "size": 65536
            }
        },
        "byCategory:no_category": {
            "aggregations": {
                "author": {
                    "aggregations": {
                        "total_cost": {
                            "sum": {
                                "field": "cost.usd"
                            }
                        }
                    },
                    "terms": {
                        "field": "author",
                        "size": 65536
                    }
                },
                "total_cost": {
                    "sum": {
                        "field": "cost.usd"
                    }
                },
                "usage_period_start": {
                    "aggregations": {
                        "total_cost": {
                            "sum": {
                                "field": "cost.usd"
                            }
                        }
                    },
                    "date_histogram": {
                        "calendar_interval": "day",
                        "field": "usage_period_start"
                    }
                }
            },
            "filter": {
                "bool": {
                    "must_not": {
                        "nested": {
                            "path": "tags",
                            "query": {
                                "regexp": {
                                    "tags.key": {
                                        "value": ".*cat.*"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        },
        "byCategory:category": {
            "aggregations": {
                "tag_keys": {
                    "aggregations": {
                        "tag_values": {
                            "aggregations": {
                                "reverse_aggr_cost": {
                                    "aggregations": {
                                        "author": {
                                            "aggregations": {
                                                "total_cost": {
                                                    "sum": {
                                                        "field": "cost.usd"
                                                    }
                                                }
                                            },
                                            "terms": {
                                                "field": "author",
                                                "size": 65536
                                            }
                                        },
                                        "total_cost": {
                                            "sum": {
                                                "field": "cost.usd"
                                            }
                                        },
                                        "usage_period_start": {
                                            "aggregations": {
                                                "total_cost": {
                                                    "sum": {
                                                        "field": "cost.usd"
                                                    }
                                                }
                                            },
                                            "date_histogram": {
                                                "calendar_interval": "day",
                                                "field": "usage_period_start"
                                            }
                                        }
                                    },
                                    "reverse_nested": {}
                                }
                            },
                            "terms": {
                                "field": "tags.value",
                                "size": 65536
                            }
                        }
                    },
                    "filter": {
                        "regexp": {
                            "tags.key": {
                                "value": ".*cat.*"
                            }
                        }
                    }
                }
            },
            "nested": {
                "path": "tags"
            }
        },
        "byAuthor:author": {
            "aggregations": {
                "total_cost": {
                    "sum": {
                        "field": "cost.usd"
                    }
                },
                "usage_period_start": {
                    "aggregations": {
                        "total_cost": {
                            "sum": {
                                "field": "cost.usd"
                            }
                        }
                    },
                    "date_histogram": {
                        "calendar_interval": "day",
                        "field": "usage_period_start"
                    }
                }
            },
            "terms": {
                "field": "author",
                "size": 65536
            }
        },
        "category_filters": {
            "aggregations": {
                "tag_keys": {
                    "aggregations": {
                        "tag_values": {
                            "terms": {
                                "field": "tags.value",
                                "size": 65536
                            }
                        }
                    },
                    "filter": {
                        "regexp": {
                            "tags.key": {
                                "value": ".*cat.*"
                            }
                        }
                    }
                }
            },
            "nested": {
                "path": "tags"
            }
        },
        "total_cost": {
            "sum": {
                "field": "cost.usd"
            }
        },
        "usage_story_id_filters": {
            "terms": {
                "field": "usage_story_id",
                "size": 65536
            }
        }
    },
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "usage_period_start": {
                            "from": "2020-10-01T00:00:00Z",
                            "to": "2022-10-05T00:00:00Z",
                            "include_lower": true,
                            "include_upper": false
                        }
                    }
                },
                {
                    "terms": {
                        "author": [
                            "toto"
                        ]
                    }
                },
                {
                    "bool": {
                        "must_not": {
                            "bool": {
                                "filter": [
                                    {
                                        "terms": {
                                            "author": [
                                                "toto"
                                            ]
                                        }
                                    },
                                    {
                                        "terms": {
                                            "record_type": [
                                                "Tax",
                                                "StoryTotal"
                                            ]
                                        }
                                    }
                                ]
                            }
                        }
                    }
                },
                {
                    "bool": {
                        "must_not": {
                            "bool": {
                                "filter": [
                                    {
                                        "terms": {
                                            "author": [
                                                "tutu"
                                            ]
                                        }
                                    },
                                    {
                                        "terms": {
                                            "record_type": [
                                                "tax"
                                            ]
                                        }
                                    }
                                ]
                            }
                        }
                    }
                }
            ]
        }
    },
    "size": 0
}

I think going through the multi-search API. But is there a real added value?

Note: I voluntarily changed the name of some fields/values

Thanks a lot for your help :pray:

How many indices are is the data distributed across? How many primary shards does each index have? How large are the primary shards?

I see this in a few places and it looks slow and expensive. If this is something you often filter on and the substrings are known it would likely be better to change the document so that this can be done as an exact match if possible.

1 Like

Thanks a lot for your answer @Christian_Dahlqvist :pray:

So in total, there are 69 indices. Composed for each of a primary shard and a replica. The size of each primary shard varies between ~20GB and 100MB.

On the RAM side, we are also "OK", with 4GB of HEAP per data node. And for the moment it's underutilized.

On the CPU level, on the other hand, the cluster sometimes reaches 100% peaks (2vCPU per node), but by increasing the number of CPUs the cluster reach 40% peaks with more or less the same latency for the biggest requests.

It's indeed something that I filter often. However, the substring is not known in advance and the regex can also change completely.

Do you think there would be other areas for improvement?

How long does the aggregation take if you remove the regex filters?

Just to echo, regexp is super expensive. Depending on the number of values you have there, you might be better off listing them all individually in the filter, as it will be heaps faster.

1 Like

Hello @Christian_Dahlqvist & @warkolm and thanks for your answers!

Indeed there is something with regexp

With regexp : ~ 5mins wait
Without regexp: ~ 1min wait

But, even without regexp, is it normal to wait so much with all these aggregations/documents?

Note that the query without regex filters will aggregate across more data, so it is clear the filter has a major impact.

On top of that it seems you have quite a few small indices, which could also affect performance.

In the query you also have quite a few large size parameters, which could make the aggregation require more memory and be slower.

To troubleshoot further I would recommend looking at what limits performance. If CPU is maxed out while you are querying, which you indicated earlier, it may be that the cluster does not have enough resources. In addition to CPU, disk I/O can also often be a bottleneck, so it is worth also monitoring iowait.