Dear Team,
We recently migrated from Elasticsearch 1.7.3 to Elasticsearch 8 (using version 8.12 for testing). Our application connects to Elasticsearch using the Java API client to search and fetch data.
During load testing after the migration, we noticed that aggregation and statistics queries are taking longer than they did on Elasticsearch 1.7.3. We compared response times using instrumented metrics from our application. We are unsure whether the slower responses are due to a bottleneck in our application code or in the Elasticsearch cluster itself. The only changes in our application were the move to the Elasticsearch Java API client and the query changes that required.
For comparison, the queries that use aggregations/statistics took < 100 ms (95th percentile) on 1.7.3, and after the migration the 95th-percentile response times are close to 300 ms. The average response times rose from around 30 ms to 70 ms.
Is there a way to monitor the time spent on aggregations within the Elasticsearch cluster? Currently we use the Datadog integration for Elasticsearch to monitor the cluster, but we couldn't find any metric specific to aggregations. If there is an API call that exposes this, we should be able to integrate the metric. Any other suggestions to isolate the issue would also be helpful.
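To make the question concrete, here is a minimal sketch of the kind of breakdown we are after. It assumes the search Profile API (`"profile": true` in the search body) and the `took` field of the search response; the helper names `build_profiled_search` and `summarize_timing` are hypothetical, and the response is treated as the plain JSON body already parsed into a dict. We have not validated this against our cluster, and profiling adds overhead of its own, so it is meant for spot checks rather than continuous monitoring.

```python
def build_profiled_search(search_body):
    """Return a copy of a search request body with profiling switched on."""
    body = dict(search_body)
    body["profile"] = True  # asks Elasticsearch to return per-shard timing details
    return body


def summarize_timing(response, client_elapsed_ms):
    """Split a client-side latency measurement into cluster and non-cluster parts.

    response: parsed JSON body of a search response (dict).
    client_elapsed_ms: latency measured by the application around the call.
    """
    took_ms = response.get("took", 0)  # time reported by the cluster, in ms

    agg_ms_per_shard = {}
    for shard in response.get("profile", {}).get("shards", []):
        # Only top-level aggregation entries are summed; nested children
        # are skipped here to avoid any double counting.
        nanos = sum(a.get("time_in_nanos", 0) for a in shard.get("aggregations", []))
        agg_ms_per_shard[shard.get("id")] = nanos / 1e6

    return {
        "took_ms": took_ms,
        # Roughly: network, serialization, and client/application time.
        "outside_cluster_ms": client_elapsed_ms - took_ms,
        "agg_ms_per_shard": agg_ms_per_shard,
    }
```

If `outside_cluster_ms` accounts for most of the increase, that would point at the client or application side; if `took_ms` grew instead, the per-shard aggregation times should show where.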
Here are some things we tried to improve performance:
- We tried setting "eager_global_ordinals" to true and also tried setting "execution_hint=map" as documented here
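
For reference, the two settings above take roughly the following shape. This is an illustrative sketch only: the field name `category` and both helper functions are hypothetical stand-ins, not our actual mapping or queries. As we understand it, `eager_global_ordinals` is a mapping parameter on the field, while `execution_hint` is set per terms aggregation and `map` bypasses global ordinals, so the two are alternatives rather than something to combine on the same aggregation.

```python
import json


def keyword_mapping_with_eager_ordinals(field):
    """Mapping fragment that builds global ordinals at refresh time."""
    return {
        "properties": {
            field: {"type": "keyword", "eager_global_ordinals": True}
        }
    }


def terms_agg_with_map_hint(field, size=10):
    """Search body with a terms aggregation that uses field values directly."""
    return {
        "size": 0,  # aggregation results only, no hits
        "aggs": {
            "by_" + field: {
                "terms": {"field": field, "size": size, "execution_hint": "map"}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(keyword_mapping_with_eager_ordinals("category"), indent=2))
    print(json.dumps(terms_agg_with_map_hint("category"), indent=2))
```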
Please let me know if you need any clarification or have suggestions on improving performance.
As a separate question, we also noticed that Elasticsearch 8 requires larger coordinator nodes with more memory than 1.7.3 did. Is the coordinator node doing any additional work in Elasticsearch 8 compared with 1.7.3?