We have been running Elasticsearch 7.10.2 for a while in a production setting now, serving thousands of requests per second across 10 separate clusters. Our clusters are all in AWS on ec2 hosts. A coupe of months back we upgraded from 7.10.2 to 7.12.1 and noticed that the data node query latency (as measured by the datadog integration using metrics elasticsearch.search.query.time
divided by elasticsearch.search.query.total
) increased significantly -- from ~ 7ms to ~ 13ms. That's a 90% increase. Not understanding what was going on, we moved back to 7.10.2.
The hardware is unchanged between version upgrades. Our data nodes are of ec2 instance type i3.8xlarge with 4 locally attached 1.8TB NVMe devices. Each node is running 4 ES instances with 28GB heap. We are using the JVM that comes embedded with the official Elasticsearch debian package.
Yesterday we built a new cluster using 7.13.4 and replayed traffic to it to compare. Same observations: Query latencies almost double that of 7.10.2.
I've combed through changelogs to find anything that could explain this behavior but cannot. Has anyone else seen the same ?