We have been running Elasticsearch 7.10.2 for a while in a production setting now, serving thousands of requests per second across 10 separate clusters. Our clusters are all in AWS on ec2 hosts. A coupe of months back we upgraded from 7.10.2 to 7.12.1 and noticed that the data node query latency (as measured by the datadog integration using metrics
elasticsearch.search.query.time divided by
elasticsearch.search.query.total) increased significantly -- from ~ 7ms to ~ 13ms. That's a 90% increase. Not understanding what was going on, we moved back to 7.10.2.
The hardware is unchanged between version upgrades. Our data nodes are of ec2 instance type i3.8xlarge with 4 locally attached 1.8TB NVMe devices. Each node is running 4 ES instances with 28GB heap. We are using the JVM that comes embedded with the official Elasticsearch debian package.
Yesterday we built a new cluster using 7.13.4 and replayed traffic to it to compare. Same observations: Query latencies almost double that of 7.10.2.
I've combed through changelogs to find anything that could explain this behavior but cannot. Has anyone else seen the same ?