Hi,
We recently upgraded our Elasticsearch cluster from 7.7 to 7.10.1 and are seeing an increase in response time.
One query that uses aggregations has become 1-1.5s slower.
Setup
* 3 Master nodes + 3 data nodes
* 2 primaries + 1 replica shards
* Master nodes: 16G
* Data nodes: 31G
Index
- It is an index with 962994 documents
- Each document contains around 200 fields.
- No stored fields in the mapping
Expected query output
- We are trying to group objects of two different types, courses and classes, which have a 1:n relation.
- There are 20k courses and 334k classes.
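For readers who cannot open the full query, the grouping described above has roughly the following shape: a terms aggregation over a course identifier with a top_hits sub-aggregation that returns the classes in each bucket. This is only a sketch and not the real query; the field name `course_id`, the aggregation names and the sizes are hypothetical placeholders.

```python
def build_grouping_query(group_field="course_id", max_groups=100, hits_per_group=5):
    """Build a search body that groups documents by `group_field`.

    All field names, aggregation names and sizes are hypothetical.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not top-level hits
        "aggs": {
            "by_course": {
                # one bucket per course
                "terms": {"field": group_field, "size": max_groups},
                "aggs": {
                    # the classes that belong to each course bucket
                    "classes": {"top_hits": {"size": hits_per_group}}
                },
            }
        },
    }
```

The returned dict can be sent as the body of a `_search` request with any client.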
Changes
- There are no changes in data or the query.
- Confirmed that the only variable is the Elasticsearch version.
Observations
- Queries using aggregations are the ones that have seen an increase in response time.
- We have seen this increase in response time with ES 7.10.1 and ES 7.10.2
- ES 7.9.3 is showing times comparable to ES 7.7.
- Seems to be related to the Lucene upgrade, as mentioned in elastic/elasticsearch issue #67574 (Query performance regression after upgrade to 7.10.x with huge values for "size" field). I tried the resolution mentioned in the issue, but it had no effect.
Query tuning
- Global ordinals
  - Use eager_global_ordinals
  - Use execution hint "map"
- Stored fields: set stored_fields to _none_
- Decreased index.max_docvalue_fields_search to 10
None of these had any effect.
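To make the tuning attempts concrete, here is a minimal sketch of where each option goes. It assumes a keyword field named `course_id` (hypothetical) and that the doc value setting meant in the list is `index.max_docvalue_fields_search`; the helper function names are made up for illustration.

```python
def keyword_mapping_with_eager_ordinals(field="course_id"):
    """Mapping fragment: build global ordinals at refresh time, not at search time."""
    return {
        "properties": {
            field: {"type": "keyword", "eager_global_ordinals": True}
        }
    }


def terms_with_map_hint(field="course_id", size=100):
    """Terms aggregation that aggregates on values directly instead of global ordinals."""
    return {"terms": {"field": field, "size": size, "execution_hint": "map"}}


def search_body_without_stored_fields(aggs):
    """Search body that disables loading of stored fields for returned hits."""
    return {"stored_fields": "_none_", "aggs": aggs}


def docvalue_limit_settings(limit=10):
    """Index settings fragment that lowers the doc value fields limit per search."""
    return {"index": {"max_docvalue_fields_search": limit}}
```

The mapping fragment belongs in a put-mapping request, the settings fragment in an update-settings request, and the other two in the search body.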
Query
The query is a bit long, so it is shared via Google Drive.
Profile output
The profile output is also a bit long, so it is shared via Google Drive.
The profile shows that most of the time is spent in TopHitsAggregator. I am not sure why that time has increased from ES 7.7 to ES 7.10.1.
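One way to compare the two profile outputs is to total the time per aggregator type on each version. The sketch below assumes the usual profile response layout (`profile.shards[].aggregations[]`, each entry carrying `type`, `time_in_nanos` and optional `children`); the function name is made up for illustration.

```python
from collections import defaultdict


def sum_aggregation_time_by_type(profile_response):
    """Return {aggregator type: total time_in_nanos} across all shards."""
    totals = defaultdict(int)

    def walk(node):
        # Add this aggregator's time, then descend into its sub-aggregators.
        totals[node["type"]] += node.get("time_in_nanos", 0)
        for child in node.get("children", []):
            walk(child)

    for shard in profile_response.get("profile", {}).get("shards", []):
        for agg in shard.get("aggregations", []):
            walk(agg)
    return dict(totals)
```

Running this on the 7.7 and 7.10.1 profile responses and comparing the `TopHitsAggregator` entries should show how much of the regression sits there. A parent's time may already include time spent in its children, so the totals are better compared type by type between versions than added together.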
References
- Improving the performance of high-cardinality terms aggregations in Elasticsearch (Elastic Blog)
- eager_global_ordinals (Elasticsearch Guide [master])
What am I missing here? Is there any setting that needs to be changed?
Thanks in advance.
-Ravi