Upgrade From 7.3.1 to 7.13.2 causing circuit breaker exceptions / slow performance

We upgraded from 7.3.1 to 7.13.2 using a rolling upgrade and performance took a major hit.
For example, a search that takes 100-200ms on 7.3.1 takes 5000ms on 7.13.2. We had to back out the upgrade, so we are on 7.3.1 now and are not having any performance issues.

We did this upgrade for no other reason than Veracode vulnerabilities in the REST client jars on 7.3.1.
We made no changes on our end: same cluster settings, same template, same shards per index, same jvm.options, same Java version. Nothing changed besides the version number.

The main performance issue appears to be in aggregations, because disabling them improves search performance by 20x on 7.13.2. The circuit breaker exceptions and garbage collection log entries show up at peak user hours, so I believe the problem is more with searching than indexing.
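For anyone who wants to reproduce the with/without-aggregations comparison, here is a minimal sketch. The helper names `without_aggs` and `took_ms`, the base URL, and the index name are all hypothetical, and it assumes an unsecured cluster reachable over plain HTTP. It just strips the top-level aggregations from a search body and reads the server-reported `took` value for each variant.

```python
import copy
import json
import urllib.request


def without_aggs(search_body):
    """Return a deep copy of a search body with top-level aggregations removed."""
    stripped = copy.deepcopy(search_body)
    # The search API accepts either key for aggregations.
    stripped.pop("aggs", None)
    stripped.pop("aggregations", None)
    return stripped


def took_ms(base_url, index, search_body):
    """Run one search and return the server-reported `took` in milliseconds."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(search_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["took"]
```

Calling `took_ms(url, index, body)` and `took_ms(url, index, without_aggs(body))` a few times each on both versions should show whether the aggregations account for the difference. A single run can be skewed by caching, so repeat it.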

Increasing memory did not help at all. We have jvm.options set to use 16gb per node; raising it to 25gb did not help and we saw the same problems.

We downgraded back to 7.3.1 with 16gb of memory and so far things are looking good again.

Does anyone have any ideas what change in Elasticsearch between 7.3.1 and 7.13.2 might have caused our problems in the newer version? I went through the changelog but nothing stood out to me.

UPDATE:

It appears the slowness/memory issues are due to Elasticsearch's optimized aggregations, which I think were introduced in 7.13.0.

In 7.13.2 a setting to disable these optimizations was added (Setting to disable rewrite-to-filters optimization · Issue #73426 · elastic/elasticsearch · GitHub). We set it to false in our 7.13.2 cluster and so far we are good now.
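A minimal sketch of applying that change through the cluster settings API is below. The post does not name the setting; `search.aggs.rewrite_to_filter_by_filter` is my assumption of what the linked issue added, so check it against the docs for your version. The function names and the localhost URL are hypothetical, and the sketch assumes an unsecured cluster.

```python
import json
import urllib.request

# Assumed setting name (not stated in the post); verify for your version.
SETTING = "search.aggs.rewrite_to_filter_by_filter"


def build_settings_request(base_url, enabled=False):
    """Build a PUT _cluster/settings request that sets the flag persistently."""
    body = {"persistent": {SETTING: enabled}}
    return urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Something like `send(build_settings_request("http://localhost:9200"))` would apply it. Using `persistent` keeps the value across a full cluster restart; `transient` could be swapped in for a quick trial.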

There were changes in Lucene (which underlies Elastic) in 7.10+ that may have a negative impact on query performance: Query performance regres

Also see: https://issues.apache.org/jira/browse/LUCENE-9447
