Increase in response time for aggregations in 7.10.1

Hi,

We recently upgraded our Elasticsearch cluster from 7.7 to 7.10.1 and are seeing an increase in response time.
One query that uses aggregations now takes 1-1.5s longer than before.

Setup
* 3 Master nodes + 3 data nodes
* 2 primaries + 1 replica shards
* Master nodes: 16G
* Data nodes: 31G

Index

  • It is an index with 962994 documents
  • Each document contains around 200 fields.
  • No stored fields in the mapping

Expected query output

  • We are trying to group objects of two different types; courses and classes which have a 1:n relation.
  • There are 20k courses and 334k classes.

Changes

  • There are no changes in data or the query.
  • We confirmed that the only variable is the Elasticsearch version.

Observations

Query tuning

  • Global Ordinals
    • Use eager_global_ordinals
    • Use execution hint "map"
  • Stored fields: set stored_fields to _none_
  • Decreased index.max_docvalue_fields_search to 10

None of these seems to have any effect.
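
For readers who want to try the same knobs, the tuning options listed above could be expressed roughly as the request bodies below. This is only a sketch under assumptions: the real query and mapping were shared separately, so the field name `course_id`, the aggregation names `by_group` and `sample`, and the sizes are hypothetical, and `index.max_docvalue_fields_search` is assumed to be the setting meant by the last bullet.

```python
import json

# Hypothetical field name; the real mapping was not posted in the thread.
GROUP_FIELD = "course_id"


def mapping_update(field: str = GROUP_FIELD) -> dict:
    """Body for PUT <index>/_mapping: build global ordinals at refresh time."""
    return {
        "properties": {
            field: {"type": "keyword", "eager_global_ordinals": True}
        }
    }


def index_settings(limit: int = 10) -> dict:
    """Body for PUT <index>/_settings: lower the doc-value fields limit per search."""
    return {"index": {"max_docvalue_fields_search": limit}}


def search_body(field: str = GROUP_FIELD, groups: int = 100, hits_per_group: int = 1) -> dict:
    """Body for POST <index>/_search: group with terms and fetch hits with top_hits."""
    return {
        "size": 0,                  # only the aggregation result is wanted
        "stored_fields": "_none_",  # skip stored fields for the top-level hits
        "aggs": {
            "by_group": {
                # "map" aggregates on the field values instead of global ordinals
                "terms": {"field": field, "size": groups, "execution_hint": "map"},
                "aggs": {"sample": {"top_hits": {"size": hits_per_group}}},
            }
        },
    }


if __name__ == "__main__":
    for body in (mapping_update(), index_settings(), search_body()):
        print(json.dumps(body, indent=2))
```

Note that eager global ordinals and the "map" execution hint are alternatives rather than complements: with "map" the terms aggregation does not use global ordinals, so it makes sense to test them one at a time.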

Query
It is a bit long. Hence shared via Google Drive.

Profile output
It is a bit long. Hence shared via Google Drive.

I can see that most of the time is spent in TopHitsAggregator. I am not sure why the time has increased from ES 7.7 to ES 7.10.1.
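
For comparing profile output between two versions, a small helper like the following can total the aggregation time per aggregator type across shards. It is a sketch that assumes the response was produced with "profile": true and follows the usual layout (profile.shards[].aggregations[] entries with type, time_in_nanos and optional children); the function name is hypothetical.

```python
from collections import defaultdict


def aggregation_time_by_type(profile_response: dict) -> dict:
    """Sum time_in_nanos per aggregator type over all shards in a profiled response."""
    totals = defaultdict(int)

    def visit(node: dict) -> None:
        # Record this aggregator, then walk into its sub-aggregations.
        totals[node.get("type", "unknown")] += node.get("time_in_nanos", 0)
        for child in node.get("children", []):
            visit(child)

    for shard in profile_response.get("profile", {}).get("shards", []):
        for agg in shard.get("aggregations", []):
            visit(agg)
    return dict(totals)


def print_report(profile_response: dict) -> None:
    """Print aggregator types, slowest first, with times in milliseconds."""
    totals = aggregation_time_by_type(profile_response)
    for agg_type, nanos in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{agg_type}: {nanos / 1_000_000:.1f} ms")
```

Running it on the saved profile output from each version gives two short tables to compare side by side. A parent's timing may overlap with that of its children, so the per-type totals are best compared type by type rather than added together.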

References

What am I missing here? Is there any setting that needs to be changed?

Thanks in advance.

-Ravi

Hey,

Indeed, there have been some changes in aggregations. Are you able to upgrade to the latest 7.13 and check whether your aggregation speed has been restored? If the problem still persists, that sounds like a bug to investigate.

Also, taking a look at the search profiler might help.

--Alex

Hi Alex,

I tried the upgrade to 7.13.0 and still see the same issue with increased response time.

I narrowed the degradation down to a commit in which the codec was changed from Lucene86Codec to Lucene87Codec.
If I revert the codec (build ES with codec reverted), I see performance on par with ES 7.7.1/ES 7.9.3.

Any suggestions on why changes in compression would lead to a degradation?

Thanks.

-RaviShekhar G

I am experiencing the same issue with 7.10.2: the response time is 10x that of 7.7.1 on the same hardware.

@rgopalan, would you mind sharing what you changed to make it work? I have the same issue, and when I tried reverting the codec I got the following error: "failed because of NotSerializableExceptionWrapper[unsupported_operation_exception: Old codecs may only be used for reading]"

Hi Jock,

We built a custom ES version reverting the codec from Lucene87 to Lucene86.
Ultimately, we had to pause the upgrade because we did not want to maintain a custom version of ES.
We are investigating 7.13 where Lucene has been upgraded again and the compression is made configurable as per https://issues.apache.org/jira/browse/LUCENE-9378

Changes to revert the codec
/server/src/main/java/org/elasticsearch/bootstrap/Bootstrap.java - commented out checkLucene()
/server/src/main/java/org/elasticsearch/common/lucene/Lucene.java
/server/src/main/java/org/elasticsearch/index/codec/CodecService.java
/server/src/main/java/org/elasticsearch/index/codec/PerFieldMappingPostingFormatCodec.java
/server/src/main/java/org/elasticsearch/Version.java

Thanks a lot @rgopalan for sharing the detailed information. It seems the issue has been there since Elasticsearch 7.7 (Lucene 85) and may be fixed in Elasticsearch 7.13 (Lucene 88).
