Nested Aggregations are 5~10x slower in ES 6.x than 5.6.x

cmadera_rp · May 11, 2018, 3:32pm

I've prepared a test environment to try to find a way to fix this.

I have one machine with 16 cores and 64 GB of RAM, with ES 5.6.8 and ES 6.2.4 installed; each ES instance has Xmx/Xms set to 30 GB.

There is only one index with 35,808,600 docs, 5 shards, codec: best_compression, _source: true, and no stored fields, in both ES versions.

The pri.store.size is:
in ES 5.6.8: 18.48 GB
in ES 6.2.4: 12.20 GB
Why is there this difference?

When I perform the same aggregation:
ES 5.6.8 took: 358~530 ms
ES 6.2.4 took: 5200~12600 ms

Why does this happen? What changed in the major version that degrades performance this way?

I've compared all the cluster and index settings, and they are basically the same (using include_defaults=true to get the defaults too).

The aggregation query is:

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "range": { "timestamp_utc": { "gt": "now-30d" } } },
            { "terms": { "element_id": [
                  "68C894", "BE6053",  ......... UNTIL TO 1000 ELEMENTS
                ] } } ] } } } },
  "size": 0,
  "aggs": { "by_element": { "terms": { "field": "element_id", "size": 999999 },
    "aggs": { "by_topic": { "terms": { "field": "topic", "size": 999999 },
      "aggs": { "by_group": { "terms": { "field": "group", "size": 999999 },
        "aggs": { "by_type": { "terms": { "field": "type", "size": 999999 },
          "aggs": { "by_sub_type": { "terms": { "field": "sub_type", "size": 999999, "missing": "N/A" },
            "aggs": { "by_position": { "terms": { "field": "position_name", "missing": "N/A", "size": 999999 },
              "aggs": { "by_position_id": { "terms": { "field": "position_id", "missing": "N/A", "size": 999999 },
                "aggs": { "sent_sub_type": { "sum": { "field": "event_score" } } }
              } }
            } }
          } }
        } }
      } }
    } }
  } }
}
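
For readers who want to reproduce the shape of this request, here is a minimal Python sketch that builds the same seven-level chain of terms aggregations from a list instead of hand-nesting it. The aggregation names, fields, `missing` values and `size` are copied from the query above; the helper name `build_nested_terms` and the `LEVELS` list are illustrative assumptions, not part of any Elasticsearch client.

```python
import json

# (aggregation name, field, "missing" value or None), outermost first.
# Values are copied from the query in this post.
LEVELS = [
    ("by_element", "element_id", None),
    ("by_topic", "topic", None),
    ("by_group", "group", None),
    ("by_type", "type", None),
    ("by_sub_type", "sub_type", "N/A"),
    ("by_position", "position_name", "N/A"),
    ("by_position_id", "position_id", "N/A"),
]


def build_nested_terms(levels, leaf_aggs, size=999999):
    """Wrap leaf_aggs in one terms aggregation per level (hypothetical helper)."""
    aggs = leaf_aggs
    # Build from the innermost level outwards.
    for name, field, missing in reversed(levels):
        terms = {"field": field, "size": size}
        if missing is not None:
            terms["missing"] = missing
        aggs = {name: {"terms": terms, "aggs": aggs}}
    return aggs


leaf = {"sent_sub_type": {"sum": {"field": "event_score"}}}
body = {"size": 0, "aggs": build_nested_terms(LEVELS, leaf)}
print(json.dumps(body, indent=2))
```

The `query` part is left out on purpose, since the full list of element IDs is elided above.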

Thanks in advance.

cmadera_rp · May 16, 2018, 9:52am

@thiago @Mark_Harwood @dadoonet @colings86 @mvg @jpountz guys, some help here, please.

thiago · May 16, 2018, 3:19pm

Read this and specifically the "Also be patient" part.

It's fine to answer on your own thread after 2 or 3 days (not including weekends) if you don't have an answer.

Please don't ping people directly in your thread if they have not yet participated in the discussion.

thiago · May 17, 2018, 2:32am

Are you running both ES nodes with 30GB heap set on a single machine with 64GB ram?

cmadera_rp · May 17, 2018, 9:48am

Sorry about that, but I've seen older posts about aggregations that never got any response.

cmadera_rp · May 17, 2018, 9:49am

Yes, currently I'm testing on one physical machine with these specs, but I also have two clusters, and the same thing I described above happens there.

thiago · May 17, 2018, 12:01pm

OK, regarding the disk space: it is expected that 6.x uses less storage, since it ships with Lucene 7, which handles sparse data much better. See https://www.elastic.co/blog/minimize-index-storage-size-elasticsearch-6-0

About the long response times: your configuration will always give very poor and unpredictable performance. A large part of Elasticsearch performance relies on the OS-level filesystem cache. With 2 JVMs at 30 GB each on a system with 64 GB of RAM, there won't be enough memory left for caching, and Elasticsearch performance is unpredictable in such an environment. See https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html#_give_less_than_half_your_memory_to_lucene
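
As a rough illustration of that arithmetic, the Python sketch below computes how much RAM is left for the filesystem cache once the JVM heaps are subtracted. It is a simplification: it ignores what the OS itself and the JVMs' off-heap memory need, so the real figure would be lower. The function name is a hypothetical helper.

```python
def page_cache_budget_gb(total_ram_gb, heap_gb_per_jvm, jvm_count):
    """Upper bound on RAM left for the OS filesystem cache (hypothetical helper).

    Assumption: only JVM heap is subtracted; OS and off-heap usage are ignored.
    """
    return total_ram_gb - heap_gb_per_jvm * jvm_count


# Both 30 GB nodes running at once on the 64 GB machine from this thread.
both_running = page_cache_budget_gb(64, 30, 2)
# Only one 30 GB node running at a time.
one_running = page_cache_budget_gb(64, 30, 1)

print(f"both nodes up: at most {both_running} GB for the filesystem cache")
print(f"one node up:   at most {one_running} GB for the filesystem cache")
```

With both nodes up, at most 4 GB remain for caching an index of 12 to 18 GB; with one node up, at most 34 GB remain.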

cmadera_rp · May 17, 2018, 12:49pm

I haven't run both Elasticsearch versions at the same time.

Besides, I have two clusters with 6 machines with NVMe volumes, 64 GB of RAM and 16 cores, and the same thing happens there.

I've tested from 6.0.0 to 6.2.4, going through every release in between, to see whether the performance drop appears in a specific version. To my surprise, it happens in all 6.x versions and does not occur in any 5.x version.

This is related to a specific change between 5.x and 6.x, though I don't know which one; it is not a hardware or OS configuration issue.

thiago · May 17, 2018, 12:56pm

Since you are using a fairly complex query there, it may be related to how many segments the index has. You could try running POST /<index_name>/_forcemerge?max_num_segments=1 and repeat the query to see if it's any better (depending on the index size the forcemerge operation may take a while).
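
For anyone scripting this step, here is a minimal Python sketch that prepares the same force merge call using only the standard library. The host `http://localhost:9200` and the index name `my-index` are placeholders (assumptions), and the helper name is hypothetical. The request is only built here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def forcemerge_request(base_url, index_name, max_num_segments=1):
    """Build (but do not send) a POST /<index>/_forcemerge request (hypothetical helper)."""
    url = (
        f"{base_url.rstrip('/')}/{quote(index_name, safe='')}"
        f"/_forcemerge?max_num_segments={max_num_segments}"
    )
    return Request(url, method="POST")


# Placeholder host and index name; replace with your own.
req = forcemerge_request("http://localhost:9200", "my-index")
print(req.get_method(), req.full_url)

# To actually run it, pass req to urllib.request.urlopen(req).
# The call may take a long time on a large index.
```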

If that still does not cut it, then I suggest that you install X-Pack and analyze the query performance using the Search Profiler in Kibana.

cmadera_rp · May 22, 2018, 8:56am

Thanks thiago, this helps a lot: the times were reduced from 5~12 sec to 1.5~3 sec. I have a question: when I reindexed several indices into one, wasn't the data merged by default?

How can we keep max_num_segments=1 at index time? Is it possible?

What else can I do to reach the same response time in 6.x as in 5.x?

Thanks in advance.

thiago · May 23, 2018, 4:54am

Elasticsearch will keep merging the index in the background while data is being indexed. To understand better what happens, check the awesome Mike McCandless blog about it. The core issue here is not that the index isn't merging, but that too many tiny segments seem to be created. Are you calling the refresh API externally/manually? Also, what's the refresh interval of the index?

Keeping a single segment at index time is not possible due to how merging works. An index can only reach a single segment by calling the force merge API.

At this point the best option is to run the query through the Search Profiler and start investigating further for other potential bottlenecks.
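
One way to check the "too many tiny segments" theory is to look at the output of `GET /<index_name>/_segments`. The Python sketch below counts segments per primary shard in such a response. The response layout it relies on (`indices` > `shards` > list of shard copies, each with `routing.primary` and a `segments` map carrying `size_in_bytes`) is my understanding of that API and should be checked against your version; the sample data is hand-written for illustration, and the 1 MB "tiny" threshold is an arbitrary assumption.

```python
def summarize_segments(segments_response, tiny_bytes=1024 * 1024):
    """Count segments, and segments under tiny_bytes, per primary shard.

    Hypothetical helper; assumes the _segments response layout described above.
    """
    summary = {}
    for index_name, index_data in segments_response["indices"].items():
        for shard_id, copies in index_data["shards"].items():
            for shard_copy in copies:
                if not shard_copy["routing"]["primary"]:
                    continue  # skip replicas
                sizes = [s["size_in_bytes"] for s in shard_copy["segments"].values()]
                summary[(index_name, shard_id)] = {
                    "segments": len(sizes),
                    "tiny": sum(1 for size in sizes if size < tiny_bytes),
                }
    return summary


# Hand-written sample, NOT real output from the cluster in this thread.
sample = {
    "indices": {
        "my-index": {
            "shards": {
                "0": [
                    {
                        "routing": {"primary": True},
                        "segments": {
                            "_0": {"size_in_bytes": 500_000_000},
                            "_1": {"size_in_bytes": 20_000},
                            "_2": {"size_in_bytes": 4_000},
                        },
                    }
                ]
            }
        }
    }
}

print(summarize_segments(sample))
```

A shard with many segments, most of them tiny, would point towards very frequent refreshes.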
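
If Kibana is not at hand, the same kind of timing breakdown can be requested from the search API by adding `"profile": true` to the request body. A minimal Python sketch, assuming the body is held as a dict; the helper name and the shortened example body are illustrative only.

```python
import copy
import json


def with_profiling(search_body):
    """Return a copy of a search body with the profile flag enabled (hypothetical helper)."""
    profiled = copy.deepcopy(search_body)  # leave the caller's dict untouched
    profiled["profile"] = True
    return profiled


# Shortened stand-in for the aggregation request from this thread.
body = {
    "size": 0,
    "aggs": {"by_element": {"terms": {"field": "element_id", "size": 999999}}},
}

print(json.dumps(with_profiling(body), indent=2))
```

The response then carries a `profile` section with per-shard timings for the query and the aggregations, which is what the Search Profiler visualizes.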

cmadera_rp · May 25, 2018, 9:34am

I've found the possible cause of the problems with these nested aggregations, specifically related to the jump to the major version 6.x.

Where can I download ES 6.3.x? The URL in the documentation is broken:
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.0.tar.gz

In that version the problem is fixed.

cmadera_rp · June 18, 2018, 2:48pm

Was this problem fixed in this release (6.3.0)?

system · July 16, 2018, 2:48pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
