Massive performance drop for certain queries in ES9 compared to ES8

We discovered a massive performance drop after migration to ES 9 for certain queries with aggregations.

Our mapping can be seen here: INGe/inge_es_connector/src/main/resources/es_index_items.json at spring6 · MPDL/INGe · GitHub

The query is the following:

{
    "query": {
        "bool": {
            "must_not": [
                {
                    "term": {
                        "files.storage": "INTERNAL_MANAGED"
                    }
                }
            ]
        }
    },
    "size": 1,
    "aggs": {
        "creatorsAgg": {
            "aggs": {
                "by_id": {
                    "terms": {
                        "field": "metadata.creators.person.identifier.id"
                    },
                    "aggs": {
                        "creator_info": {
                            "top_hits": {
                                "_source": {
                                    "includes": [
                                        "metadata.creators.person.givenName",
                                        "metadata.creators.person.familyName",
                                        "metadata.creators.person.identifier.id"
                                    ]
                                },
                                "size": 1
                            }
                        }
                    }
                }
            },
            "nested": {
                "path": "metadata.creators"
            }
        }
    }
}

We have about 12 million documents in this index; the complete size is around 13 GB:
green open items EadRYtv3TT6jAsaqIrvTXw 1 0 11528318 0 12.7gb 12.7gb 12.7gb

So I set up two local ES servers on the same machine with the exact same settings, created the index with the settings and mapping from the link above, reindexed the same data into both indexes, and posted the query, with the following results:

ES 8.19.7:
first POST: 1236ms
subsequent POST: ~150ms

ES 9.2.1:
first POST: 27997ms
subsequent POST: ~27000ms

If any one of the following changes is made, the query is much faster on ES9:

  • set size to 0 (3 ms instead of 27000!!)
  • don't use nested aggregation
  • modify the query (e.g. "must" instead of "must_not", although the number of hits is quite similar)

Any idea or explanation is much appreciated. Thank you.
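
For anyone who wants to reproduce the comparison, the variants listed above can be derived from one base request so that only a single thing changes per run. This is a minimal Python sketch, not part of the original test setup: the helper names `with_size` and `with_must_instead_of_must_not` are made up for illustration, and the "no nested aggregation" variant is left out because the post does not say exactly how the request was rewritten for it.

```python
import copy

# Base request, copied from the query in this post.
BASE_REQUEST = {
    "query": {
        "bool": {
            "must_not": [{"term": {"files.storage": "INTERNAL_MANAGED"}}]
        }
    },
    "size": 1,
    "aggs": {
        "creatorsAgg": {
            "nested": {"path": "metadata.creators"},
            "aggs": {
                "by_id": {
                    "terms": {"field": "metadata.creators.person.identifier.id"},
                    "aggs": {
                        "creator_info": {
                            "top_hits": {
                                "_source": {
                                    "includes": [
                                        "metadata.creators.person.givenName",
                                        "metadata.creators.person.familyName",
                                        "metadata.creators.person.identifier.id",
                                    ]
                                },
                                "size": 1,
                            }
                        }
                    },
                }
            },
        }
    },
}


def with_size(request, size):
    """Return a copy of the request with a different top-level 'size'."""
    variant = copy.deepcopy(request)
    variant["size"] = size
    return variant


def with_must_instead_of_must_not(request):
    """Return a copy where the bool clause uses 'must' instead of 'must_not'."""
    variant = copy.deepcopy(request)
    bool_query = variant["query"]["bool"]
    bool_query["must"] = bool_query.pop("must_not")
    return variant
```

Each helper works on a deep copy, so the base request stays untouched and the variants can be sent one after another against both versions.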

Is this the first query and then subsequent queries or something else?

How many times do you run the query during a test?

I would put them in containers so they each have their own page cache and resources. If they are just separate processes on the same host they will likely affect each others access to the page cache. How much RAM does the machine have and what have you got the heap sizes set to?

Is this for the first query as well as subsequent ones?

What do the numbers look like for both versions in this scenario?

What type of storage are you using?

Yes, it is the first query after starting ES, and then the same query again shortly afterwards. I ran it about 10 times; the "took" numbers are always very similar.

I run them on my local machine (Apple M3 Max), but not in parallel: I started ES8, ran the tests, stopped it, and then started ES9 and ran the same tests. I used a heap size of only 4 GB for the tests (Xmx=4g, Xms=4g).
I also tried the same on Debian VMs in our computing centre and see the same extreme differences there, with different absolute numbers because the hardware is different. On the VM, ES 9 takes about 2 minutes (!) for this query (first and subsequent); ES 8 takes about 4s for the first and 500ms for subsequent ones. Still not great, but so much faster on ES8.

Made a new test with size=0, first query and subsequent queries

  • ES9: 23980ms, 25ms, 7ms, 6ms, 6ms
  • ES8: 398ms, 20ms, 7ms, 6ms, 6ms, 6ms

with size=1:

  • ES9: 27300ms, 26851ms, 26864ms, 27024ms, 27833ms, 27483ms
  • ES8: 353ms, 170ms, 177ms, 178ms, 161ms, 177ms
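
Series like the ones above can be collected with a small script that posts the same body repeatedly and records the server-side "took" value. This is only a sketch of how such a measurement could be done, not the tool that was actually used here; `run_search` and `summarize` are hypothetical helper names, and the URL in the usage note is an assumption.

```python
import json
import statistics
import urllib.request


def run_search(url, body):
    """POST a search body and return the server-side 'took' value in ms."""
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["took"]


def summarize(took_ms):
    """Split a series into the first (cold) run and the median of the rest."""
    first, rest = took_ms[0], took_ms[1:]
    return {
        "first": first,
        "median_subsequent": statistics.median(rest) if rest else None,
    }
```

A run would look like `summarize([run_search(url, body) for _ in range(6)])`, with `url` being something like `http://localhost:9200/items/_search` (host and port are assumed; the index name `items` is taken from the index listing earlier in the thread). Separating the first run from the rest keeps cold-cache effects out of the steady-state number.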

Data is stored on the internal SSD of my MacBook, so I guess it couldn't be much faster.

Edit: If you're interested, I could share my elasticsearch data directory (or an index snapshot) with you. I created a much smaller index (~5GB), which shows the same effect and does not contain sensitive data.

One thing that I think it will help here is to capture the hot threads while running the query. That should tell us where Elasticsearch is spending most of the time.
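
Hot threads are exposed through the `_nodes/hot_threads` endpoint, which returns plain text. The sketch below shows one way to request it from a script while the slow query is running in another session. The helper names are hypothetical, the base URL is an assumption, and the parameter values (`threads`, `interval`, `ignore_idle_threads`) are just the ones chosen for this example.

```python
import urllib.parse
import urllib.request


def hot_threads_url(base_url, threads=3, interval="500ms", ignore_idle=True):
    """Build the URL for the nodes hot threads API."""
    params = urllib.parse.urlencode(
        {
            "threads": threads,
            "interval": interval,
            "ignore_idle_threads": str(ignore_idle).lower(),
        }
    )
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{params}"


def fetch_hot_threads(base_url, **kwargs):
    """Fetch the plain-text hot threads report from the cluster."""
    with urllib.request.urlopen(hot_threads_url(base_url, **kwargs)) as response:
        return response.read().decode("utf-8")
```

Since a single sample only covers the sampling interval, it may help to call `fetch_hot_threads("http://localhost:9200")` a few times during one long-running query and compare the reports.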

You can also try to obtain detailed timing information: see "Profile search requests" in the Elasticsearch documentation. (But don’t capture hot threads and detailed timings at the same time for the same query, since profiling adds significant overhead.)

Thanks for the suggestions.

I first captured hot_threads. Here is the response on ES9 while running the query:

::: {myhostname}{_AX4jIRPS1uknpFQ6MoXQA}{LbwweeGFQOWcrjrTI9JVAA}{myhostname}{127.0.0.1}{127.0.0.1:9300}{cdfhilmrstw}{9.2.1}{8000099-9039001}{ml.machine_memory=38654705664, transform.config_version=10.0.0, xpack.installed=true, ml.config_version=12.0.0, ml.max_jvm_size=4294967296, ml.allocated_processors_double=14.0, ml.allocated_processors=14}
   Hot threads at 2025-11-12T15:29:55.537Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   101.5% [cpu=101.5%, other=0.0%] (507.3ms out of 500ms) cpu usage by thread 'elasticsearch[myhostname][search][T#9]'
     10/10 snapshots sharing following 39 elements
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.aggregations.metrics.TopHitsAggregator$1.setScorer(TopHitsAggregator.java:125)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.aggregations.LeafBucketCollectorBase.setScorer(LeafBucketCollectorBase.java:42)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.aggregations.LeafBucketCollectorBase.setScorer(LeafBucketCollectorBase.java:42)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.query.QueryPhaseCollector$CompositeLeafCollector.setScorer(QueryPhaseCollector.java:278)
       app/org.apache.lucene.core@10.3.1/org.apache.lucene.search.FilterLeafCollector.setScorer(FilterLeafCollector.java:37)
       app/org.apache.lucene.core@10.3.1/org.apache.lucene.search.ScoreCachingWrappingScorer$ScoreCachingWrappingLeafCollector.setScorer(ScoreCachingWrappingScorer.java:60)
       app/org.apache.lucene.core@10.3.1/org.apache.lucene.search.BooleanScorerSupplier$1$1.setScorer(BooleanScorerSupplier.java:262)
       app/org.apache.lucene.core@10.3.1/org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:254)
       app/org.apache.lucene.core@10.3.1/org.apache.lucene.search.BooleanScorerSupplier$1.score(BooleanScorerSupplier.java:270)
       app/org.apache.lucene.core@10.3.1/org.apache.lucene.search.ReqExclBulkScorer.score(ReqExclBulkScorer.java:69)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:46)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:465)
       app/org.apache.lucene.core@10.3.1/org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:809)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:389)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$3(ContextIndexSearcher.java:367)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.internal.ContextIndexSearcher$$Lambda/0x000007f80143fcf0.call(Unknown Source)
       java.base@25.0.1/java.util.concurrent.FutureTask.run(FutureTask.java:328)
       app/org.apache.lucene.core@10.3.1/org.apache.lucene.search.TaskExecutor$Task.run(TaskExecutor.java:173)
       app/org.apache.lucene.core@10.3.1/org.apache.lucene.search.TaskExecutor.invokeAll(TaskExecutor.java:111)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:371)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:338)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.query.QueryPhase.addCollectorsAndSearch(QueryPhase.java:212)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.query.QueryPhase.executeQuery(QueryPhase.java:143)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:70)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:700)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:906)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:739)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.search.SearchService$$Lambda/0x000007f801425550.get(Unknown Source)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1076)
       app/org.elasticsearch.server@9.2.1/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
       java.base@25.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
       java.base@25.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
       java.base@25.0.1/java.lang.Thread.runWith(Thread.java:1487)
       java.base@25.0.1/java.lang.Thread.run(Thread.java:1474)

So the TopHitsAggregator seems to run hot. And yes, if I remove the top_hits sub-aggregation, the response is very fast on ES9. But I need it.
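
When several reports are captured, reading each stack by hand gets tedious. The following sketch pulls the CPU percentage, thread name, and topmost stack frame out of a hot threads report. It assumes the text layout shown in the output above (a "cpu usage by thread" header, a "snapshots sharing" line, then the frames); `top_frames` is a hypothetical helper, and other report layouts may need a different pattern.

```python
import re

# Matches header lines such as:
#   101.5% [cpu=101.5%, other=0.0%] (507.3ms out of 500ms) cpu usage by thread '...'
HEADER = re.compile(
    r"^\s*(?P<pct>[\d.]+)% \[.*?\] \(.*?\) cpu usage by thread '(?P<thread>[^']+)'"
)


def top_frames(hot_threads_text):
    """Return (cpu_percent, thread_name, top_frame) for each reported thread."""
    results = []
    lines = hot_threads_text.splitlines()
    for index, line in enumerate(lines):
        match = HEADER.match(line)
        if not match:
            continue
        frame = None
        for candidate in lines[index + 1:]:
            stripped = candidate.strip()
            # Stop at a blank line or at the next thread's header.
            if not stripped or HEADER.match(candidate):
                break
            # Skip the "N/N snapshots sharing following M elements" line.
            if "snapshots sharing" in stripped:
                continue
            frame = stripped
            break
        results.append((float(match.group("pct")), match.group("thread"), frame))
    return results
```

Applied to the report above, the top frame is the `TopHitsAggregator$1.setScorer` line, which is what points at the top_hits sub-aggregation.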

And now it's getting really weird: I added "profile":true to my query:

{
    "profile": true,
    "query": {
        "bool": {
            "must_not": [
                {
                    "term": {
                        "files.storage": "INTERNAL_MANAGED"
                    }
                }
            ]
        }
    },
    "size": 1,
    "aggs": {
        "creatorsAgg": {
            "aggs": {
                "by_cone_id": {
                    "terms": {
                        "field": "metadata.creators.person.identifier.id"
                    },
                    "aggs": {
                        "creator_info": {
                            "top_hits": {
                                "_source": {
                                    "includes": [
                                        "metadata.creators.person.givenName",
                                        "metadata.creators.person.familyName",
                                        "metadata.creators.person.identifier.id"
                                    ]
                                },
                                "size": 1
                            }
                        }
                    }
                }
            },
            "nested": {
                "path": "metadata.creators"
            }
        }
    }
}

And suddenly it works like a charm on ES9. Response times are:
1268ms, 220ms, 220ms, 252ms, 233ms

If I reset "profile" to false, response times are as bad as ever:
21764ms, 22145ms, 23670ms...

What is going on here???

Anyway, here's the response of the query with "profile":true (I removed the hits and aggregation section due to the length):

{
    "took": 236,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "profile": {
        "shards": [
            {
                "id": "[_AX4jIRPS1uknpFQ6MoXQA][items][0]",
                "node_id": "_AX4jIRPS1uknpFQ6MoXQA",
                "shard_id": 0,
                "index": "items",
                "cluster": "(local)",
                "searches": [
                    {
                        "query": [
                            {
                                "type": "BooleanQuery",
                                "description": "-files.storage:INTERNAL_MANAGED #FieldExistsQuery [field=_primary_term]",
                                "time_in_nanos": 70940168,
                                "breakdown": {
                                    "set_min_competitive_score_count": 0,
                                    "match_count": 745994,
                                    "shallow_advance_count": 0,
                                    "set_min_competitive_score": 0,
                                    "next_doc": 38829435,
                                    "match": 17639098,
                                    "next_doc_count": 746016,
                                    "score_count": 564169,
                                    "compute_max_score_count": 0,
                                    "compute_max_score": 0,
                                    "advance": 0,
                                    "advance_count": 0,
                                    "count_weight_count": 0,
                                    "score": 10372802,
                                    "build_scorer_count": 44,
                                    "create_weight": 274041,
                                    "shallow_advance": 0,
                                    "count_weight": 0,
                                    "create_weight_count": 1,
                                    "build_scorer": 3824792
                                },
                                "children": [
                                    {
                                        "type": "TermQuery",
                                        "description": "files.storage:INTERNAL_MANAGED",
                                        "time_in_nanos": 6605494,
                                        "breakdown": {
                                            "set_min_competitive_score_count": 0,
                                            "match_count": 0,
                                            "shallow_advance_count": 0,
                                            "set_min_competitive_score": 0,
                                            "next_doc": 0,
                                            "match": 0,
                                            "next_doc_count": 0,
                                            "score_count": 0,
                                            "compute_max_score_count": 0,
                                            "compute_max_score": 0,
                                            "advance": 6354449,
                                            "advance_count": 309858,
                                            "count_weight_count": 0,
                                            "score": 0,
                                            "build_scorer_count": 44,
                                            "create_weight": 4583,
                                            "shallow_advance": 0,
                                            "count_weight": 0,
                                            "create_weight_count": 1,
                                            "build_scorer": 246462
                                        }
                                    },
                                    {
                                        "type": "FieldExistsQuery",
                                        "description": "FieldExistsQuery [field=_primary_term]",
                                        "time_in_nanos": 15006760,
                                        "breakdown": {
                                            "set_min_competitive_score_count": 0,
                                            "match_count": 0,
                                            "shallow_advance_count": 0,
                                            "set_min_competitive_score": 0,
                                            "next_doc": 14890760,
                                            "match": 0,
                                            "next_doc_count": 746016,
                                            "score_count": 0,
                                            "compute_max_score_count": 0,
                                            "compute_max_score": 0,
                                            "advance": 0,
                                            "advance_count": 0,
                                            "count_weight_count": 0,
                                            "score": 0,
                                            "build_scorer_count": 66,
                                            "create_weight": 2292,
                                            "shallow_advance": 0,
                                            "count_weight": 0,
                                            "create_weight_count": 1,
                                            "build_scorer": 113708
                                        }
                                    }
                                ]
                            }
                        ],
                        "rewrite_time": 269209,
                        "collector": [
                            {
                                "name": "QueryPhaseCollector",
                                "reason": "search_query_phase",
                                "time_in_nanos": 178226458,
                                "children": [
                                    {
                                        "name": "TopScoreDocCollector",
                                        "reason": "search_top_hits",
                                        "time_in_nanos": 9813723
                                    },
                                    {
                                        "name": "AggregatorCollector: [creatorsAgg]",
                                        "reason": "aggregation",
                                        "time_in_nanos": 135370832
                                    }
                                ]
                            }
                        ]
                    }
                ],
                "aggregations": [
                    {
                        "type": "NestedAggregator",
                        "description": "creatorsAgg",
                        "time_in_nanos": 136924698,
                        "breakdown": {
                            "reduce": 0,
                            "build_aggregation_count": 1,
                            "post_collection": 2750,
                            "reduce_count": 0,
                            "initialize_count": 1,
                            "collect_count": 564169,
                            "post_collection_count": 1,
                            "build_leaf_collector": 1705623,
                            "build_aggregation": 12730667,
                            "build_leaf_collector_count": 22,
                            "initialize": 49416,
                            "collect": 122436242
                        },
                        "debug": {
                            "built_buckets": 1
                        },
                        "children": [
                            {
                                "type": "GlobalOrdinalsStringTermsAggregator",
                                "description": "by_cone_id",
                                "time_in_nanos": 147614111,
                                "breakdown": {
                                    "reduce": 0,
                                    "build_aggregation_count": 1,
                                    "post_collection": 500,
                                    "reduce_count": 0,
                                    "initialize_count": 1,
                                    "collect_count": 2990900,
                                    "post_collection_count": 1,
                                    "build_leaf_collector": 1131956,
                                    "build_aggregation": 12688875,
                                    "build_leaf_collector_count": 22,
                                    "initialize": 20792,
                                    "collect": 133771988
                                },
                                "debug": {
                                    "segments_with_multi_valued_ords": 22,
                                    "collection_strategy": "dense",
                                    "segments_with_single_valued_ords": 0,
                                    "total_buckets": 72956,
                                    "built_buckets": 1,
                                    "result_strategy": "terms",
                                    "has_filter": false
                                },
                                "children": [
                                    {
                                        "type": "TopHitsAggregator",
                                        "description": "creator_info",
                                        "time_in_nanos": 63222736,
                                        "breakdown": {
                                            "reduce": 0,
                                            "build_aggregation_count": 1,
                                            "post_collection": 209,
                                            "reduce_count": 0,
                                            "initialize_count": 1,
                                            "collect_count": 820466,
                                            "post_collection_count": 1,
                                            "build_leaf_collector": 330331,
                                            "build_aggregation": 11600416,
                                            "build_leaf_collector_count": 22,
                                            "initialize": 2459,
                                            "collect": 51289321
                                        },
                                        "debug": {
                                            "fetch_profile": [
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 122958,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 211250
                                                    },
                                                    "time": 1369500
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 53125,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 93584
                                                    },
                                                    "time": 418084
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 58583,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 81125
                                                    },
                                                    "time": 345917
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 51667,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 83625
                                                    },
                                                    "time": 319459
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 50084,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 77792
                                                    },
                                                    "time": 359417
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 69750,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 81125
                                                    },
                                                    "time": 346750
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 42375,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 80917
                                                    },
                                                    "time": 317000
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 49833,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 79375
                                                    },
                                                    "time": 327875
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 38042,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 79417
                                                    },
                                                    "time": 311375
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 38542,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 84167
                                                    },
                                                    "time": 564250
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 43792,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 79667
                                                    },
                                                    "time": 333625
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 63958,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 79625
                                                    },
                                                    "time": 346292
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 47958,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 83250
                                                    },
                                                    "time": 310250
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 53292,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 77875
                                                    },
                                                    "time": 337250
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 50999,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 76250
                                                    },
                                                    "time": 313708
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 38084,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 79541
                                                    },
                                                    "time": 306625
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 46500,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 76333
                                                    },
                                                    "time": 289458
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 53292,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 75750
                                                    },
                                                    "time": 310625
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 36750,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 77458
                                                    },
                                                    "time": 293958
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 29875,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 76917
                                                    },
                                                    "time": 275583
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 45000,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 79000
                                                    },
                                                    "time": 304334
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 55251,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 97291
                                                    },
                                                    "time": 326917
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 40958,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 78500
                                                    },
                                                    "time": 302417
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 43417,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 75000
                                                    },
                                                    "time": 283834
                                                },
                                                {
                                                    "breakdown": {
                                                        "load_stored_fields": 37333,
                                                        "load_source": 0,
                                                        "load_stored_fields_count": 2,
                                                        "next_reader_count": 1,
                                                        "load_source_count": 0,
                                                        "next_reader": 79500
                                                    },
                                                    "time": 287084
                                                }
                                            ],
                                            "built_buckets": 25
                                        }
                                    }
                                ]
                            }
                        ]
                    }
                ],
                "fetch": {
                    "type": "fetch",
                    "description": "",
                    "time_in_nanos": 288000,
                    "breakdown": {
                        "load_stored_fields": 57875,
                        "load_source": 1625,
                        "load_stored_fields_count": 1,
                        "next_reader_count": 1,
                        "load_source_count": 1,
                        "next_reader": 110875
                    },
                    "debug": {
                        "stored_fields": [
                            "_id",
                            "_routing",
                            "_source"
                        ]
                    },
                    "children": [
                        {
                            "type": "FetchFieldsPhase",
                            "description": "",
                            "time_in_nanos": 15458,
                            "breakdown": {
                                "process_count": 1,
                                "process": 9458,
                                "next_reader": 6000,
                                "next_reader_count": 1
                            }
                        },
                        {
                            "type": "FetchSourcePhase",
                            "description": "",
                            "time_in_nanos": 3249,
                            "breakdown": {
                                "process_count": 1,
                                "process": 3166,
                                "next_reader": 83,
                                "next_reader_count": 1
                            },
                            "debug": {
                                "fast_path": 1
                            }
                        },
                        {
                            "type": "StoredFieldsPhase",
                            "description": "",
                            "time_in_nanos": 417,
                            "breakdown": {
                                "process_count": 1,
                                "process": 292,
                                "next_reader": 125,
                                "next_reader_count": 1
                            }
                        }
                    ]
                }
            }
        ]
    }
}
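
For anyone comparing the two profile outputs, totalling the per-hit entries with a small script is easier than reading them by eye. The following is only a sketch, assuming the profile response has been saved as JSON. Rather than relying on the exact nesting path, it walks the whole tree and picks up every object that has a `time` field next to a `breakdown` containing `load_stored_fields`. The function name `summarize_hit_timings` is made up for illustration, and the sample data is just the last two entries from the output above.

```python
import json


def summarize_hit_timings(node):
    """Total every entry shaped like {"breakdown": {...}, "time": N}.

    Hypothetical helper: it matches entries by shape, not by their
    position in the profile tree.
    """
    totals = {"entries": 0, "time": 0, "load_stored_fields": 0}

    def walk(obj):
        if isinstance(obj, dict):
            breakdown = obj.get("breakdown")
            if (
                isinstance(breakdown, dict)
                and "load_stored_fields" in breakdown
                and "time" in obj
            ):
                totals["entries"] += 1
                totals["time"] += obj["time"]
                totals["load_stored_fields"] += breakdown["load_stored_fields"]
            # Keep descending so nested lists of entries are found too.
            for value in obj.values():
                walk(value)
        elif isinstance(obj, list):
            for item in obj:
                walk(item)

    walk(node)
    return totals


# Last two per-hit entries copied from the profile output above.
sample = json.loads("""
[
  {"breakdown": {"load_stored_fields": 43417, "load_source": 0,
                 "load_stored_fields_count": 2, "next_reader_count": 1,
                 "load_source_count": 0, "next_reader": 75000},
   "time": 283834},
  {"breakdown": {"load_stored_fields": 37333, "load_source": 0,
                 "load_stored_fields_count": 2, "next_reader_count": 1,
                 "load_source_count": 0, "next_reader": 79500},
   "time": 287084}
]
""")

summary = summarize_hit_timings(sample)
print(summary)
```

The top-level `fetch` section is skipped on purpose, because it reports `time_in_nanos` instead of `time`. Running the same function over the ES 8 and ES 9 responses gives two totals that can be compared directly.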

Thanks! @Ignacio_Vera thinks that this could be due to concurrent search, as profiling turns that off. Could you please try using the `max_concurrent_shard_requests=1` query parameter to confirm this suspicion? Not as a permanent workaround, just to help us narrow this down.

That is shard request concurrency and the issue is with segment search concurrency. That parameter should have no effect.

Oh sorry, what is the correct way to test this then?

If it is related to concurrent segment search, might it be worthwhile to clone the index and then forcemerge this down to a single segment and see if there are any differences in performance?
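
The clone-and-force-merge test could be done with a sequence of requests like the one sketched below. It assumes the source index is called `items` (as in the `_cat/indices` line in the original post). The target name `items_single_segment` and the helper `single_segment_clone_plan` are hypothetical. The script only prints the requests; it does not send anything to a cluster.

```python
import json


def single_segment_clone_plan(source_index, clone_index):
    """Return (method, path, body) tuples for the clone + force merge test."""
    return [
        # The clone API requires the source index to be blocked for writes.
        ("PUT", f"/{source_index}/_settings",
         {"settings": {"index.blocks.write": True}}),
        # Clone the index under a new name.
        ("POST", f"/{source_index}/_clone/{clone_index}", None),
        # Merge the clone down to a single segment.
        ("POST", f"/{clone_index}/_forcemerge?max_num_segments=1", None),
        # Check how many segments the clone ended up with.
        ("GET", f"/_cat/segments/{clone_index}?v", None),
        # Allow writes on the original index again.
        ("PUT", f"/{source_index}/_settings",
         {"settings": {"index.blocks.write": False}}),
    ]


plan = single_segment_clone_plan("items", "items_single_segment")
for method, path, body in plan:
    line = f"{method} {path}"
    if body is not None:
        line += " " + json.dumps(body)
    print(line)
```

If the query from the original post is just as slow on the single-segment clone, that would point away from concurrent segment search as the cause.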

I tried it anyway, you're right, no effect.
Can I provide any more information or data?

@Ignacio_Vera @Quentin_Pradet
Sorry to bother, but before this topic is closed automatically: Is there any news on this issue?

As a sidenote: The topics shouldn’t auto-close any more. Comments on ancient or unrelated posts are not super helpful (and that was the initial idea around auto-closing) but we’ve steered too far on it — we don’t want to cut off conversations.

I have been trying to reproduce the issue but I have not been able to. My suspicion is that there is some issue with concurrent search when combining nested aggregations and top hits aggregations. Would you be able to produce a minimal reproduction?

Thank you.
I created a zipped data directory (2.6GB zipped, 3.8GB unzipped) for Elasticsearch 9.2.1, which contains about 10 million documents. You can run the query from the OP on it.

Of course, it's faster as there are fewer documents, but you still see the effect. On my machine, I get the following numbers on ES 9.2.1:
with size=0: 7493ms, 25ms, 7ms, 7ms, 7ms
with size=1: 7631ms, 7556ms, 7625ms, 7597ms

It doesn't contain any sensitive data, but I will send you the link in a private message anyway. Hope you can reproduce the numbers with it.
If you need anything else (e.g. the same data for ES8), let me know.
Thank you
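
To collect comparable numbers for `size=0` and `size=1`, a small timing loop along these lines might help. It is only a sketch: `time_runs` is a made-up helper, and `fake_send` is a placeholder for whatever actually posts the query from the OP to `/items/_search` with the given `size`. The sharp drop after the first `size=0` run is consistent with the shard request cache, which by default only caches `size=0` requests. Adding `request_cache=false` to the search URL should take the cache out of the comparison.

```python
import time


def time_runs(send, size, runs=5):
    """Call send(size) `runs` times and return the wall-clock time of each call in ms."""
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        send(size)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def fake_send(size):
    # Placeholder: a real version would POST the query from the OP,
    # with "size" set to this value, to /items/_search.
    return {"size": size}


for size in (0, 1):
    timings = time_runs(fake_send, size, runs=5)
    print(f"size={size}: " + ", ".join(f"{t:.3f}ms" for t in timings))
```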

Thank you, that was very useful.

I have opened this PR, "Speed up LeafCollector#setScorer in TopHitsAggregator" (Pull Request #138883, elastic/elasticsearch on GitHub), which I think addresses the slowdown you are experiencing. It was a sneaky one.

Awesome, sounds very good. Thank you for your help and effort!