Query profile - too much time spent in build_scorer?

Hello,

I am using the profile API to try and debug a slow query (3-7s). From what I can tell, building scorers for the geo_shapes in the query is particularly slow, and nearly all of the query time is spent doing this.

Here's a snippet of the profiler output for a piece of the query where I see "build_scorer" occupying a lot of time:

{'type': 'IntersectsPrefixTreeQuery',
 'description': 'IntersectsPrefixTreeQuery(fieldName=geometry,queryShape=POLYGON ((2.6367 37.5097, 10.6348 37.3702, 26.7188 33.9434, 25.4004 38.891, 29.4434 40.4469, 41.6602 49.2678, 28.2129 56.17, 32.168 70.2, 25.4004 71.7464, 17.1387 71.329, 1.4062 61.5227, -19.6875 68.4315, -31.0254 64.7741, -10.459 50.1769, -11.0742 35.7465, -2.9883 35.8891, 2.6367 37.5097)),detailLevel=18,prefixGridScanLevel=14)',
 'time': '3.7s',
 'time_in_nanos': 3781281199,
 'breakdown': {'score': 0,
               'build_scorer_count': 33,
               'match_count': 0,
               'create_weight': 962,
               'next_doc': 7025,
               'match': 0,
               'create_weight_count': 1,
               'next_doc_count': 4,
               'score_count': 0,
               'build_scorer': 3781241102,
               'advance': 32042,
               'advance_count': 30}}
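To quantify that, dividing the build_scorer time by the total time in nanos (both copied from the breakdown above) shows essentially the entire query is spent there:

```python
# Numbers taken from the profiler breakdown above (nanoseconds).
build_scorer = 3781241102
time_in_nanos = 3781281199

share = build_scorer / time_in_nanos
print(f"build_scorer share of query time: {share:.4%}")  # roughly 99.999%
```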

Note that 3.7s just to build the scorer seems high. I'm looking for help to understand what is going on here. Any pointers to more information would be appreciated.

Cluster config:
ES 6.2.4
120 data nodes, 3 master
377 indices
4200 shards
~1.6B docs

Gist with query and response with profiling: https://gist.github.com/bdelliott/41ef120158ad2a0188303b2385ba6e23

Thanks,
Brian

This is indeed high for 33 segments. If this is a read-only index, you might see better performance after force-merging it. A better solution would probably be to reindex into a more recent version (6.7 or 7.x) in order to benefit from the new implementation of shapes based on BKD trees rather than the inverted index, which was introduced to address exactly this kind of performance issue.
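For reference, a force-merge down to a single segment can be issued like this (the index name is illustrative; only do this on an index that no longer receives writes, since merged segments on an actively written index will keep fragmenting again):

```
POST /my-geo-index/_forcemerge?max_num_segments=1
```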

Thanks for your response.

  • It is not a read-only index. The documents in all our indices do get created/updated/deleted, though at different rates.
  • I'm planning to migrate the cluster to 7.x ASAP. 🙂

I am curious, however: what is taking so long here? From the docs it sounds like building scorers should not involve searching the actual data set in the index, so I am unable to see what could take 3.7 seconds in that function.

Thanks,
Brian

Most scorers compute matches lazily, in the next_doc and advance functions that you can see in the profile. However, scorers have an important constraint: they must iterate over doc IDs in order. For some queries, like term queries, this is easy, since matches are already stored on disk in increasing doc-ID order. But for more complex queries, such as range queries, wildcard queries, or geo queries, this is much harder. So what these queries do instead is compute a bit set of matches up-front (in build_scorer); next_doc and advance then just iterate over that bit set.
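Here's a toy sketch (not actual Lucene code; all names are illustrative) of the "eager" style described above: all of the expensive matching work happens when the scorer is built, and next_doc/advance merely walk the precomputed bit set in doc-ID order — which is why build_scorer dominates the profile while next_doc and advance are nearly free:

```python
class EagerScorer:
    """Pays its matching cost up-front, like the geo scorer in the profile."""

    def __init__(self, max_doc, matches_fn):
        # "build_scorer": evaluate the (potentially expensive) match
        # predicate for every document now, producing a bit set.
        self.bits = [matches_fn(doc) for doc in range(max_doc)]
        self.max_doc = max_doc
        self.doc = -1

    def next_doc(self):
        # Cheap: scan the precomputed bit set for the next set bit.
        d = self.doc + 1
        while d < self.max_doc and not self.bits[d]:
            d += 1
        self.doc = d if d < self.max_doc else -1  # -1 = exhausted
        return self.doc

    def advance(self, target):
        # Jump to the first match at or after `target`,
        # still honoring the in-order iteration contract.
        self.doc = target - 1
        return self.next_doc()


# Example: pretend docs whose ID is divisible by 3 "intersect the shape".
scorer = EagerScorer(10, lambda d: d % 3 == 0)
hits = []
d = scorer.next_doc()
while d != -1:
    hits.append(d)
    d = scorer.next_doc()
print(hits)  # [0, 3, 6, 9]
```

In real Lucene the bit set is a compact FixedBitSet rather than a Python list, but the cost structure is the same: build_scorer does O(matching work), iteration does almost none.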

BKD-based shapes work this way too, but they use a better data structure that makes building this bit set of matches much more efficient.

Thanks, that clarifies things!