MatchAllQuery is slow once segment count exceeds 1

Hi,

I'm managing an extra-small index and had some issues with the latency. The index has

  • ~4000 documents (25 MB in total)
  • 1 shard
  • low index and search traffic

I need to periodically fetch all of the documents from the index, and this started to take 500-950 ms ("took" time) using a match all query. Looking at the slow logs, most of this time is spent in the fetch phase.
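For context, the fetch is just a plain match_all search; roughly something like this with the Python client (host, index name, and size are placeholders, not my actual setup):

```python
from elasticsearch import Elasticsearch

# Placeholder host and index name, not my real setup.
es = Elasticsearch(["http://localhost:9200"])

# Fetch everything in one request; size has to cover the full
# document count (~4000 here), otherwise only the first page comes back.
resp = es.search(
    index="my-small-index",
    body={"query": {"match_all": {}}, "size": 5000},
)
print(resp["took"], "ms,", len(resp["hits"]["hits"]), "hits")
```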

I did some testing by creating an identical index and I was consistently getting sub-100ms "took" times with the new one.

I noticed that if the slow index was force-merged with max_num_segments=1, it would begin to show sub-100ms latency. The moment that the number of segments for the index hit 2, the latency would increase by almost 10x. I obviously can't keep the number of segments at 1, and the test index I created was still outperforming the existing index when it had multiple segments.
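For reference, the force merge was the standard _forcemerge call, roughly (index name is a placeholder):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # placeholder host

# Merge the shard down to a single segment; "took" times only stayed
# under 100 ms while the index remained at one segment.
es.indices.forcemerge(index="my-small-index", max_num_segments=1)
```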

During testing, there were no other indicators of the slowness (no other searches were coming through, CPU usage was low, and hot threads showed nothing unusual).

  1. Is this sort of latency (over 500ms) reasonable for getting only ~25 MB of documents in a query?
  2. Is there any cache that could be utilized to speed up the response? Looking at node stats, the request and query caches are practically empty and filesystem has loads of extra space.

Hi @jwSmith1

Could you specify which version of Elasticsearch you're using?

Regarding your questions:

  1. Is this sort of latency (over 500ms) reasonable for getting only ~25 MB of documents in a query?

This depends on a number of factors, but the general idea is that you're asking Elasticsearch (even if indirectly) to read data from disk and serialize it over the network, which can be slow.

https://stackoverflow.com/a/50936852 is a pretty good breakdown of when/how the fetch phase comes into play and some ways you can avoid or mitigate it.
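For example, one mitigation along those lines is trimming what the fetch phase has to load and serialize, e.g. restricting _source to the fields you actually need. A minimal sketch (index and field names are hypothetical):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # placeholder host

# Only the listed fields are returned, which cuts down serialization and
# network transfer in the fetch phase; "title" and "updated_at" are made-up names.
resp = es.search(
    index="my-small-index",
    body={
        "query": {"match_all": {}},
        "size": 5000,
        "_source": ["title", "updated_at"],
    },
)
```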

Tune for search speed | Elasticsearch Guide [8.11] | Elastic is a more general doc on things you can do to improve search performance.

  2. Is there any cache that could be utilized to speed up the response? Looking at node stats, the request and query caches are practically empty and filesystem has loads of extra space.

Yes, there are a few layers of caching that can be leveraged (some in your control, some less so).

Elasticsearch caching deep dive: Boosting query speed one cache at a time | Elastic Blog is a pretty comprehensive dive into the various layers of caching that Elasticsearch leverages. (It is a somewhat old post, but still fairly accurate from what I can find).

Tune for search speed | Elasticsearch Guide [8.11] | Elastic also covers a bit more about caching.
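If you want to see how much each cache is actually being used, the node stats API exposes per-cache counters; a quick sketch with the Python client (host is a placeholder):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # placeholder host

# Per-node memory and hit/miss counters for the query and request caches.
stats = es.nodes.stats(metric="indices")
for node_id, node in stats["nodes"].items():
    print(node_id)
    print("  query_cache:  ", node["indices"]["query_cache"])
    print("  request_cache:", node["indices"]["request_cache"])
```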

Thanks @BenB196,

I'm on version 7.10.2. It makes sense to me that a lot of data has to be fetched for the MatchAllQuery, but I'm stuck on why two indices with nearly identical contents and node/shard distributions give such different results.
I looked into the caching, and neither the shard-level request cache nor the query cache seems applicable to my use case:

  • Query cache - the docs say segments need to hold at least 10k docs to be eligible (see the sketch below for checking per-segment counts)
  • Request cache - MatchAllQuery is never cached per the caching policy
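For the query cache point, per-segment document counts can be pulled from the _cat/segments API, roughly like this (index name is a placeholder):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # placeholder host

# docs.count per segment; segments below the ~10k-document threshold
# aren't eligible for the query cache.
print(es.cat.segments(index="my-small-index", v=True,
                      h="index,shard,segment,docs.count,size"))
```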

This is most likely because of the multiple segments. Depending on the type of disks you're using, you could be getting hit by random I/O performance; I believe a single segment is generally closer to sequential I/O.

I'd also recommend testing on a newer version of Elasticsearch; there have been significant changes since 7.10.2, and I believe some of them have touched fetch phase performance.

That makes sense. Is there any way to effectively keep the number of search segments at 1 for an index that is read/write? The merge policy settings allow segments_per_tier to be a minimum of 2.

None that I'm aware of; in fact, force merging is somewhat discouraged for actively written-to indices/shards (the force merge docs call this out).

You have a few options:

  1. 25 MB isn't much data, so regularly force merging an active index, while not ideal, probably wouldn't hurt anything.
  2. Look at your underlying storage layer to see if it can be improved.
  3. Look at what you're doing (the query) and see if you can achieve similar results in a better way. Some of the links I provided show methods that may or may not work depending on your needs; one such pattern is sketched below.
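On option 3, if the goal is just to pull every document, a scrolled "scan"-style search is a common pattern worth testing; it sorts by _doc and skips scoring, which may or may not help in your case. A rough sketch (index name and batch size are placeholders):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://localhost:9200"])  # placeholder host

# Stream every document via the scroll API; helpers.scan sorts by _doc,
# so there's no relevance scoring and results arrive in fixed-size batches.
docs = [
    hit["_source"]
    for hit in helpers.scan(
        es,
        index="my-small-index",
        query={"query": {"match_all": {}}},
        size=1000,  # documents per scroll round-trip
    )
]
print(len(docs), "documents fetched")
```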
