Very slow searches on a very large shard

mabn0 · January 25, 2018, 12:55am

I have a 3 node cluster with an index which contains 72 shards. There are 330M documents amounting to 100GB according to _cat/indices. This index has ~1000 fields, but each document has no more than 30.

Routing by customer id is used, but one of the customers in this index has abnormally many documents so one of the shards is much larger - 30GB. Other shards are around 0.5-1GB.

The problem is that queries to this large shard often take 20-30 seconds, even for routing id matching not many documents (e.g. 10k). The performance issues happen when there is lucene merging thread active for this index. The queries are rather simple - a set of term/bool filters + sorting by 2-3 fields.

Queries to small (0.5-1GB) shards are fine and take < 200ms.

The only other abnormality I found is that his large index has a lot of deleted documents - probably 50%.

It's not happening all the time, only during increased indexing throughput, or afterwards.
index.merge.scheduler.max_merge_count is set to 1 to limit cpu used for merging.

Questions:

what to do to fix those slow queries? I plan to reduce num of segments by running optimize and then probably move data from this large shard to a separate index
why is merging thread affecting searching so much? There is still CPU time to spare.Those nodes have 4 cpus (8 threads), there is one merging thread and cpu usage stays around 25% (0.5 cpu for lucene merging, the rest for searching and some indexing)
maybe there is some other reason and my suspicions are incorrect?

ES version: 5.3.2

s1monw · January 26, 2018, 10:26am

hey Marcin, good questions! Lemme try to explain some things....

this is no good. We should try to fix this down the road. I think you have one or more hotspot customer that you should either try to split out into a separate index or you use a routing partitioned index that is available since 5.3.0. This will solve your hotspot issue and keep your shard sizes at bay.

this sounds counter-intuitive until you look at what merges do and why. When we get a lot of changes to an index we have to build up a lot of new segments that need to be merged since we are append only on the lower levels. At the same time every doc that is updated caused a delete unless it's the first time we see the doc. Now you end up with a lot of data that needs to be read and written to disk. Unfortunately java can't read / write data without going through the FS cache that means we are subject to cache trashing in the case of heavy merging. This is something we try to improve once java 10 is available since then we might be able to use O_DIRECT when opening streams to write / read from disk without going through the cache. I think it's less of a concurrency issue here...

I't try to use routing partitioning first and see what the impact will be if you do.

mabn0 · January 26, 2018, 12:05pm

Actually we already keep large customers in separate indices or clusters (which do not route by customer id), but this one did something that added a lot of nested documents and we noticed too late (and their shard grew).
But partitioned routing sounds like a good way to mitigate this, thanks for the suggestion.

s1monw:

this sounds counter-intuitive until you look at what merges do and why. When we get a lot of changes to an index we have to build up a lot of new segments that need to be merged since we are append only on the lower levels. At the same time every doc that is updated caused a delete unless it's the first time we see the doc. Now you end up with a lot of data that needs to be read and written to disk. Unfortunately java can't read / write data without going through the FS cache that means we are subject to cache trashing in the case of heavy merging. This is something we try to improve once java 10 is available since then we might be able to use O_DIRECT when opening streams to write / read from disk without going through the cache. I think it's less of a concurrency issue here...

This makes sense. But this also means that additional merge threads could have negative impact, correct?

What metric would be best to look at to detect that such trashing is happening? Something like major page faults, or maybe merges.current_size_in_bytes from /_nodes/stats?

Thanks!

s1monw · January 26, 2018, 1:08pm

yeah so I think it would be good to have an eye on page cache stats like:

page cache accesses / dirties
page cache writes / additions

note I think I recall the is a cost to measure this but last time I checked it was less than 5%

system · February 23, 2018, 1:08pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Elasticsearch sharding Elasticsearch	3	289	July 6, 2017
Merging too heavy - causing very slow response times Elasticsearch	4	675	July 6, 2017
Search Performance Elasticsearch	9	369	July 6, 2017
Question: How to gauge/improve performance Elasticsearch	17	1038	July 6, 2017
Heavy indexing cause severe delay for searching Elasticsearch	12	506	July 6, 2017

Very slow searches on a very large shard

Related topics