Merge policy tuning hint

Hello,

You can reduce the number of concurrent merges by lowering
index.merge.policy.max_merge_at_once and
index.merge.policy.max_merge_at_once_explicit. Another thing that might
help is to lower the index.merge.scheduler.max_thread_count, especially
since the default is based on the number of processors. And you have many
of those :)
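
For example, to lower the first two on an existing index (a sketch, untested
on 0.19.x; I'm not certain these are dynamically updatable there, so you might
need to set them in elasticsearch.yml or at index creation instead, and the
values are just examples below your current 10/30):

curl -XPUT 'localhost:9200/i11/_settings' -d '{
  "index.merge.policy.max_merge_at_once": 5,
  "index.merge.policy.max_merge_at_once_explicit": 15
}'

As far as I know, index.merge.scheduler.max_thread_count is not dynamic, so
that one would go in elasticsearch.yml on each node, e.g. to drop it below
your current 3:

index.merge.scheduler.max_thread_count: 2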

Documentation for all the above settings can be found here:

250 shards per node is a lot. Can you bring that number down? How many
indices do you have?

Another useful piece of information: what is overloaded when a node
drops (CPU, I/O, etc.)? Anything interesting in the logs?
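
(If you're not sure which, watching top for CPU and something like
iostat -x 1 for disk utilization while the merge count climbs should tell you.)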

If it's CPU, lowering the number of concurrent merges and threads should
help. If it's I/O, you might additionally look at store-level throttling,
especially for the "merge" type.
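
A sketch of what that could look like (untested; I'm not sure store
throttling is available and dynamic on 0.19.3, and the 20mb cap is only an
illustration to adjust for your disks):

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "indices.store.throttle.type": "merge",
    "indices.store.throttle.max_bytes_per_sec": "20mb"
  }
}'

The same two settings can also go in elasticsearch.yml on each node.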

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

On Thu, Dec 13, 2012 at 10:24 PM, arta artasano@sbcglobal.net wrote:

Hi,
I think I am seeing node performance degrade when the merge count
("merges": {"current": x, ...} in node stats) increases to somewhere between
6 and 10, and that results in the node dropping from the cluster.
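
(I'm reading those counters from the node stats API; on 0.19 that should be
something like

curl 'localhost:9200/_cluster/nodes/stats?pretty=true'

with the merge counts under each node's "indices" section.)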

I enabled index.merge DEBUG logging (how-to note after the log excerpt), and I see the following logs for each shard:

[2012-12-12 18:29:44,136][DEBUG][index.merge.policy   ] [Scourge of the Underworld] [i11][25] using [tiered] merge policy with expunge_deletes_allowed[10.0], floor_segment[2mb], max_merge_at_once[10], max_merge_at_once_explicit[30], max_merged_segment[5gb], segments_per_tier[10.0], reclaim_deletes_weight[2.0], async_merge[true]
[2012-12-12 18:29:44,136][DEBUG][index.merge.scheduler] [Scourge of the Underworld] [i11][25] using [concurrent] merge scheduler with max_thread_count[3]
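
(For anyone wanting to turn this on: I believe it's an entry under the
logger section of config/logging.yml, along the lines of

logger:
  index.merge: DEBUG

but double-check the exact syntax for your version.)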

I'm using the default merge policy; each node has 16 cores.
num_replicas is 1.
Each node has around 250 shards, including primary and replica shards.
ES is 0.19.3.

This morning I encountered a node drop, and at that time the dropped node
was doing many merges, according to the log. The time taken by each merge
kept increasing leading up to the drop.
It started out relatively short, like this:

[2012-12-13 07:43:07,235][DEBUG][index.merge.scheduler] [Scourge of the Underworld] [i13][27] merge [_72z9] done, took [54.8s]
(the size of this shard is 4.2GB)

Then the time increased to minutes (7:50 is when this node dropped from
the cluster):

[2012-12-13 07:50:30,396][DEBUG][index.merge.scheduler] [Scourge of the Underworld] [i15][3] merge [_16sc3] done, took [4.8m]
(the size of this shard is 5.8GB)

It kept increasing:

[2012-12-13 07:57:17,056][DEBUG][index.merge.scheduler] [Scourge of the Underworld] [i531][22] merge [_6293] done, took [11m]
(the size of this shard is 3.6GB)

I'm not sure whether this is related to my problem (the node drop), but my
gut feeling is that I need to tune the merge policy.

Any hints or guidance would be very much appreciated.
Thank you in advance.
