Thanks for sharing your thoughts,
There is neither snapshotting nor shard moving going on. The index is spread equally across the nodes with no replicas assigned to it. We currently don't use custom routing, and the node is a standalone Elasticsearch server. Regarding merging, though, there is something interesting going on:
Turning on debug logging on the node, it turned out that there is quite excessive segment merging happening:
[2015-08-26 09:06:41,142][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][2] merge segment [_qci] done: took [1.4m], [12.1 MB], [7,252 docs]
[2015-08-26 09:06:41,558][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][10] merge segment [_qwd] done: took [2.8m], [26.2 MB], [19,111 docs]
[2015-08-26 09:06:42,042][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][10] merge segment [_qww] done: took [1.9m], [17.2 MB], [12,051 docs]
[2015-08-26 09:06:59,605][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][8] merge segment [_tp4] done: took [1.1m], [10.1 MB], [5,767 docs]
[2015-08-26 09:07:02,634][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][4] merge segment [_q94] done: took [10.7m], [95.7 MB], [76,401 docs]
[2015-08-26 09:07:06,717][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][6] merge segment [_qbv] done: took [1.5m], [14.1 MB], [8,574 docs]
[2015-08-26 09:07:07,975][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][4] merge segment [_qfb] done: took [1.6m], [14.8 MB], [9,040 docs]
[2015-08-26 09:07:09,500][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][3] merge segment [_q7z] done: took [1.5m], [12.2 MB], [7,240 docs]
[2015-08-26 09:07:27,616][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][4] merge segment [_qg9] done: took [27.3s], [2.5 MB], [1,037 docs]
[2015-08-26 09:07:32,393][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][6] merge segment [_qcu] done: took [29.3s], [3.8 MB], [1,605 docs]
[2015-08-26 09:07:54,782][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][8] merge segment [_tpl] done: took [1.3m], [6.6 MB], [3,725 docs]
[2015-08-26 09:08:11,567][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][8] merge segment [_tp3] done: took [2.3m], [17.1 MB], [10,369 docs]
[2015-08-26 09:08:14,488][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][2] merge segment [_qdj] done: took [1.1m], [7.0 MB], [3,816 docs]
[2015-08-26 09:08:14,495][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][3] merge segment [_q6t] done: took [4.4m], [32.1 MB], [23,817 docs]
[2015-08-26 09:08:21,565][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][10] merge segment [_qxn] done: took [2.2m], [14.6 MB], [8,733 docs]
[2015-08-26 09:08:22,182][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][2] merge segment [_qco] done: took [2.6m], [17.5 MB], [10,817 docs]
[2015-08-26 09:08:31,065][DEBUG][index.merge.scheduler ] [node] [logstash-2015.08.26][3] merge segment [_q8u] done: took [1.3m], [8.2 MB], [4,797 docs]
On the other node, where the index also lies, there is hardly any merging going on, which is a bit strange considering that an equal number of shards lies on each node and no routing is specified.
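In case it helps, this is roughly how I am comparing merge activity and segment counts between the nodes (a sketch; the exact field names in the stats output may vary between versions, and _cat/segments assumes a reasonably recent release):

# merge totals per node: look at the indices.merges section (total, total_time_in_millis, current)
curl 'localhost:9200/_nodes/stats/indices?pretty'

# segment count and size per shard of the current index
curl 'localhost:9200/_cat/segments/logstash-2015.08.26?v'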
Regarding settings:
curl localhost:9200/_cluster/settings?pretty
{
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "cluster_concurrent_rebalance" : "6",
          "node_concurrent_recoveries" : "10",
          "disk" : {
            "watermark" : {
              "low" : "93%",
              "high" : "96%"
            }
          },
          "node_initial_primaries_recoveries" : "10"
        }
      }
    },
    "indices" : {
      "store" : {
        "throttle" : {
          "type" : "merge",
          "max_bytes_per_sec" : "300mb"
        }
      },
      "recovery" : {
        "concurrent_streams" : "15",
        "max_bytes_per_sec" : "1500mb"
      }
    }
  },
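(For reference, the throttling values above were set dynamically through the cluster settings update API, along these lines; the exact command is reconstructed from memory, so treat it as a sketch:)

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent" : {
    "indices.store.throttle.type" : "merge",
    "indices.store.throttle.max_bytes_per_sec" : "300mb"
  }
}'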
curl localhost:9200/_nodes/node/settings?pretty
"merge" : {
"scheduler" : {
"max_thread_count" : "10"
},
"policy" : {
"segments_per_tier" : "120",
"max_merge_at_once_explicit" : "30",
"max_merge_at_once" : "10"
}
}
},
These are the settings regarding merging. The nodes each have 64 CPU cores, so 10 merge threads seems OK to me.
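If it is useful, I can also capture hot threads while one of the slow merges is running, to see whether the merge threads are actually burning CPU or mostly waiting on disk; something like:

curl 'localhost:9200/_nodes/node/hot_threads?threads=10'

(where "node" is the node name from the logs above)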
Any additional thoughts on this? Thanks