I am seeing a similar issue, and the following appears in the Elasticsearch server logs when the timeouts occur:
[2019-03-22T14:40:08,361][DEBUG][o.e.i.e.InternalEngine$EngineMergeScheduler] [ElasticServer1] [mm_110fe1d3-13cb-4d3c-aec8-771cc04a789c_d53dc9e5-3132-4b31-bd9a-24609f1b2334][2] merge segment [_w] done: took [2m], [627.1 MB], [624,779 docs], [0s stopped], [13.1s throttled], [613.9 MB written], [18.2 MB/sec throttle]
[2019-03-22T14:40:40,437][DEBUG][o.e.i.e.InternalEngine$EngineMergeScheduler] [ElasticServer1] [mm_110fe1d3-13cb-4d3c-aec8-771cc04a789c_d53dc9e5-3132-4b31-bd9a-24609f1b2334][4] merge segment [_v] done: took [2.4m], [793.0 MB], [776,320 docs], [0s stopped], [17.7s throttled], [781.3 MB written], [18.2 MB/sec throttle]
[2019-03-22T14:40:41,615][DEBUG][o.e.m.j.JvmGcMonitorService] [ElasticServer1] [gc][245344] overhead, spent [121ms] collecting in the last [1s]
[2019-03-22T14:40:43,638][DEBUG][o.e.m.j.JvmGcMonitorService] [ElasticServer1] [gc][245346] overhead, spent [108ms] collecting in the last [1s]
[2019-03-22T14:40:58,664][DEBUG][o.e.m.j.JvmGcMonitorService] [ElasticServer1] [gc][245361] overhead, spent [103ms] collecting in the last [1s]
[2019-03-22T14:41:01,018][DEBUG][o.e.i.e.InternalEngine$EngineMergeScheduler] [ElasticServer1] [mm_110fe1d3-13cb-4d3c-aec8-771cc04a789c_d53dc9e5-3132-4b31-bd9a-24609f1b2334][0] merge segment [_13] done: took [1.6m], [533.6 MB], [511,864 docs], [0s stopped], [13.2s throttled], [530.5 MB written], [16.5 MB/sec throttle]
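To compare the merge lines above more easily, here is a small sketch that pulls the merge duration and throttled time out of each line. The regex and the `parse_merge` helper are my own assumptions written against the three lines shown here, not an official log format or Elasticsearch tooling. The "took" value is rounded in the log, so the resulting fraction is only approximate.

```python
import re

# Pattern written against the merge lines pasted above (an assumption,
# not a documented format; other versions may log differently).
MERGE_RE = re.compile(
    r"merge segment \[(?P<seg>\w+)\] done: "
    r"took \[(?P<took>[\d.]+)(?P<unit>ms|s|m|h)\], "
    r"\[(?P<size_mb>[\d.,]+) MB\], \[(?P<docs>[\d,]+) docs\], "
    r"\[[^\]]*stopped\], "
    r"\[(?P<throttled>[\d.]+)(?P<tunit>ms|s|m|h) throttled\]"
)

# Convert the unit suffixes seen in the log to seconds.
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_merge(line):
    """Return merge stats for one log line, or None if it is not a merge line."""
    m = MERGE_RE.search(line)
    if m is None:
        return None
    took_s = float(m.group("took")) * UNIT_SECONDS[m.group("unit")]
    throttled_s = float(m.group("throttled")) * UNIT_SECONDS[m.group("tunit")]
    return {
        "segment": m.group("seg"),
        "took_s": took_s,
        "throttled_s": throttled_s,
        "docs": int(m.group("docs").replace(",", "")),
        "size_mb": float(m.group("size_mb").replace(",", "")),
        # Share of the merge wall time spent throttled (approximate).
        "throttled_fraction": throttled_s / took_s if took_s else 0.0,
    }
```

Run over the three merge lines above, this gives roughly 10 to 14 percent of each merge spent throttled, which by itself does not tell me whether indexing requests were blocked for the whole merge.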
When this occurs, does bulk indexing slow down enough to cause timeouts? The slow logs collected in a previous run suggest so. I ran iostat during the process and did not see much iowait; however, the segment merges happened after I ran iostat, so it's possible there was a slowdown during the merge.
What would you recommend to prevent these timeouts? Should I increase the timeout, reduce the number of threads that are currently pushing data during indexing, or both?
Also, even if I use a single thread to push data, a segment merge will still happen at some point, and if it again takes 1 to 2 minutes as above and slows down indexing, the failure can still occur. So what is the recommended way out of this? I want indexing not to fail; some reduction in indexing speed is not a problem.
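To make the question concrete, this is roughly the kind of client-side handling I have in mind: retry a timed-out batch with exponential backoff, trading speed for not failing. It is only a minimal sketch. `send_bulk` is a hypothetical stand-in for whatever call actually pushes a batch (not a real Elasticsearch client API), and I am assuming a timeout surfaces as Python's built-in `TimeoutError`.

```python
import time


def index_with_retry(send_bulk, batch, max_attempts=5,
                     base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call send_bulk(batch), retrying on TimeoutError with exponential backoff.

    send_bulk is a hypothetical callable supplied by the caller; the delay
    doubles after each failed attempt, capped at max_delay seconds.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return send_bulk(batch)
        except TimeoutError:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the timeout.
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

With the defaults, a batch that keeps timing out would wait 1, 2, 4 and 8 seconds between its five attempts before giving up, so a merge lasting a couple of minutes would still need a larger `max_attempts` or `base_delay`. Note that resending a batch whose first attempt actually succeeded on the server could create duplicates unless the documents carry explicit IDs.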