How to debug Timeout Error

Hi All,
I am getting the following error when performing a bulk insert:

Caused by: java.io.IOException: listener timeout after waiting for [30000] ms
at elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:699)
at elasticsearch.client.RestClient.performRequest(RestClient.java:224)
at elasticsearch.client.RestClient.performRequest(RestClient.java:196)

What are the steps to debug this issue? I am not seeing any errors in the Elasticsearch logs.
Is there any way we can identify the reason behind the timeout?
Should we enable any specific properties to get more detailed logs?

What is the specification of your cluster? Is it under heavy load? What is the size of your requests?

What is this package com.oracle.es.elasticsearch.client.RestClient?


We have a two-node setup with 12 GB each. It is under heavy load. We are pushing 100 documents at a time, with each document averaging 20 KB.
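
For reference, this is roughly how the bulk request is sent. A minimal sketch with the low-level REST client, assuming a 5.x/6.x client where the String-based performRequest overloads still exist; the host names, index name, and document content are hypothetical placeholders:

```java
import org.apache.http.HttpHost;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

import java.util.Collections;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // Hypothetical node addresses; replace with the real data nodes.
        try (RestClient client = RestClient.builder(
                new HttpHost("es-node-1", 9200, "http"),
                new HttpHost("es-node-2", 9200, "http")).build()) {

            // Build an NDJSON bulk body: one action line plus one source line per document.
            StringBuilder body = new StringBuilder();
            for (int i = 0; i < 100; i++) {
                body.append("{\"index\":{\"_index\":\"my_index\",\"_type\":\"_doc\"}}\n");
                body.append("{\"field\":\"roughly 20 KB of document content ...\"}\n");
            }

            NStringEntity entity = new NStringEntity(body.toString(), ContentType.APPLICATION_JSON);
            Response response = client.performRequest(
                    "POST", "/_bulk", Collections.<String, String>emptyMap(), entity);
            System.out.println(response.getStatusLine());
        }
    }
}
```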

See the Elasticsearch logs: https://drive.google.com/file/d/1vZbjfByxZ0oiZ11a5evdcahDP4MttbCA/view?usp=sharing

It is our own code from which I am sending the bulk request.

What appears to be limiting performance? Is CPU maxed out? Are you seeing a lot of iowait due to potentially slow storage? Any indications in the logs of slow merging or long and/or frequent GC?

When we enable the slow logs, we can see that some documents take a long time to index.
I checked the heap, and only 60% of it is used.

How can we check if it is because of slow merging or iowait?

Use iostat on the data nodes.
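
Alongside iostat on the operating system side, merge and GC pressure can also be read from the nodes stats API. A minimal sketch with the same low-level client (hypothetical host name, and again assuming a 5.x/6.x client):

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class NodeStatsCheck {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("es-node-1", 9200, "http")).build()) {
            // Indices, JVM, and filesystem stats for every node. Look for growing
            // merges.total_throttled_time_in_millis and long GC collection times.
            Response response = client.performRequest("GET", "/_nodes/stats/indices,jvm,fs");
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}
```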

I am seeing a similar issue, and I can see the following in the Elasticsearch server logs when the timeouts occur:

[2019-03-22T14:40:08,361][DEBUG][o.e.i.e.InternalEngine$EngineMergeScheduler] [ElasticServer1] [mm_110fe1d3-13cb-4d3c-aec8-771cc04a789c_d53dc9e5-3132-4b31-bd9a-24609f1b2334][2] merge segment [_w] done: took [2m], [627.1 MB], [624,779 docs], [0s stopped], [13.1s throttled], [613.9 MB written], [18.2 MB/sec throttle]
[2019-03-22T14:40:40,437][DEBUG][o.e.i.e.InternalEngine$EngineMergeScheduler] [ElasticServer1] [mm_110fe1d3-13cb-4d3c-aec8-771cc04a789c_d53dc9e5-3132-4b31-bd9a-24609f1b2334][4] merge segment [_v] done: took [2.4m], [793.0 MB], [776,320 docs], [0s stopped], [17.7s throttled], [781.3 MB written], [18.2 MB/sec throttle]
[2019-03-22T14:40:41,615][DEBUG][o.e.m.j.JvmGcMonitorService] [ElasticServer1] [gc][245344] overhead, spent [121ms] collecting in the last [1s]
[2019-03-22T14:40:43,638][DEBUG][o.e.m.j.JvmGcMonitorService] [ElasticServer1] [gc][245346] overhead, spent [108ms] collecting in the last [1s]
[2019-03-22T14:40:58,664][DEBUG][o.e.m.j.JvmGcMonitorService] [ElasticServer1] [gc][245361] overhead, spent [103ms] collecting in the last [1s]
[2019-03-22T14:41:01,018][DEBUG][o.e.i.e.InternalEngine$EngineMergeScheduler] [ElasticServer1] [mm_110fe1d3-13cb-4d3c-aec8-771cc04a789c_d53dc9e5-3132-4b31-bd9a-24609f1b2334][0] merge segment [_13] done: took [1.6m], [533.6 MB], [511,864 docs], [0s stopped], [13.2s throttled], [530.5 MB written], [16.5 MB/sec throttle]

When this occurs, will bulk indexing be slowed down, resulting in timeouts? It looks that way from the slow logs collected in a previous run. I did try invoking iostat during the process but did not see much iowait; however, the segment merge happened after I invoked iostat, so it is possible there was a slowdown during the merge.

What would be your recommendation to prevent these timeouts? Should I increase the client timeout, reduce the number of threads that are currently pushing data during indexing, or both?
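
For the "increase the timeout" option, the low-level client's timeouts can be raised when the client is built. A minimal sketch, assuming a 5.x/6.x low-level client (the host name and timeout values are illustrative; setMaxRetryTimeoutMillis only exists in pre-7.0 clients and is what produces the "listener timeout after waiting for [30000] ms" message when exceeded):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class ClientWithLongerTimeouts {
    public static RestClient build() {
        return RestClient.builder(new HttpHost("es-node-1", 9200, "http"))
                // Allow slower responses before the socket times out (default is 30s).
                .setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
                        .setConnectTimeout(5_000)
                        .setSocketTimeout(120_000))
                // Pre-7.0 clients also enforce a separate 30s retry/listener timeout.
                .setMaxRetryTimeoutMillis(120_000)
                .build();
    }
}
```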

Also, even if I use a single thread to push data, at some point a segment merge will happen, and if it again takes 1 to 2 minutes as above and slows down indexing, the failure can still occur. So what is the recommended way out of this? I want indexing not to fail, and some reduction in indexing speed is not a problem.
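
One way to keep indexing from failing outright, at the cost of some throughput, is to wrap each bulk call in a retry loop with exponential backoff. A rough sketch; the BulkCall interface here is a placeholder for whatever code currently sends one bulk request, not part of the Elasticsearch client API:

```java
import java.io.IOException;

public class BulkWithBackoff {

    /** Placeholder for whatever currently sends one bulk request and may time out. */
    interface BulkCall {
        void send() throws IOException;
    }

    /**
     * Retries the bulk call with exponential backoff instead of failing the whole load.
     * If a merge temporarily slows indexing down, the request is simply tried again
     * after a pause, trading indexing speed for reliability.
     */
    static void sendWithBackoff(BulkCall call, int maxRetries) throws IOException, InterruptedException {
        long waitMillis = 1_000;
        for (int attempt = 0; ; attempt++) {
            try {
                call.send();
                return;
            } catch (IOException e) {
                if (attempt >= maxRetries) {
                    throw e; // give up after maxRetries attempts
                }
                Thread.sleep(waitMillis);
                waitMillis = Math.min(waitMillis * 2, 60_000); // cap the backoff at 60s
            }
        }
    }
}
```

The high-level REST client's BulkProcessor offers a built-in backoff policy that behaves similarly, if switching clients is an option.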

It looks like segment merging can't keep up, which is a sign that you likely have very slow storage acting as the bottleneck. You should have a look at these guidelines. In older versions there used to be parameters related to the number of merging threads that needed to be tuned, but that has since been automated. The best way to get rid of this problem, however, is to upgrade to faster and more performant storage.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.