How to debug Timeout Error

Hi All,
I am getting the following error when performing a bulk insert:

Caused by: java.io.IOException: listener timeout after waiting for [30000] ms
at elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:699)
at elasticsearch.client.RestClient.performRequest(RestClient.java:224)
at elasticsearch.client.RestClient.performRequest(RestClient.java:196)

What are the steps to debug this issue? I am not seeing any errors in the Elasticsearch logs.
Is there any way we can identify the reason behind the timeout?
Should we enable any specific properties to get more detailed logs?

What is the specification of your cluster? Is it under heavy load? What is the size of your requests?

What is this package com.oracle.es.elasticsearch.client.RestClient?


We have a two-node setup with 12 GB each. It is under heavy load. We are pushing 100 documents at a time, with each document averaging 20 KB.
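
For reference, this is roughly how the bulk request is sent. A minimal sketch with the low-level REST client, assuming a 5.x/6.x client where the String-based performRequest overloads still exist; the host names, index name, and document content are hypothetical placeholders:

```java
import org.apache.http.HttpHost;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

import java.util.Collections;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // Hypothetical node addresses; replace with the real data nodes.
        try (RestClient client = RestClient.builder(
                new HttpHost("es-node-1", 9200, "http"),
                new HttpHost("es-node-2", 9200, "http")).build()) {

            // Build an NDJSON bulk body: one action line plus one source line per document.
            StringBuilder body = new StringBuilder();
            for (int i = 0; i < 100; i++) {
                body.append("{\"index\":{\"_index\":\"my_index\",\"_type\":\"_doc\"}}\n");
                body.append("{\"field\":\"roughly 20 KB of document content ...\"}\n");
            }

            NStringEntity entity = new NStringEntity(body.toString(), ContentType.APPLICATION_JSON);
            Response response = client.performRequest(
                    "POST", "/_bulk", Collections.<String, String>emptyMap(), entity);
            System.out.println(response.getStatusLine());
        }
    }
}
```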

See the Elasticsearch logs: https://drive.google.com/file/d/1vZbjfByxZ0oiZ11a5evdcahDP4MttbCA/view?usp=sharing

It is our own code from which I am sending the bulk request.

What appears to be limiting performance? Is CPU maxed out? Are you seeing a lot of iowait due to potentially slow storage? Any indications in the logs of slow merging or long and/or frequent GC?

When we enable the slow logs, we can see that some documents take a long time to index.
I checked the heap, and only 60% of it is used.

How can we check if it is because of slow merging or iowait?

Use iostat on the data nodes.
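
Alongside iostat on the operating system side, merge and GC pressure can also be read from the nodes stats API. A minimal sketch with the same low-level client (hypothetical host name, and again assuming a 5.x/6.x client):

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class NodeStatsCheck {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("es-node-1", 9200, "http")).build()) {
            // Indices, JVM, and filesystem stats for every node. Look for growing
            // merges.total_throttled_time_in_millis and long GC collection times.
            Response response = client.performRequest("GET", "/_nodes/stats/indices,jvm,fs");
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}
```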

I am seeing a similar issue, and I can see the following in the Elasticsearch server logs when the timeouts occur:

[2019-03-22T14:40:08,361][DEBUG][o.e.i.e.InternalEngine$EngineMergeScheduler] [ElasticServer1] [mm_110fe1d3-13cb-4d3c-aec8-771cc04a789c_d53dc9e5-3132-4b31-bd9a-24609f1b2334][2] merge segment [_w] done: took [2m], [627.1 MB], [624,779 docs], [0s stopped], [13.1s throttled], [613.9 MB written], [18.2 MB/sec throttle]
[2019-03-22T14:40:40,437][DEBUG][o.e.i.e.InternalEngine$EngineMergeScheduler] [ElasticServer1] [mm_110fe1d3-13cb-4d3c-aec8-771cc04a789c_d53dc9e5-3132-4b31-bd9a-24609f1b2334][4] merge segment [_v] done: took [2.4m], [793.0 MB], [776,320 docs], [0s stopped], [17.7s throttled], [781.3 MB written], [18.2 MB/sec throttle]
[2019-03-22T14:40:41,615][DEBUG][o.e.m.j.JvmGcMonitorService] [ElasticServer1] [gc][245344] overhead, spent [121ms] collecting in the last [1s]
[2019-03-22T14:40:43,638][DEBUG][o.e.m.j.JvmGcMonitorService] [ElasticServer1] [gc][245346] overhead, spent [108ms] collecting in the last [1s]
[2019-03-22T14:40:58,664][DEBUG][o.e.m.j.JvmGcMonitorService] [ElasticServer1] [gc][245361] overhead, spent [103ms] collecting in the last [1s]
[2019-03-22T14:41:01,018][DEBUG][o.e.i.e.InternalEngine$EngineMergeScheduler] [ElasticServer1] [mm_110fe1d3-13cb-4d3c-aec8-771cc04a789c_d53dc9e5-3132-4b31-bd9a-24609f1b2334][0] merge segment [_13] done: took [1.6m], [533.6 MB], [511,864 docs], [0s stopped], [13.2s throttled], [530.5 MB written], [16.5 MB/sec throttle]

When this occurs, will bulk indexing be slowed down, resulting in timeouts? It looks that way from the slow logs collected in a previous run. I did try invoking iostat during the process but did not see much iowait; however, the segment merge happened after I invoked iostat, so it is possible there was a slowdown during the merge.

What would be your recommendation to prevent these timeouts? Should I increase the client timeout, reduce the number of threads that are currently pushing data during indexing, or both?
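
For the "increase the timeout" option, the low-level client's timeouts can be raised when the client is built. A minimal sketch, assuming a 5.x/6.x low-level client (the host name and timeout values are illustrative; setMaxRetryTimeoutMillis only exists in pre-7.0 clients and is what produces the "listener timeout after waiting for [30000] ms" message when exceeded):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class ClientWithLongerTimeouts {
    public static RestClient build() {
        return RestClient.builder(new HttpHost("es-node-1", 9200, "http"))
                // Allow slower responses before the socket times out (default is 30s).
                .setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
                        .setConnectTimeout(5_000)
                        .setSocketTimeout(120_000))
                // Pre-7.0 clients also enforce a separate 30s retry/listener timeout.
                .setMaxRetryTimeoutMillis(120_000)
                .build();
    }
}
```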

Also, even if I use a single thread to push data, at some point a segment merge will happen, and if it again takes 1 to 2 minutes as above and slows down indexing, the failure can still occur. So what is the recommended way out of this? I want indexing not to fail, and some reduction in indexing speed is not a problem.
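
One way to keep indexing from failing outright, at the cost of some throughput, is to wrap each bulk call in a retry loop with exponential backoff. A rough sketch; the BulkCall interface here is a placeholder for whatever code currently sends one bulk request, not part of the Elasticsearch client API:

```java
import java.io.IOException;

public class BulkWithBackoff {

    /** Placeholder for whatever currently sends one bulk request and may time out. */
    interface BulkCall {
        void send() throws IOException;
    }

    /**
     * Retries the bulk call with exponential backoff instead of failing the whole load.
     * If a merge temporarily slows indexing down, the request is simply tried again
     * after a pause, trading indexing speed for reliability.
     */
    static void sendWithBackoff(BulkCall call, int maxRetries) throws IOException, InterruptedException {
        long waitMillis = 1_000;
        for (int attempt = 0; ; attempt++) {
            try {
                call.send();
                return;
            } catch (IOException e) {
                if (attempt >= maxRetries) {
                    throw e; // give up after maxRetries attempts
                }
                Thread.sleep(waitMillis);
                waitMillis = Math.min(waitMillis * 2, 60_000); // cap the backoff at 60s
            }
        }
    }
}
```

The high-level REST client's BulkProcessor offers a built-in backoff policy that behaves similarly, if switching clients is an option.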

It looks like segment merging can't keep up, which is a sign that you likely have very slow storage acting as the bottleneck. You should have a look at these guidelines. In older versions there used to be parameters related to the number of merging threads that needed to be tuned, but that has since been automated. The best way to get rid of this problem, however, is to upgrade to faster and more performant storage.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.