Timeouts when indexing

Dear All,

I am using ES for logging requests/responses to an external API. In case of a problem, these logs are searched to resolve the issue.
99% of requests to ES are index/update queries.
Most of the time it works, but sometimes (3-5 times per 24h) the ES server does not respond within 3 seconds (the current timeout limit).
The timeouts mean lost data, which is not acceptable. On the other hand, increasing the timeout beyond 3 seconds is also not acceptable: in case of network issues, or if ES is unreachable from the web server, the request to ES should time out relatively fast so that the website stays usable. I would even prefer to set it to 1s.
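
To make the constraint concrete, here is a minimal sketch of a write path that keeps the short timeout but does not drop the document when ES is slow. It is only an illustration under assumptions: `send` is a hypothetical callable standing in for the real client call and is assumed to raise `TimeoutError` when ES does not answer in time, and `spool` is a hypothetical in-memory buffer (a real setup would more likely use a file or a queue).

```python
import json
from typing import Callable, List


def log_with_deadline(doc: dict,
                      send: Callable[[dict, float], None],
                      spool: List[str],
                      timeout_s: float = 3.0) -> bool:
    """Try to index `doc`; on timeout keep it in `spool` instead of losing it.

    `send` is a hypothetical stand-in for the real ES client call and is
    assumed to raise TimeoutError after `timeout_s` seconds without a reply.
    """
    try:
        send(doc, timeout_s)
        return True
    except TimeoutError:
        # The web request can continue; the document is kept for a later retry.
        spool.append(json.dumps(doc))
        return False
```

With something like this the timeout could stay at 3s (or 1s), and the spooled documents could be re-sent once ES responds normally again. It does not explain the spikes themselves.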

After studying the data in Monitoring, I discovered that the timeouts correlate with spikes in:

  • Indexing Time - jump from less than 1s to over 6s
    (screenshot: indexing-time, 10:06)
  • Refresh Time - jump from around 1s to over 5s
    (screenshot: refresh-time, 10:06)
  • Request Time - jump from ~0 to 6000ms
    (screenshot: request-time, 10:06)

and, to a lesser extent, with:

  • Indexing Latency
    (screenshot: indexing-latency, 10:06)
  • Latency
    (screenshot: latency, 10:06)

The spikes occur at random times. Neither CPU nor memory usage spikes.
Full monitoring screenshots (all at 10:06): Overview, Index Overview, Index Advanced, Node Overview, Node Advanced.

So far I have changed the following settings, with no success:

indices.memory.index_buffer_size: 50%
thread_pool.index.queue_size: 5000

On index creation:
refresh_interval: 30s

The index options:

{
	"state": "open",
	"settings": {
	    "index": {
	        "refresh_interval": "30s",
	        "number_of_shards": "1",
	        "provided_name": "requests_2018-03-19",
	        "max_result_window": "1000000",
	        "creation_date": "1521414001578",
	        "number_of_replicas": "0",
	        "uuid": "xd75vemBTdmZS6Us6RT0Dw",
	        "version": {
	            "created": "5060599"
	        }
	    }
	}
}
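
For reference, the creation-time settings in the options above could be produced by a request body along these lines. This is a sketch only: the helper name `index_creation_body` is made up for illustration, and the body would be sent with `PUT /<index-name>` when the daily index is created.

```python
import json


def index_creation_body(refresh_interval: str = "30s",
                        shards: int = 1,
                        replicas: int = 0) -> str:
    """Build the JSON body for index creation with the settings listed above."""
    body = {
        "settings": {
            "index": {
                "refresh_interval": refresh_interval,
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return json.dumps(body, indent=2)


# Print the body that would accompany the index creation request.
print(index_creation_body())
```

The other two settings (`indices.memory.index_buffer_size` and `thread_pool.index.queue_size`) are node-level and go into `elasticsearch.yml`, so they are not part of this body.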

Machine is: 4 cores/8GB RAM/SSD

Do you have any ideas what else I could tune to overcome the timeout issue?
All the indexed documents are similar in size. For every index operation (API request) there is usually a subsequent update operation (API response). I cannot understand the reason for the spikes. I also wonder why the Index Queue is always 0.
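
To illustrate the write pattern, each logged API call results in roughly the following pair of requests. This is a sketch assuming the 5.x-style endpoint layout; the type name `request` and the payload fields are hypothetical placeholders, not the real mapping.

```python
import json
from typing import Tuple

INDEX = "requests_2018-03-19"  # daily index name from the settings above
DOC_TYPE = "request"           # hypothetical type name, for illustration only


def index_request(doc_id: str, api_request: dict) -> Tuple[str, str, str]:
    """First write: store the outgoing API request as a new document."""
    return ("PUT", f"/{INDEX}/{DOC_TYPE}/{doc_id}", json.dumps(api_request))


def update_request(doc_id: str, api_response: dict) -> Tuple[str, str, str]:
    """Second write: merge the API response into the same document."""
    body = {"doc": api_response}  # partial-document update
    return ("POST", f"/{INDEX}/{DOC_TYPE}/{doc_id}/_update", json.dumps(body))


# Example pair for one logged call (field names are made up).
for method, path, body in (index_request("42", {"url": "/v1/orders"}),
                           update_request("42", {"status": 200})):
    print(method, path, body)
```

So every document is written twice within a short time, usually well inside the 30s refresh interval.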
