Dear All,
I am using ES for logging requests/responses to an external API. In case of a problem, these logs are searched to resolve the issue.
99% of requests to ES are index/update queries.
Most of the time it works, but sometimes (3-5 times per 24h) the ES server does not respond within 3 seconds (the current timeout limit).
The timeouts mean lost data, which is not acceptable. On the other hand, increasing the timeout beyond 3 seconds is also not acceptable: if there are network issues and ES is unreachable from the web server, the request to ES should time out relatively fast so that the website stays usable. I would even prefer to set it to 1s.
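To make the constraint above concrete, here is a minimal sketch of a fail-fast write that keeps the document locally when ES does not answer in time, so that a timeout does not have to mean lost data. It is not the code currently in use: `log_with_fallback`, `flush_spool`, the `send` callable, and the in-memory `spool` list are all hypothetical names, and a real setup would probably spool to disk or a queue instead.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]
# Hypothetical sender: takes a document and a timeout in seconds and
# raises an OSError subclass (TimeoutError, ConnectionError) on failure.
Sender = Callable[[Doc, float], None]


def log_with_fallback(doc: Doc, send: Sender, spool: List[Doc],
                      timeout_s: float = 1.0) -> bool:
    """Try to send one document; on timeout/connection error keep it in the spool."""
    try:
        send(doc, timeout_s)
        return True
    except OSError:
        # TimeoutError and ConnectionError are both subclasses of OSError.
        spool.append(doc)
        return False


def flush_spool(send: Sender, spool: List[Doc], timeout_s: float = 1.0) -> int:
    """Retry spooled documents; anything that fails again stays in the spool."""
    remaining: List[Doc] = []
    sent = 0
    for doc in spool:
        try:
            send(doc, timeout_s)
            sent += 1
        except OSError:
            remaining.append(doc)
    spool[:] = remaining
    return sent
```

The web request only ever waits for `timeout_s`, and `flush_spool` could be run from a background job once ES responds normally again.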
After studying the data from Monitoring, I discovered that the timeouts correlate with spikes in:
- Indexing Time: jump from less than 1s to over 6s
- Refresh Time: jump from around 1s to over 5s
- Request Time: jump from ~0 to 6000ms
and, to a lesser extent, with:
- Indexing Latency
- Latency
The spikes occur at random times. Neither CPU nor memory usage spikes.
Full monitoring (screenshots): Overview, Index Overview, Index Advanced, Node Overview, Node Advanced.
So far I have changed the following, with no success:
indices.memory.index_buffer_size: 50%
thread_pool.index.queue_size: 5000
On index creation:
refresh_interval: 30s
The index options:
{
  "state": "open",
  "settings": {
    "index": {
      "refresh_interval": "30s",
      "number_of_shards": "1",
      "provided_name": "requests_2018-03-19",
      "max_result_window": "1000000",
      "creation_date": "1521414001578",
      "number_of_replicas": "0",
      "uuid": "xd75vemBTdmZS6Us6RT0Dw",
      "version": {
        "created": "5060599"
      }
    }
  }
}
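For reference, the settings above correspond to an index-creation request roughly like the following sketch. It only builds the method, path, and JSON body for the create-index API and does not send anything. The function name `daily_index_request` is hypothetical, and the daily naming pattern is inferred from `provided_name` rather than taken from the actual application code.

```python
import json
from datetime import date
from typing import Tuple


def daily_index_request(day: date) -> Tuple[str, str, str]:
    """Build (method, path, body) for creating one daily requests index."""
    # Naming pattern assumed from provided_name "requests_2018-03-19".
    index_name = day.strftime("requests_%Y-%m-%d")
    body = {
        "settings": {
            "index": {
                "refresh_interval": "30s",
                "number_of_shards": 1,
                "number_of_replicas": 0,
                "max_result_window": 1000000,
            }
        }
    }
    return "PUT", "/" + index_name, json.dumps(body)


if __name__ == "__main__":
    method, path, body = daily_index_request(date(2018, 3, 19))
    print(method, path)
    print(body)
```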
Machine is: 4 cores/8GB RAM/SSD
Do you have any ideas what else I could tune to overcome the timeout issue?
All the indexed documents are similar in size. For every index operation (API request) there is usually a subsequent update operation (API response). I cannot understand the reason for the spikes. I also wonder why the Index Queue is always 0.
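To show the shape of the index-then-update pattern described above, here is a small sketch that writes the two operations as bulk API action lines (newline-delimited JSON). It is illustrative only: the type name `request`, the document id, and the field names are made up, and in practice the response arrives later than the request, so the two operations would normally go out in separate calls.

```python
import json
from typing import Dict


def index_then_update_lines(index_name: str, doc_type: str, doc_id: str,
                            request_doc: Dict[str, object],
                            response_fields: Dict[str, object]) -> str:
    """Return a bulk body: index the request, then partially update it with the response."""
    meta = {"_index": index_name, "_type": doc_type, "_id": doc_id}
    lines = [
        {"index": meta},           # action line for the API request
        request_doc,               # source document
        {"update": meta},          # action line for the API response
        {"doc": response_fields},  # partial document merged into the existing one
    ]
    # The bulk API expects one JSON object per line and a trailing newline.
    return "\n".join(json.dumps(line) for line in lines) + "\n"


if __name__ == "__main__":
    print(index_then_update_lines(
        "requests_2018-03-19", "request", "abc",
        {"url": "/example"}, {"status": 200},
    ))
```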