I have three nodes with metricbeat installed. One of these nodes also runs elasticsearch (6.1.1) and kibana, which receive the collected metrics. After I restarted the metricbeat instance on one node, it could no longer connect to elasticsearch, whereas the other instances kept working and could still connect after a restart.
In the elasticsearch log, the failing connection attempt from metricbeat produces:
exception while handling client http traffic, closing connection Connexion ré-initialisée par le correspondant
(the French part of the message means "Connection reset by peer").
From the metricbeat log:
ERR Failed to perform any bulk index operations: Post http://18.104.22.168:9200/_bulk: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2018/02/09 14:06:55.991855 output.go:92: ERR Failed to publish events: Post http://22.214.171.124:9200/_bulk: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
If I run
curl -XGET http://126.96.36.199:9200/
it works on all nodes.
If I run
curl -XGET http://188.8.131.52:9200/_cat/indices?
it times out on the node with the failing metricbeat instance, whereas I can successfully list the indices from the other nodes.
I think it started with this exception on elasticsearch:
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@26c77f8 on QueueResizingEsThreadPoolExecutor[search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 19.4ms, adjustment amount = 50, QueueResizingEsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@3a28de18[Running, pool size = 7, active threads = 7, queued tasks = 1180, completed tasks = 42882934]]]
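To make the numbers in that message easier to read, here is a minimal Python sketch that pulls out the relevant fields. It assumes the "name = value" format shown in the exception above; the helper names `parse_pool_stats` and `is_saturated` are purely illustrative and not part of any elasticsearch tooling.

```python
import re

# Excerpt of the rejection message quoted above (only the fields parsed here).
MESSAGE = (
    "QueueResizingEsThreadPoolExecutor[search, queue capacity = 1000, "
    "[Running, pool size = 7, active threads = 7, queued tasks = 1180, "
    "completed tasks = 42882934]]]"
)


def parse_pool_stats(message):
    """Extract selected 'name = value' integer fields from the message."""
    fields = ("queue capacity", "pool size", "active threads", "queued tasks")
    stats = {}
    for name in fields:
        # The first occurrence of each field is used.
        match = re.search(re.escape(name) + r" = (\d+)", message)
        if match:
            stats[name] = int(match.group(1))
    return stats


def is_saturated(stats):
    """True when every thread is busy and the queue is at or over capacity."""
    all_threads_busy = stats["active threads"] >= stats["pool size"]
    queue_full = stats["queued tasks"] >= stats["queue capacity"]
    return all_threads_busy and queue_full


stats = parse_pool_stats(MESSAGE)
print(stats)
print("search pool saturated:", is_saturated(stats))
```

Read this way, the message says that all 7 search threads were active and 1180 tasks were queued against a queue capacity of 1000, so the search thread pool was rejecting new work at that point.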
Any clue as to what is happening?