I'm using cloud.elastic.co and seeing this error:
2018-09-15T19:21:08.717Z ERROR pipeline/output.go:92 Failed to publish events: 500 Internal Server Error: {"took":4,"ignored":false,"errors":true,"error":{"type":"export_exception","reason":"Exception when closing export bulk","caused_by":{"type":"export_exception","reason":"failed to flush export bulks","caused_by":{"type":"export_exception","reason":"bulk [found-user-defined] reports failures when exporting documents","exceptions":[{"type":"export_exception","reason":"RemoteTransportException[[instance-0000000016][172.17.0.9:19758][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[instance-0000000016][xxxx][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$7@2739b036 on EsThreadPoolExecutor[name = instance-0000000016/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@b4e3cce[Running, pool size = 2, active threads = 2, queued tasks = 200, completed tasks = 10238]]];","caused_by":{"type":"es_rejected_execution_exception","reason":"rejected execution of org.elasticsearch.transport.TransportService$7@2739b036 on EsThreadPoolExecutor[name = instance-0000000016/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@b4e3cce[Running, pool size = 2, active threads = 2, queued tasks = 200, completed tasks = 10238]]"}}]}}}}
...repeated many times over in filebeat (raw JSON) logs. The whole setup has been running fine for a few months. The "completed tasks" number goes up occasionally, but not by much.
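To make the numbers in that rejection message easier to read, here is a minimal Python sketch that pulls the thread pool state out of the error text. The names `POOL_RE` and `parse_rejection` are my own and not part of Filebeat or Elasticsearch, and the `saturated` flag is just my shorthand for "all threads busy and the queue full". The sample string is copied from the log line above.

```python
import re

# Hypothetical helper: matches the executor state that appears inside an
# EsRejectedExecutionException message like the one in the log above.
POOL_RE = re.compile(
    r"name = (?P<name>[^,\]]+), queue capacity = (?P<capacity>\d+),"
    r".*?pool size = (?P<pool_size>\d+), active threads = (?P<active>\d+),"
    r" queued tasks = (?P<queued>\d+), completed tasks = (?P<completed>\d+)"
)


def parse_rejection(message):
    """Return the thread pool numbers from a rejection message, or None."""
    match = POOL_RE.search(message)
    if match is None:
        return None
    # Keep the pool name as text; convert every other field to an integer.
    stats = {
        key: (value if key == "name" else int(value))
        for key, value in match.groupdict().items()
    }
    # My own shorthand: every thread is busy and the queue is at capacity.
    stats["saturated"] = (
        stats["active"] >= stats["pool_size"]
        and stats["queued"] >= stats["capacity"]
    )
    return stats


# Sample text copied from the "reason" field of the error above.
sample = (
    "rejected execution of org.elasticsearch.transport.TransportService$7@2739b036 "
    "on EsThreadPoolExecutor[name = instance-0000000016/write, queue capacity = 200, "
    "org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@b4e3cce[Running, "
    "pool size = 2, active threads = 2, queued tasks = 200, completed tasks = 10238]]"
)

stats = parse_rejection(sample)
print(stats)
```

Read this way, the write pool on instance-0000000016 has 2 threads, both active, with all 200 queue slots taken. As far as I understand, that pool is sized from the number of processors the node sees, so a pool size of 2 may hint at 2 cores per node, though I haven't confirmed that for Elastic Cloud.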
CPU use is ~10%, RAM ~50%. Search response times are <200ms, but index requests are timing out (~300k ms). I don't know how many cores the 2 nodes have; cloud.elastic.co doesn't show me that (who thought that was a good idea?).
How can I diagnose this? (Better still, how can I fix this, of course :))
Update: monitoring data isn't showing up anywhere. I've created a separate monitoring cluster, but Kibana monitoring on either cluster doesn't show any monitoring data for the problem one.**
Bonus question: should I upgrade to 6.4? It's got a warning about a Transport client, but I've no idea if filebeat, metricbeat, etc, use that or not.
** - I get "You need to make some adjustments
To run monitoring please perform the following steps
We checked the cluster persistent settings for xpack.monitoring.exporters, and found the reason: Remote exporters indicate a possible misconfiguration: found-user-defined.
Using monitoring exporters ship the monitoring data to a remote monitoring cluster is highly recommended as it keeps the integrity of the monitoring data safe no matter what the state of the production cluster. However, as this instance of Kibana could not find any monitoring data, there seems to be a problem with the xpack.monitoring.exporters configuration, or the xpack.monitoring.elasticsearch settings in kibana.yml.
Check that the intended exporters are enabled for sending statistics to the monitoring cluster, and that the monitoring cluster host matches the xpack.monitoring.elasticsearch setting in kibana.yml to see monitoring data in this instance of Kibana."