Failed to publish events, write queue "completed" not going down, resource use low

I'm using cloud.elastic.co - and seeing:

2018-09-15T19:21:08.717Z ERROR pipeline/output.go:92 Failed to publish events: 500 Internal Server Error: {"took":4,"ignored":false,"errors":true,"error":{"type":"export_exception","reason":"Exception when closing export bulk","caused_by":{"type":"export_exception","reason":"failed to flush export bulks","caused_by":{"type":"export_exception","reason":"bulk [found-user-defined] reports failures when exporting documents","exceptions":[{"type":"export_exception","reason":"RemoteTransportException[[instance-0000000016][172.17.0.9:19758][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[instance-0000000016][xxxx][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$7@2739b036 on EsThreadPoolExecutor[name = instance-0000000016/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@b4e3cce[Running, pool size = 2, active threads = 2, queued tasks = 200, completed tasks = 10238]]];","caused_by":{"type":"es_rejected_execution_exception","reason":"rejected execution of org.elasticsearch.transport.TransportService$7@2739b036 on EsThreadPoolExecutor[name = instance-0000000016/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@b4e3cce[Running, pool size = 2, active threads = 2, queued tasks = 200, completed tasks = 10238]]"}}]}}}}

...repeated many times over in filebeat (raw JSON) logs. The whole setup has been running fine for a few months. The "completed tasks" number goes up occasionally, but not by much.
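To watch how the queue and "completed tasks" figures change between log lines, a minimal sketch like the one below could pull the thread pool numbers out of the rejection message. The function name `parse_rejection` and the regex are my own assumptions, written against the message format shown above. They are not part of Filebeat or Elasticsearch.

```python
import re

# Matches the executor summary embedded in the rejection message.
# Assumption: the message keeps the "name = ..., queue capacity = ..." layout
# seen in the log line above.
POOL_RE = re.compile(
    r"name = (?P<name>[^,\]]+), queue capacity = (?P<capacity>\d+).*?"
    r"pool size = (?P<pool_size>\d+), active threads = (?P<active>\d+), "
    r"queued tasks = (?P<queued>\d+), completed tasks = (?P<completed>\d+)"
)


def parse_rejection(message):
    """Return the thread pool figures from one rejection message, or None."""
    match = POOL_RE.search(message)
    if match is None:
        return None
    # Every captured group except the pool name is a plain integer.
    stats = {k: int(v) for k, v in match.groupdict().items() if k != "name"}
    stats["name"] = match.group("name")
    return stats


sample = (
    "rejected execution of org.elasticsearch.transport.TransportService$7@2739b036 "
    "on EsThreadPoolExecutor[name = instance-0000000016/write, queue capacity = 200, "
    "org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@b4e3cce[Running, "
    "pool size = 2, active threads = 2, queued tasks = 200, completed tasks = 10238]]"
)

stats = parse_rejection(sample)
print(stats)
```

In the sample, the queued tasks equal the queue capacity (200 of 200) while both write threads are active, which is why new bulk requests get rejected.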

CPU use is ~10%, RAM ~50%. Search response times are <200 ms; index requests are timing out (~300k ms). I don't know how many cores the 2 nodes have - cloud.elastic.co doesn't show me that (who thought that was a good idea?).

How can I diagnose this? (Better still, how can I fix this, of course :))

Update: monitoring data isn't showing up anywhere. I've created a separate monitoring cluster, but Kibana monitoring on either cluster doesn't show monitoring data for the problem one.**

Bonus question: should I upgrade to 6.4? It's got a warning about a Transport client, but I've no idea if filebeat, metricbeat, etc, use that or not.

** - I get " You need to make some adjustments

To run monitoring please perform the following steps

We checked the cluster persistent settings for xpack.monitoring.exporters , and found the reason: Remote exporters indicate a possible misconfiguration: found-user-defined .

Using monitoring exporters ship the monitoring data to a remote monitoring cluster is highly recommended as it keeps the integrity of the monitoring data safe no matter what the state of the production cluster. However, as this instance of Kibana could not find any monitoring data, there seems to be a problem with the xpack.monitoring.exporters configuration, or the xpack.monitoring.elasticsearch settings in kibana.yml .

Check that the intended exporters are enabled for sending statistics to the monitoring cluster, and that the monitoring cluster host matches the xpack.monitoring.elasticsearch setting in kibana.yml to see monitoring data in this instance of Kibana."

I've found that there is a huge number of "put_mapping" tasks running. They look like this (from the detailed task output):

    "shLb94reTUej2x3wvrhRNw:176978": {
      "node": "shLb94reTUej2x3wvrhRNw",
      "id": 176978,
      "type": "transport",
      "action": "indices:admin/mapping/put",
      "description": "",
      "start_time_in_millis": 1537045832428,
      "running_time_in_nanos": 76731328,
      "cancellable": false,
      "headers": {}
    }

How can I find what is causing these?
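As a starting point for narrowing this down, the task list can be summarised by action and running time. The sketch below assumes the tasks have already been collected into a dict keyed by task id, in the shape shown above. `summarise_tasks` is a hypothetical helper of mine, and it only uses fields that appear in that output.

```python
from collections import Counter
from datetime import datetime, timezone


def summarise_tasks(tasks):
    """Count tasks per action and list start time and running time for each.

    `tasks` maps task id -> task info, as in the detailed output above.
    """
    by_action = Counter(info["action"] for info in tasks.values())
    rows = []
    for task_id, info in tasks.items():
        rows.append({
            "id": task_id,
            "node": info["node"],
            "action": info["action"],
            # start_time_in_millis is milliseconds since the Unix epoch.
            "started": datetime.fromtimestamp(
                info["start_time_in_millis"] / 1000, tz=timezone.utc
            ),
            # running_time_in_nanos converted to milliseconds.
            "running_ms": info["running_time_in_nanos"] / 1e6,
        })
    return by_action, rows


# The single task copied from the output above.
tasks = {
    "shLb94reTUej2x3wvrhRNw:176978": {
        "node": "shLb94reTUej2x3wvrhRNw",
        "id": 176978,
        "type": "transport",
        "action": "indices:admin/mapping/put",
        "description": "",
        "start_time_in_millis": 1537045832428,
        "running_time_in_nanos": 76731328,
        "cancellable": False,
        "headers": {},
    }
}

by_action, rows = summarise_tasks(tasks)
print(by_action)
for row in rows:
    print(row["id"], row["action"], row["started"], round(row["running_ms"], 2))
```

This only shows how many mapping updates there are and how long each has been running. The empty "description" and "headers" fields mean the task output alone does not say which index or client triggered them.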
