Failed to publish events, write queue "completed" not going down, resource use low

Kieren_Johnstone · September 15, 2018, 7:28pm

I'm using cloud.elastic.co - and seeing:

2018-09-15T19:21:08.717Z ERROR pipeline/output.go:92 Failed to publish events: 500 Internal Server Error: {"took":4,"ignored":false,"errors":true,"error":{"type":"export_exception","reason":"Exception when closing export bulk","caused_by":{"type":"export_exception","reason":"failed to flush export bulks","caused_by":{"type":"export_exception","reason":"bulk [found-user-defined] reports failures when exporting documents","exceptions":[{"type":"export_exception","reason":"RemoteTransportException[[instance-0000000016][172.17.0.9:19758][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[instance-0000000016][xxxx][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$7@2739b036 on EsThreadPoolExecutor[name = instance-0000000016/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@b4e3cce[Running, pool size = 2, active threads = 2, queued tasks = 200, completed tasks = 10238]]];","caused_by":{"type":"es_rejected_execution_exception","reason":"rejected execution of org.elasticsearch.transport.TransportService$7@2739b036 on EsThreadPoolExecutor[name = instance-0000000016/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@b4e3cce[Running, pool size = 2, active threads = 2, queued tasks = 200, completed tasks = 10238]]"}}]}}}}

...repeated many times over in filebeat (raw JSON) logs. The whole setup has been running fine for a few months. The "completed tasks" number goes up occasionally, but not by much.
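
For anyone comparing several of these log lines, the thread pool figures inside the rejection message can be extracted with a small script. This is only a sketch: the regular expression is my own and assumes the message keeps the wording shown above, and `parse_rejection` is a hypothetical helper name, not part of filebeat or Elasticsearch.

```python
import re

# Matches the executor state printed inside an EsRejectedExecutionException
# message. The pattern assumes the wording seen in the log line above.
POOL_STATE = re.compile(
    r"name = (?P<name>[^,\]]+), queue capacity = (?P<capacity>\d+)"
    r".*?pool size = (?P<pool_size>\d+), active threads = (?P<active>\d+)"
    r", queued tasks = (?P<queued>\d+), completed tasks = (?P<completed>\d+)"
)


def parse_rejection(message):
    """Return the thread pool figures from a rejection message, or None."""
    match = POOL_STATE.search(message)
    if match is None:
        return None
    # Every captured group except the pool name is a plain integer.
    stats = {
        key: int(value)
        for key, value in match.groupdict().items()
        if key != "name"
    }
    stats["name"] = match.group("name")
    # The queue is full when the queued count has reached the capacity.
    stats["queue_full"] = stats["queued"] >= stats["capacity"]
    return stats


# Excerpt copied from the error above.
sample = (
    "rejected execution of org.elasticsearch.transport.TransportService$7@2739b036 "
    "on EsThreadPoolExecutor[name = instance-0000000016/write, queue capacity = 200, "
    "org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@b4e3cce[Running, "
    "pool size = 2, active threads = 2, queued tasks = 200, completed tasks = 10238]]"
)
stats = parse_rejection(sample)
print(stats)
```

Run against the line above, this reports a write pool of 2 threads, both active, with all 200 queue slots taken, which is why new bulk requests are rejected.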

CPU use is ~10%, RAM ~50%. Search response times are <200ms, but index requests are timing out (~300kms). I don't know how many cores the 2 nodes have - cloud.elastic.co doesn't show me that (who thought that was a good idea?).

How can I diagnose this? (Better still, how can I fix this, of course :))
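
One possible starting point, sketched under assumptions: fetch the write thread pool state with `GET _cat/thread_pool/write?format=json&h=node_name,name,active,queue,queue_size,rejected` and flag the nodes whose queue has reached its capacity. The helper name `saturated_write_pools` and the sample rows are hypothetical (the second node name and the `rejected` counts are made up); only the first node's queue figures mirror the error above.

```python
def saturated_write_pools(rows):
    """Return the names of nodes whose write queue is at capacity.

    `rows` is the parsed JSON list from the cat thread pool API. Values
    are converted with int() because the cat APIs usually return strings.
    """
    flagged = []
    for row in rows:
        try:
            queue = int(row["queue"])
            capacity = int(row["queue_size"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows that lack usable numbers
        # Assumption: a non-positive capacity means "no fixed limit".
        if capacity > 0 and queue >= capacity:
            flagged.append(row.get("node_name", "?"))
    return flagged


# Hypothetical sample data shaped like the cat API output.
sample_rows = [
    {"node_name": "instance-0000000016", "name": "write",
     "active": "2", "queue": "200", "queue_size": "200", "rejected": "57"},
    {"node_name": "instance-0000000015", "name": "write",
     "active": "0", "queue": "0", "queue_size": "200", "rejected": "0"},
]
print(saturated_write_pools(sample_rows))
```

A node that stays on this list while CPU is low suggests the write threads are waiting on something other than indexing work.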

Update: monitoring data isn't showing up anywhere. I've created a separate monitoring cluster, but Kibana monitoring on either cluster doesn't show monitoring data for the problem one.**

Bonus question: should I upgrade to 6.4? It's got a warning about a Transport client, but I've no idea if filebeat, metricbeat, etc, use that or not.

** - I get "You need to make some adjustments

To run monitoring please perform the following steps

We checked the cluster persistent settings for xpack.monitoring.exporters, and found the reason: Remote exporters indicate a possible misconfiguration: found-user-defined.

Using monitoring exporters ship the monitoring data to a remote monitoring cluster is highly recommended as it keeps the integrity of the monitoring data safe no matter what the state of the production cluster. However, as this instance of Kibana could not find any monitoring data, there seems to be a problem with the xpack.monitoring.exporters configuration, or the xpack.monitoring.elasticsearch settings in kibana.yml.

Check that the intended exporters are enabled for sending statistics to the monitoring cluster, and that the monitoring cluster host matches the xpack.monitoring.elasticsearch setting in kibana.yml to see monitoring data in this instance of Kibana."

Kieren_Johnstone · September 15, 2018, 9:12pm

I've found that there are huge amounts of "put_mapping" tasks going on. They look like this (this is detailed output):

    "shLb94reTUej2x3wvrhRNw:176978": {
      "node": "shLb94reTUej2x3wvrhRNw",
      "id": 176978,
      "type": "transport",
      "action": "indices:admin/mapping/put",
      "description": "",
      "start_time_in_millis": 1537045832428,
      "running_time_in_nanos": 76731328,
      "cancellable": false,
      "headers": {}
    }
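
To see which actions dominate, the output of `GET _tasks?detailed` can be tallied per action. The following is a rough sketch: `summarize_tasks` is a hypothetical helper, the `nodes` / `tasks` envelope is assumed to match the task management API response, and the second and third tasks in the sample are invented for illustration (only the first is the task shown above).

```python
from collections import Counter


def summarize_tasks(tasks_response):
    """Count running tasks per action and find the longest-running task.

    `tasks_response` is the parsed JSON from GET _tasks?detailed.
    Returns (Counter of action -> count, (task_id, task) or None).
    """
    counts = Counter()
    longest = None
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            counts[task["action"]] += 1
            running = task.get("running_time_in_nanos", 0)
            if longest is None or running > longest[1].get("running_time_in_nanos", 0):
                longest = (task_id, task)
    return counts, longest


sample_response = {
    "nodes": {
        "shLb94reTUej2x3wvrhRNw": {
            "tasks": {
                # Task copied from the output above.
                "shLb94reTUej2x3wvrhRNw:176978": {
                    "node": "shLb94reTUej2x3wvrhRNw",
                    "id": 176978,
                    "type": "transport",
                    "action": "indices:admin/mapping/put",
                    "running_time_in_nanos": 76731328,
                },
                # Hypothetical tasks, added only to exercise the tally.
                "shLb94reTUej2x3wvrhRNw:176980": {
                    "action": "indices:admin/mapping/put",
                    "running_time_in_nanos": 5000000,
                },
                "shLb94reTUej2x3wvrhRNw:176981": {
                    "action": "indices:data/write/bulk",
                    "running_time_in_nanos": 1200000,
                },
            }
        }
    }
}

counts, longest = summarize_tasks(sample_response)
for action, count in counts.most_common():
    print(count, action)
print("longest-running:", longest[0])
```

This only shows which actions pile up; it does not by itself identify the client or index behind the mapping updates.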

How can I find what is causing these?
