Hi,
I have three Logstash instances processing logs and shipping them to an Elasticsearch 6.8.2 cluster (18 data nodes) via an HAProxy load balancer.
I regularly see the following errors in the logstash logs:
[2019-09-05T13:39:25,126][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://loadbalancer/][Manticore::SocketTimeout] Read timed out {:url=>http://loadbalancer/, :error_message=>"Elasticsearch Unreachable: [http://loadbalancer/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
However, that same load balancer/cluster is handling around 50k msg/sec (primaries) from fluentd running elsewhere on the same network, so the cluster itself is not unreachable. I can query it directly from the Logstash nodes, and I'm not seeing high utilisation of the bulk queue or any bulk rejections in Elasticsearch.
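For what it's worth, this is roughly how I've been checking for rejections (assuming the load balancer forwards port 9200; in 6.x the bulk requests run on the write thread pool):

  curl -s 'http://loadbalancer:9200/_cat/thread_pool/write?v&h=node_name,name,active,queue,rejected'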
The logstash output config looks like this:
elasticsearch {
  hosts => ["loadbalancer"]
  index => "%{[@metadata][index]}-%{+YYYY.MM.dd}"
  document_id => "%{[@metadata][doc_id]}"
  manage_template => false
  pipeline => "%{[@metadata][pipeline]}"
  codec => json { charset => "UTF-8" }
  template_name => "%{[@metadata][template]}"
  template_overwrite => false
  retry_max_interval => 30
  timeout => 3
}
Logstash ships its logs to the same pipeline and uses the same template as that used by fluentd.
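One thing I'm now wondering about is the timeout setting. If I'm reading the plugin docs correctly, timeout => 3 gives each request to Elasticsearch only 3 seconds before the read times out, whereas the plugin default is 60, so a slow response through the load balancer would produce exactly the Manticore::SocketTimeout above. As an experiment I'm considering putting it back towards the default, e.g.:

  timeout => 60

but I'd like to understand whether that short timeout is really the root cause or just masking something else.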
Recently, we've started seeing Logstash stop sending to Elasticsearch entirely. After a restart it starts sending again, only to stall once more later.
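Next time it stalls I'll try to capture the pipeline stats from the Logstash monitoring API before restarting (assuming the default API port of 9600 on the Logstash host), to see whether events are backing up in a particular stage:

  curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'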
How can I understand what the issue is here? And how can I resolve it?
Unfortunately, I can't ship directly to the data nodes because of our security policies. I have tried that in the past and it made no difference, although at that time we weren't seeing transmission halt completely.
Regards,
D