Hello,
We have 10+ Logstash nodes (6.2.4) behind an AWS Classic Load Balancer. Everything was working fine until we started seeing the errors below. They first appeared on some of the nodes while the others kept working perfectly, and then they started happening on the others as well. Restarting the nodes helped for a while, but the errors came back in the same pattern.
In the monitoring logs, we see that whenever this happens, the affected node stops receiving events and its JVM heap usage climbs above 90% (heap size 30 GB). Back pressure is applied to the inputs and the ingestion rate drops very low.
[2019-04-04T17:25:00,899][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2019-04-04T17:25:00,899][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>2}
[2019-04-04T17:25:00,902][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2019-04-04T17:25:00,902][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>2}
[2019-04-04T17:25:02,907][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2019-04-04T17:25:02,907][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2019-04-04T17:25:02,910][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2019-04-04T17:25:02,910][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2019-04-04T17:25:03,288][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>https://logstash:xxxxxx@elasticsearch:9200/, :path=>"/"}
[2019-04-04T17:25:03,297][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"https://logstash:xxxxxx@elastic:9200/"}
[2019-04-04T17:25:04,474][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [https://logstash:xxxxxx@elastic:9200/][Manticore::SocketException] Broken pipe (Write failed) {:url=>https://logstash:xxxxxx@elastic:9200/, :error_message=>"Elasticsearch Unreachable: [https://logstash:xxxxxx@elastic:9200/][Manticore::SocketException] Broken pipe (Write failed)", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2019-04-04T17:25:04,474][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [https://logstash:xxxxxx@elastic:9200/][Manticore::SocketException] Broken pipe (Write failed)", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2019-04-04T17:25:06,916][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2019-04-04T17:25:06,916][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>8}
[2019-04-04T17:25:06,918][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2019-04-04T17:25:06,918][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>8}
[2019-04-04T17:25:08,299][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>https://logstash:xxxxxx@elastic:9200/, :path=>"/"}
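For reference, the will_retry_in_seconds values in the log above (2, 4, 8, ... up to 64) look like a doubling backoff with a cap. Below is a minimal sketch of that schedule. It assumes the output plugin's retry_initial_interval and retry_max_interval settings are at their defaults of 2 and 64 seconds, which is our understanding of the plugin docs and worth double-checking for 6.2.4. The function name retry_schedule is hypothetical.

```python
def retry_schedule(attempts, initial=2, maximum=64):
    """Return the sleep interval used before each retry.

    The interval doubles after every failed attempt and is capped at
    `maximum`. `initial` and `maximum` are assumed to mirror
    retry_initial_interval and retry_max_interval in
    logstash-output-elasticsearch (hypothetical helper, not plugin code).
    """
    intervals = []
    interval = initial
    for _ in range(attempts):
        intervals.append(interval)
        # Double the wait for the next attempt, but never exceed the cap.
        interval = min(interval * 2, maximum)
    return intervals


if __name__ == "__main__":
    print(retry_schedule(8))  # [2, 4, 8, 16, 32, 64, 64, 64]
```

If this schedule holds, a value of 64 (as in the HostUnreachableError line) suggests that particular bulk request had already backed off several times before the "Broken pipe" failure was logged.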
I searched for this issue and found related GitHub issues (https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/793 , https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/729 ) which suggest that the load balancer may terminate the connection, returning a 413 response code, when the request payload gets large. So I checked the AWS ELB access logs and found 413 errors:
2019-04-04T13:20:15.195906Z Elasticsearch-ELB <LS-IP>:<LS-PORT> <ELB-IP>:9200 0.000026 0.000805 0.000024 413 413 16384 0 "POST https://elastic:9200/_bulk HTTP/1.1" "Manticore 0.6.4" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
AWS ELB log format:
timestamp elb client:port backend:port request_processing_time backend_processing_time response_processing_time elb_status_code backend_status_code received_bytes sent_bytes "request" "user_agent" ssl_cipher ssl_protocol
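To check whether the 413s line up with large bulk requests across many log lines, a small script can split each access-log entry into the fields listed above and pick out the 413 responses. Below is a minimal Python sketch. The field names come from the format above; the function names parse_elb_line and summarize_413s are hypothetical, and the sample line is the one quoted above.

```python
import shlex

# Field order of an AWS Classic ELB access log entry, as listed above.
FIELDS = [
    "timestamp", "elb", "client_port", "backend_port",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]


def parse_elb_line(line):
    """Split one log line into a dict; shlex keeps quoted fields together."""
    values = shlex.split(line)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    entry = dict(zip(FIELDS, values))
    entry["received_bytes"] = int(entry["received_bytes"])
    entry["sent_bytes"] = int(entry["sent_bytes"])
    return entry


def summarize_413s(lines):
    """Return (count, max received_bytes) for entries where the ELB returned 413."""
    sizes = [
        e["received_bytes"]
        for e in map(parse_elb_line, lines)
        if e["elb_status_code"] == "413"
    ]
    return len(sizes), max(sizes, default=0)


if __name__ == "__main__":
    sample = (
        '2019-04-04T13:20:15.195906Z Elasticsearch-ELB <LS-IP>:<LS-PORT> '
        '<ELB-IP>:9200 0.000026 0.000805 0.000024 413 413 16384 0 '
        '"POST https://elastic:9200/_bulk HTTP/1.1" "Manticore 0.6.4" '
        'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2'
    )
    entry = parse_elb_line(sample)
    print(entry["backend_status_code"], entry["received_bytes"])  # 413 16384
    print(summarize_413s([sample]))  # (1, 16384)
```

Because shlex follows shell-like quoting, the quoted request and user_agent fields stay as single values. A line with an unexpected number of fields raises ValueError instead of being silently misparsed.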
I also checked the Elasticsearch logs and found nothing there. No bulk rejections!
received_bytes (16384) is not that high, right? Yet we get this response code and Logstash loses the connection. Are these related? I need help debugging the problem.