Hello,
We are running an ELK cluster on EC2 instances.
The Elasticsearch cluster runs in Docker on EC2 instances in 3 regions: 33 nodes in total (i3.2xlarge for hot nodes and m5.4xlarge for warm nodes), 25 TB of data in total, ~2 TB of new data each day.
Two hours ago we started getting connection errors on the Logstash side: Logstash cannot send data to Elasticsearch on port 9200.
No SSL is used. Each Logstash instance sends data only to Elasticsearch nodes in the same region.
The Logstash instances lose connectivity, yet we can still run a telnet check successfully (telnet s1infra-esnode-us-1.s1.guru 9200).
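The telnet check only proves that the TCP connection opens, while the Logstash error is a read timeout after the connection is established. A check along these lines could separate the two cases. It is a minimal sketch: the `probe` function name, the 5-second default timeout and the unauthenticated `GET /` request are assumptions made for illustration, not part of our setup.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify how far a plain HTTP request gets against host:port.

    Returns one of: "connect failed", "read timed out",
    "connection error", "closed without reply", "responded".
    """
    try:
        # This step is what a successful telnet check demonstrates.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "connect failed"

    with sock:
        sock.settimeout(timeout)
        request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        try:
            sock.sendall(request.encode("ascii"))
            # Any bytes at all (even a 401 without credentials) mean the
            # node answered within the timeout.
            data = sock.recv(1024)
        except socket.timeout:
            return "read timed out"
        except OSError:
            return "connection error"
        return "responded" if data else "closed without reply"
```

Calling `probe("s1infra-esnode-eu-1.s1.guru", 9200)` from a Logstash host and getting "read timed out" would match the errors below: the port accepts connections, but no response arrives in time.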
There are no errors on the Elasticsearch application side and no errors in syslog.
The whole ELK stack is running in Docker.
Logstash version: 6.2.4
Ubuntu: 16.04
output {
  if [anchor] == "operations_log" {
    elasticsearch {
      id => "operations_output01"
      hosts => {{ elasticsearch_output_hosts }}
      user => {{ elasticsearch_output_user }}
      password => {{ elasticsearch_output_password }}
      index => "write_s1-operations-%{+YYYY.MM}"
      document_id => "%{log_id}"
      manage_template => false
    }
  }
  else if [anchor] == "aws_billing" {
    elasticsearch {
      id => "operations_output02"
      hosts => {{ elasticsearch_output_hosts }}
      user => {{ elasticsearch_output_user }}
      password => {{ elasticsearch_output_password }}
      index => "aws_billing-%{+YYYY.MM}"
      document_id => "%{fingerprint}"
      manage_template => false
    }
  }
  else {
    elasticsearch {
      id => "logstash_output01"
      index => "write_s1-logstash-%{+YYYY.MM.dd}"
      hosts => {{ elasticsearch_output_hosts }}
      user => {{ elasticsearch_output_user }}
      password => {{ elasticsearch_output_password }}
      manage_template => false
    }
  }
}
Errors:
[2018-07-02T11:44:28,020][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://logstash:xxxxxx@s1infra-esnode-eu-1.s1.guru:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://logstash:xxxxxx@s1infra-esnode-eu-1.s1.guru:9200/, :error_message=>"Elasticsearch Unreachable: [http://logstash:xxxxxx@s1infra-esnode-eu-1.s1.guru:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-07-02T11:44:28,020][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://logstash:xxxxxx@s1infra-esnode-eu-1.s1.guru:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2018-07-02T11:44:30,409][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://logstash:xxxxxx@s1infra-esnode-eu-6.s1.guru:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://logstash:xxxxxx@s1infra-esnode-eu-6.s1.guru:9200/, :error_message=>"Elasticsearch Unreachable: [http://logstash:xxxxxx@s1infra-esnode-eu-6.s1.guru:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-07-02T11:44:30,409][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://logstash:xxxxxx@s1infra-esnode-eu-6.s1.guru:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2018-07-02T11:44:32,355][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://logstash:xxxxxx@s1infra-esnode-eu-1.s1.guru:9200/, :path=>"/"}
[2018-07-02T11:44:32,357][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://logstash:xxxxxx@s1infra-esnode-eu-1.s1.guru:9200/"}
[2018-07-02T11:44:32,358][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://logstash:xxxxxx@s1infra-esnode-eu-6.s1.guru:9200/, :path=>"/"}
[2018-07-02T11:44:32,360][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://logstash:xxxxxx@s1infra-esnode-eu-6.s1.guru:9200/"}
[2018-07-02T11:44:33,499][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://logstash:xxxxxx@s1infra-esnode-eu-1.s1.guru:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://logstash:xxxxxx@s1infra-esnode-eu-1.s1.guru:9200/, :error_message=>"Elasticsearch Unreachable: [http://logstash:xxxxxx@s1infra-esnode-eu-1.s1.guru:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-07-02T11:44:33,499][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://logstash:xxxxxx@s1infra-esnode-eu-1.s1.guru:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
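To see whether the timeouts concentrate on particular nodes, the "Marking url as dead" warnings can be counted per host. The following is only a sketch that assumes the log line format shown above; the `count_dead_marks` name and the regular expression are illustrative and not part of Logstash.

```python
import re
from collections import Counter

# Matches the WARN line and captures the host from the ":url=>" field,
# skipping optional "user:password@" credentials in the URL.
DEAD_URL = re.compile(
    r"Marking url as dead\..*?:url=>http://(?:[^@/\s]*@)?(?P<host>[^:/\s]+):\d+/"
)


def count_dead_marks(lines):
    """Count 'Marking url as dead' warnings per Elasticsearch host."""
    counts = Counter()
    for line in lines:
        match = DEAD_URL.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts
```

Run over the excerpt above, this counts two warnings for s1infra-esnode-eu-1.s1.guru and one for s1infra-esnode-eu-6.s1.guru. Over a longer log it would show whether all nodes in a region are affected or only a few.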
All "limits" have been checked.
No disk I/O issues were found.
Does anyone have an idea where to start debugging?