I'm using the latest version of ELK (Logstash 5.6.0, Elasticsearch 5.6.0), and sometimes I get the following error messages in the Logstash log file, even though the Elasticsearch cluster is healthy.
Here are the error messages from Logstash:
[2017-11-06T13:03:15,590][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://192.168.234.236:9200/"}
[2017-11-06T13:03:18,906][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://192.168.137.168:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://192.168.137.168:9200/, :error_message=>"Elasticsearch Unreachable: [http://192.168.137.168:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2017-11-06T13:03:18,906][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://192.168.137.168:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2017-11-06T13:03:20,594][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://192.168.137.168:9200/, :path=>"/"}
[2017-11-06T13:03:20,597][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://192.168.137.168:9200/"}
[2017-11-06T13:03:22,070][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://192.168.232.105:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://192.168.232.105:9200/, :error_message=>"Elasticsearch Unreachable: [http://192.168.232.105:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2017-11-06T13:03:22,071][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://192.168.232.105:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>4}
Sometimes this goes on for several minutes, sometimes for several hours.
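For reference, this is how I can check cluster health while the errors are happening (the IP is just one of the nodes from the log above; this is the standard _cluster/health API):

# Query cluster status from any node in the cluster
curl -s 'http://192.168.137.168:9200/_cluster/health?pretty'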
My Logstash config is:
input {
  kafka {
    bootstrap_servers => "zls-hadoop-02:9092,zls-hadoop-03:9092,zls-hadoop-04:9092"
    group_id => "logstash_android01"
    topics => [ "zls_android01" ]
    codec => plain
    consumer_threads => 3
    decorate_events => true
    max_partition_fetch_bytes => "1048576"
    session_timeout_ms => "90000"
    request_timeout_ms => "100000"
  }
}
output {
  elasticsearch {
    hosts => ["zls-elk-01:9200", "zls-elk-01:9200", "zls-elk-01:9200", "zls-elk-04:9200", "zls-elk-05:9200"]
    index => "android01-c_log-%{+YYYY-MM-dd}"
    document_type => "%{action_id}"
    document_id => "%{logsource}%{logsign}%{logoffset}"
    doc_as_upsert => true
    flush_size => 4000
    idle_flush_time => 5
    sniffing => true
    template_overwrite => true
  }
}
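Since the errors are read timeouts on bulk requests, one thing I thought might be worth watching during an episode is whether the bulk thread pools on the data nodes are queueing or rejecting work. A quick check could look like this (just a sketch using the _cat thread_pool API; the address is again one node from the error log):

# Show bulk thread pool activity, queue depth, and rejections per node
curl -s 'http://192.168.137.168:9200/_cat/thread_pool/bulk?v&h=node_name,active,queue,rejected'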
I also see messages like the following on some Elasticsearch nodes:
[2017-11-06T13:50:12,629][INFO ][o.e.m.j.JvmGcMonitorService] [zls-elk-05] [gc][183643] overhead, spent [253ms] collecting in the last [1s]
[2017-11-06T13:50:26,669][INFO ][o.e.m.j.JvmGcMonitorService] [zls-elk-05] [gc][183657] overhead, spent [310ms] collecting in the last [1s]
[2017-11-06T13:53:06,719][INFO ][o.e.m.j.JvmGcMonitorService] [zls-elk-05] [gc][183816] overhead, spent [311ms] collecting in the last [1s]
[2017-11-06T13:53:49,846][INFO ][o.e.m.j.JvmGcMonitorService] [zls-elk-05] [gc][183859] overhead, spent [273ms] collecting in the last [1s]
[2017-11-06T13:54:19,958][INFO ][o.e.m.j.JvmGcMonitorService] [zls-elk-05] [gc][183889] overhead, spent [256ms] collecting in the last [1s]
[2017-11-06T13:55:51,406][INFO ][o.e.m.j.JvmGcMonitorService] [zls-elk-05] [gc][183980] overhead, spent [260ms] collecting in the last [1s]
[2017-11-06T13:57:52,683][INFO ][o.e.m.j.JvmGcMonitorService] [zls-elk-05] [gc][184101] overhead, spent [258ms] collecting in the last [1s]
[2017-11-06T13:58:37,752][INFO ][o.e.m.j.JvmGcMonitorService] [zls-elk-05] [gc][184146] overhead, spent [277ms] collecting in the last [1s]
[2017-11-06T13:59:08,875][INFO ][o.e.m.j.JvmGcMonitorService] [zls-elk-05] [gc][184177] overhead, spent [256ms] collecting in the last [1s]
[2017-11-06T13:59:42,925][INFO ][o.e.m.j.JvmGcMonitorService] [zls-elk-05] [gc][184211] overhead, spent [267ms] collecting in the last [1s]
[2017-11-06T14:00:43,170][INFO ][o.e.m.j.JvmGcMonitorService] [zls-elk-05] [gc][184271] overhead, spent [303ms] collecting in the last [1s]
[2017-11-06T14:01:14,181][INFO ][o.e.m.j.JvmGcMonitorService] [zls-elk-05] [gc][184302] overhead, spent [273ms] collecting in the last [1s]
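These are only INFO-level GC overhead notices, but to see whether that node is actually under heap pressure when the timeouts occur, something like the following could show heap usage per node (a sketch using the _cat/nodes API, pointed at the node from the GC log above):

# Show heap and RAM usage for every node in the cluster
curl -s 'http://zls-elk-05:9200/_cat/nodes?v&h=name,heap.percent,heap.max,ram.percent'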
Can anybody tell me why this happens so frequently and how I can fix it? I really appreciate your help.