Hello Team,
We built a new Elasticsearch cluster with 3 master nodes, 2 data nodes, and 2 Logstash nodes.
On the Logstash nodes, we are seeing the errors below when pushing logs: Logstash disconnects from the Elasticsearch cluster and reconnects within seconds, but this keeps happening every now and then. Please see the Logstash configuration and the errors below.
# cat filter.conf
filter {
  if [headers][http_version] {
    drop {}
  }
  if [apic_cloud] {
    mutate {
      add_field => [ "[@metadata][index_type]", "apic" ]
    }
  } else {
    mutate {
      add_field => [ "[@metadata][index_type]", "dp" ]
    }
  }
}
# cat input_rabbitmq.conf
input {
  rabbitmq {
    queue => "apic"
    host => ["88535.d.net", "88534.d.net", "88537.d.net", "88536.d.net"]
    exchange => "syslogs"
    durable => true
    user => "dplogs"
    password => "JTu7Q7}M"
    exchange_type => "direct"
    key => "apic"
    ack => true
    prefetch_count => 300
    arguments => {
      "x-queue-type" => "classic"
    }
  }
}
output {
  elasticsearch {
    manage_template => false
    sniffing => false
    index => "%{[@metadata][index_type]}-%{+YYYY.MM.dd}"
    document_type => "_doc"
    cacert => "/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem"
    hosts => ["https://93801.d.net:9200", "https://93806.d.net:9200"]
    password => "*************"
    user => "C013020"
  }
}
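For context, we have not tuned any of the HTTP client settings on the elasticsearch output. The sketch below is what we were thinking of trying to ride over the short read timeouts; the option names are from the logstash-output-elasticsearch plugin, but the values are placeholders we have not tested yet.
output {
  elasticsearch {
    # ... same hosts / index / TLS settings as above ...
    timeout => 120                      # request timeout in seconds (plugin default is 60)
    validate_after_inactivity => 2000   # revalidate pooled connections after 2s idle (default 10000 ms)
  }
}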
[2020-04-08T00:54:55,889][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [https://C013020:xxxxxx@93806.d.net:9200/][Manticore::SocketTimeout] Read timed out {:url=>https://C013020:xxxxxx@93806.d.net:9200/, :error_message=>"Elasticsearch Unreachable: [https://C013020:xxxxxx@93806.d.net:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2020-04-08T00:54:55,890][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [https://C013020:xxxxxx@93806.d.net:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2020-04-08T00:54:55,936][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"https://C013020:xxxxxx@93806.d.net:9200/"}
[2020-04-08T00:54:56,162][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [https://C013020:xxxxxx@93801.d.net:9200/][Manticore::SocketTimeout] Read timed out {:url=>https://C013020:xxxxxx@93801.d.net:9200/, :error_message=>"Elasticsearch Unreachable: [https://C013020:xxxxxx@93801.d.net:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2020-04-08T00:54:56,162][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [https://C013020:xxxxxx@93801.d.net:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2020-04-08T00:55:00,949][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"https://C013020:xxxxxx@93801.d.net:9200/"}
The strange thing is that there is no fixed interval at which this happens; it is quite random.
Sometimes we also see timed-out errors on the Elasticsearch nodes, but not always, and we are not sure whether they are related.
[2020-04-08T00:00:06,295][INFO ][o.e.c.m.MetaDataMappingService] [94977] [apic-2020.04.07/6EWlCFJbTDuaaVybBTePMw] update_mapping [_doc]
[2020-04-08T00:44:43,051][ERROR][o.e.x.m.c.i.IndexRecoveryCollector] [94977] collector [index_recovery] timed out when collecting data
[2020-04-08T00:44:53,052][ERROR][o.e.x.m.c.i.IndexStatsCollector] [94977] collector [index-stats] timed out when collecting data
[2020-04-08T00:45:03,052][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [94977] collector [cluster_stats] timed out when collecting data
[2020-04-08T00:45:08,771][WARN ][o.e.c.InternalClusterInfoService] [94977] Failed to update node information for ClusterInfoUpdateJob within 15s timeout
[2020-04-08T00:45:08,771][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [94977] failed to execute on node [LisQzhzrRfWq4dmGBpJqzQ]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [93801][10.149.36.226:9300][cluster:monitor/nodes/stats[n]] request_id [325901] timed out after [15008ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1016) [elasticsearch-6.8.3.jar:6.8.3]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-6.8.3.jar:6.8.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
[2020-04-08T00:45:23,053][ERROR][o.e.x.m.c.i.IndexRecoveryCollector] [94977] collector [index_recovery] timed out when collecting data
[2020-04-08T00:45:23,772][WARN ][o.e.c.InternalClusterInfoService] [94977] Failed to update shard information for ClusterInfoUpdateJob within 15s timeout
[2020-04-08T00:45:33,054][ERROR][o.e.x.m.c.i.IndexStatsCollector] [94977] collector [index-stats] timed out when collecting data
[2020-04-08T00:45:43,054][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [94977] collector [cluster_stats] timed out when collecting data
[2020-04-08T00:45:51,337][WARN ][o.e.t.TransportService ] [94977] Received response for a request that has timed out, sent [57631ms] ago, timed out [42623ms] ago, action [cluster:monitor/nodes/stats[n]], node [{93801}{LisQzhzrRfWq4dmGBpJqzQ}{CLBgrXDlTwW5TDDs90sx0g}{10.149.36.226}{10.149.36.226:9300}{ml.machine_memory=67386937344, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [325901]
[2020-04-08T00:45:51,393][INFO ][o.e.c.m.MetaDataIndexTemplateService] [94977] adding template [.management-beats] for index patterns [.management-beats]
[2020-04-08T00:45:51,402][INFO ][o.e.c.m.MetaDataIndexTemplateService] [94977] adding template [.management-beats] for index patterns [.management-beats]
[2020-04-08T00:56:43,066][ERROR][o.e.x.m.c.i.IndexRecoveryCollector] [94977] collector [index_recovery] timed out when collecting data
[2020-04-08T00:56:53,066][ERROR][o.e.x.m.c.i.IndexStatsCollector] [94977] collector [index-stats] timed out when collecting data
[2020-04-08T02:00:00,444][INFO ][o.e.c.m.MetaDataCreateIndexService] [94977] [apic-2020.04.08] creating index, cause [auto(bulk api)], templates [controlapiclogs], shards [5]/[1], mappings [_doc]
[2020-04-08T02:00:00,600][INFO ][o.e.c.m.MetaDataMappingService] [94977] [apic-2020.04.08/w1DNEzD4RgCs8YHPwkbDWw] update_mapping [_doc]
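If it helps, these are the kinds of checks we are planning to run from a Logstash node while the errors are occurring, reusing the user, CA bundle and hosts from the output config above (just a rough sketch, we have not captured this data yet):
# Cluster health and pending tasks at the time of the timeouts
curl -s --cacert /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem -u C013020 \
     "https://93801.d.net:9200/_cluster/health?pretty"
# Write thread pool queue/rejections on each node (bulk indexing pressure)
curl -s --cacert /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem -u C013020 \
     "https://93801.d.net:9200/_cat/thread_pool/write?v&h=node_name,active,queue,rejected"
# What the nodes are busy with when requests back up
curl -s --cacert /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem -u C013020 \
     "https://93801.d.net:9200/_nodes/hot_threads"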
Could you please take a look at this and help us identify and fix the issue?
Regards,
Vibin