Logstash loses connection with Elasticsearch

docker

(saga-nik) #1

Hi guys!
I have a problem with Logstash in Docker.

Sometimes my Logstash container loses its connection to all of the Elasticsearch master nodes.

I see this message in the Logstash logs:

Logstash logs
 [2018-12-06T07:04:24,880][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out {:url=>http://x.x.x.x:28001/, :error_message=>"Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
 [2018-12-06T07:04:24,880][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
 [2018-12-06T07:04:24,995][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out {:url=>http://x.x.x.x:28001/, :error_message=>"Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
 [2018-12-06T07:04:24,996][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
 [2018-12-06T07:06:05,297][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out {:url=>http://x.x.x.x:28001/, :error_message=>"Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
 [2018-12-06T07:06:05,298][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
 [2018-12-06T07:06:06,121][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out {:url=>http://x.x.x.x:28001/, :error_message=>"Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
 [2018-12-06T07:06:06,121][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
 [2018-12-06T07:06:07,380][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out {:url=>http://x.x.x.x:28001/, :error_message=>"Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
 [2018-12-06T07:06:07,380][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
 [2018-12-06T07:06:07,476][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out {:url=>http://x.x.x.x:28001/, :error_message=>"Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
 [2018-12-06T07:06:07,476][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://x.x.x.x:28001/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}

I have a cluster of 7 Elasticsearch containers on 3 hosts (plus Kibana and Logstash):

Cluster
HOST1
1. Master x1
2. Data x1

HOST2
1. Master x1
2. Data x1

HOST3
1. Master x1
2. Ingest x1
3. Coordinating x1
4. Kibana x1
5. Logstash x1
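Since the read timeouts are hitting the master nodes, one thing worth trying (a sketch, not a confirmed fix for my case) is pointing the Logstash output at the coordinating/ingest nodes instead of the dedicated masters, and raising the client timeout. The hostnames and ports below are placeholders for my setup:

```conf
output {
  elasticsearch {
    # Send bulk traffic to the coordinating/ingest nodes, not the
    # dedicated masters -- hosts below are placeholders.
    hosts => ["http://coordinating-s3:9200", "http://ingest-s3:9200"]
    # Raise the HTTP read timeout (plugin default is 60s) in case
    # bulk requests are just slow rather than the nodes being down.
    timeout => 120
    # Seconds to wait before retrying a host marked dead (default 5).
    resurrect_delay => 5
  }
}
```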

Any ideas?


(saga-nik) #2

Logs from one master node:

Master log

[2018-12-06T07:04:37,119][DEBUG][o.e.i.r.PeerRecoveryTargetService] [data-s1] [index-name-2018.12.06][0] recovery done from [{data-s2}{55-3UT8GT92hh-TcE2qbsQ}{VYPCWJTlTlWgOPUCmiANVQ}{x.x.x.x}{x.x.x.x:28004}{ml.machine_memory=23622320128, ml.max_open_jobs=20, xpack.installed=true, box_type=hot, ml.enabled=true}], took [564ms]
[2018-12-06T07:04:37,875][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-s1] collector [cluster_stats] timed out when collecting data
[2018-12-06T07:04:38,299][INFO ][o.e.m.j.JvmGcMonitorService] [master-s1] [gc][575319] overhead, spent [250ms] collecting in the last [1s]
[2018-12-06T07:05:00,096][INFO ][o.e.c.r.a.AllocationService] [master-s1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[index-name-2018.12.06][0]] ...]).
[2018-12-06T07:05:16,052][INFO ][o.e.c.m.MetaDataCreateIndexService] [master-s1] [index-name-2018.12.06] creating index, cause [auto(bulk api)], templates [default_template, ems], shards [2]/[1], mappings
[2018-12-06T07:05:16,326][INFO ][o.e.m.j.JvmGcMonitorService] [master-s1] [gc][575357] overhead, spent [256ms] collecting in the last [1s]
[2018-12-06T07:05:18,591][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-s1] collector [cluster_stats] timed out when collecting data
[2018-12-06T07:05:48,347][INFO ][o.e.m.j.JvmGcMonitorService] [master-s1] [gc][575389] overhead, spent [299ms] collecting in the last [1s]
[2018-12-06T07:06:01,020][DEBUG][o.e.i.r.PeerRecoveryTargetService] [data-s1] [index-name-2018.12.06][0] recovery done from [{data-s2}{55-3UT8GT92hh-TcE2qbsQ}{VYPCWJTlTlWgOPUCmiANVQ}{x.x.x.x}{x.x.x.x:28004}{ml.machine_memory=23622320128, ml.max_open_jobs=20, xpack.installed=true, box_type=hot, ml.enabled=true}], took [71ms]
[2018-12-06T07:06:01,032][DEBUG][o.e.i.r.PeerRecoveryTargetService] [data-s1] [index-name-2018.12.06][0] recovery done from [{data-s2}{55-3UT8GT92hh-TcE2qbsQ}{VYPCWJTlTlWgOPUCmiANVQ}{x.x.x.x}{x.x.x.x:28004}{ml.machine_memory=23622320128, ml.max_open_jobs=20, xpack.installed=true, box_type=hot, ml.enabled=true}], took [78ms]
[2018-12-06T07:06:14,746][DEBUG][o.e.i.r.RecoveryTarget ] [data-s1] [index-name-2018.12.06][0] reset of recovery with shard [index-name-2018.12.06][0] and id [1494]
[2018-12-06T07:06:14,888][DEBUG][o.e.i.r.PeerRecoveryTargetService] [data-s1] [index-name-2018.12.06][0] recovery done from [{data-s2}{55-3UT8GT92hh-TcE2qbsQ}{VYPCWJTlTlWgOPUCmiANVQ}{x.x.x.x}{x.x.x.x:28004}{ml.machine_memory=23622320128, ml.max_open_jobs=20, xpack.installed=true, box_type=hot, ml.enabled=true}], took [157ms]
[2018-12-06T07:06:15,398][DEBUG][o.e.i.r.PeerRecoveryTargetService] [data-s1] [index-name-2018.12.06][0] recovery done from [{data-s2}{55-3UT8GT92hh-TcE2qbsQ}{VYPCWJTlTlWgOPUCmiANVQ}{x.x.x.x}{x.x.x.x:28004}{ml.machine_memory=23622320128, ml.max_open_jobs=20, xpack.installed=true, box_type=hot, ml.enabled=true}], took [673ms]
[2018-12-06T07:06:28,391][DEBUG][o.e.i.r.RecoverySourceHandler] [data-s1] [index-name-2018.12.06][1][recover to data-s2] delaying recovery of [index-name-2018.12.06][1] as it is not listed as assigned to target node {data-s2}{55-3UT8GT92hh-TcE2qbsQ}{VYPCWJTlTlWgOPUCmiANVQ}{x.x.x.x}{x.x.x.x:28004}{ml.machine_memory=23622320128, ml.max_open_jobs=20, xpack.installed=true, box_type=hot, ml.enabled=true}
[2018-12-06T07:06:28,401][DEBUG][o.e.i.r.RecoverySourceHandler] [data-s1] [index-name-2018.12.06][1][recover to data-s2] delaying recovery of [index-name-2018.12.06][1] as it is not listed as assigned to target node {data-s2}{55-3UT8GT92hh-TcE2qbsQ}{VYPCWJTlTlWgOPUCmiANVQ}{x.x.x.x}{x.x.x.x:28004}{ml.machine_memory=23622320128, ml.max_open_jobs=20, xpack.installed=true, box_type=hot, ml.enabled=true}
[2018-12-06T07:06:39,596][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-s1] collector [cluster_stats] timed out when collecting data
[2018-12-06T07:06:42,437][INFO ][o.e.c.r.a.AllocationService] [master-s1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[index-name-2018.12.06][0], [index-name-2018.12.06][1], [index-name-2018.12.06][1]] ...]).
[2018-12-06T07:07:00,393][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-s1] collector [index-stats] timed out when collecting data
[2018-12-06T07:07:09,142][DEBUG][o.e.i.r.RecoveryTarget ] [data-s1] [index-name-2018.12.06][0] reset of recovery with shard [index-name-2018.12.06][0] and id [1497]
[2018-12-06T07:07:09,868][DEBUG][o.e.i.r.PeerRecoveryTargetService] [data-s1] [index-name-2018.12.06][0] recovery done from [{data-s2}{55-3UT8GT92hh-TcE2qbsQ}{VYPCWJTlTlWgOPUCmiANVQ}{x.x.x.x}{x.x.x.x:28004}{ml.machine_memory=23622320128, ml.max_open_jobs=20, xpack.installed=true, box_type=hot, ml.enabled=true}], took [734ms]
[2018-12-06T07:07:11,063][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-s1] collector [cluster_stats] timed out when collecting data
[2018-12-06T07:07:30,558][INFO ][o.e.c.r.a.AllocationService] [master-s1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[index-name-2018.12.06][0]] ...]).
[2018-12-06T07:07:46,697][INFO ][o.e.m.j.JvmGcMonitorService] [master-s1] [gc][575507] overhead, spent [327ms] collecting in the last [1s]
[2018-12-06T07:08:20,729][INFO ][o.e.m.j.JvmGcMonitorService] [master-s1] [gc][575541] overhead, spent [345ms] collecting in the last [1s]
[2018-12-06T07:08:24,393][DEBUG][o.e.i.r.RecoveryTarget ] [data-s1] [index-name-2018.12.06][0] reset of recovery with shard [index-name-2018.12.06][0] and id [1499]
[2018-12-06T07:08:24,456][DEBUG][o.e.i.r.PeerRecoveryTargetService] [data-s1] [index-name-2018.12.06][0] recovery done from [{data-s2}{55-3UT8GT92hh-TcE2qbsQ}{VYPCWJTlTlWgOPUCmiANVQ}{x.x.x.x}{x.x.x.x:28004}{ml.machine_memory=23622320128, ml.max_open_jobs=20, xpack.installed=true, box_type=hot, ml.enabled=true}], took [69ms]
[2018-12-06T07:08:25,150][DEBUG][o.e.i.r.PeerRecoveryTargetService] [data-s1] [index-name-2018.12.06][0] recovery done from [{data-s2}{55-3UT8GT92hh-TcE2qbsQ}{VYPCWJTlTlWgOPUCmiANVQ}{x.x.x.x}{x.x.x.x:28004}{ml.machine_memory=23622320128, ml.max_open_jobs=20, xpack.installed=true, box_type=hot, ml.enabled=true}], took [767ms]
[2018-12-06T07:08:48,654][INFO ][o.e.c.r.a.AllocationService] [master-s1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[index-name-2018.12.06][0]] ...]).