Hi Team,
I have a cluster with 3 nodes. All nodes are master-eligible data nodes. Kibana points at one of them through elasticsearch.url in kibana.yml. I am using Filebeat to send logs directly to Elasticsearch. When all nodes are up and running, logs are received properly and cluster health is green.
I have configured Filebeat to send data to multiple Elasticsearch hosts:
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["172.16.4.11:9200","172.16.4.12:9200","172.16.4.19:9200"]
  protocol: "https"
and in kibana.yml - elasticsearch.url: "http://172.16.4.19:9200"
The problem is that when the first node, 172.16.4.11, is down, Filebeat doesn't send logs to the other two nodes. When 172.16.4.11 comes back up, all logs are received. Why are logs not received while the first node is down? When 172.16.4.12 is down and the other two are up, it works. I want Filebeat to send logs to one of the other two nodes when 172.16.4.11 is not running. Please help me solve this issue.
I have set discovery.zen.minimum_master_nodes to 2 in elasticsearch.yml, so the cluster should keep working when one of the nodes is down.
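For reference, a minimal sketch of the relevant elasticsearch.yml settings on a 6.x node. The value 2 follows the usual quorum rule for three master-eligible nodes, (3 / 2) + 1 = 2. The unicast host list is an assumption for illustration; the original post does not show how discovery is configured.

```yaml
# elasticsearch.yml (6.x), assumed identical on all three nodes
node.master: true
node.data: true

# Assumption: the nodes find each other through unicast discovery on these addresses.
discovery.zen.ping.unicast.hosts: ["172.16.4.11", "172.16.4.12", "172.16.4.19"]

# Quorum of master-eligible nodes: (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
```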
Elasticsearch - version 6.2.2
Kibana - version 6.2.2
Filebeat - version 6.2.2
OS - Ubuntu 16.04 and CentOS 6.6
In the Elasticsearch logs, I got this:
[Node1]--> 172.16.4.12
[WARN ][o.e.c.a.s.ShardStateAction] [Node1] [filebeat-6.2.2-2018.03.29][0] received shard failed for shard id [[filebeat-6.2.2-2018.03.29][0]], allocation id [EPFzVuq8TKGswCiryn1KTg], primary term [18], message [mark copy as stale]
[WARN ][o.e.c.a.s.ShardStateAction] [Node1] [filebeat-6.2.2-2018.03.29][2] received shard failed for shard id [[filebeat-6.2.2-2018.03.29][2]], allocation id [EBpijFXORy-YpgxOmXZMJg], primary term [17], message [mark copy as stale]
[WARN ][o.e.c.a.s.ShardStateAction] [Node1] [filebeat-6.2.2-2018.03.29][1] received shard failed for shard id [[filebeat-6.2.2-2018.03.29][1]], allocation id [NV9iYF1BRq6a9Xi3t1vk8g], primary term [16], message [mark copy as stale]
[INFO ][o.e.c.r.a.AllocationService] [Node1] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[slog][2], [slog][3], [slog][0]] ...]).
[INFO ][o.e.c.r.a.AllocationService] [Node1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[slog][3]] ...]).
There might be more errors in the Elasticsearch logs about the cluster going RED. It seems the issue is that some shards are unavailable, either because they have no replicas or because the replica shards are held by the same node.
If the first node is down, do you get a RED status when querying the cluster health? If so, some shards currently have no primary assigned.
The Index Shard Stores API might help you get an overview of your current shard usage.
Hello, the problem is solved. I checked the Filebeat logs again. Filebeat was not able to connect to 172.16.4.12 and 172.16.4.19. I commented out protocol: "https" in filebeat.yml.
These are the Filebeat logs from when 172.16.4.11 was not running:
ERROR pipeline/output.go:74 Failed to connect: Get https://172.16.4.11:9200: dial tcp 172.16.4.11:9200: getsockopt: connection refused
ERROR pipeline/output.go:74 Failed to connect: Get https://172.16.4.12:9200: http: server gave HTTP response to HTTPS client
ERROR pipeline/output.go:74 Failed to connect: Get https://172.16.4.19:9200: http: server gave HTTP response to HTTPS client
But now it's working fine. When one of the masters is down, Filebeat sends logs to the other two. Thank you for your help!!
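For anyone who lands here with the same "server gave HTTP response to HTTPS client" error, this is roughly what the working output section looks like after the change. It is a sketch, assuming the nodes answer plain HTTP on port 9200, which is what the errors for 172.16.4.12 and 172.16.4.19 suggest. With protocol left unset, Filebeat falls back to its default, http.

```yaml
# filebeat.yml
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["172.16.4.11:9200", "172.16.4.12:9200", "172.16.4.19:9200"]
  # Commented out so Filebeat uses its default protocol (http).
  # protocol: "https"
```

If some nodes really do serve HTTPS and others HTTP, the hosts entries can also be written as full URLs (for example "https://172.16.4.11:9200"), so that each host carries its own scheme. Whether that applies here is not clear from the thread.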