I recently upgraded from ELK 5.x to 6.1, and it has been running fine. We have 12 data nodes, six at each site, and one replica. We keep logs for 30 days and have 11 billion documents using 15TB of storage. We put different kinds of logs in different indexes (e.g. F5, firewall, IIS, Linux), just to decrease the noise a bit. Depending on the amount of data, each index has between one and five shards.
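To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes 1 TB = 10**12 bytes, that ingest is spread evenly over the 30 days, and that the document count and the storage figure cover the same set of shards (whether the 15TB includes the replica copy is not stated above).

```python
# Figures taken from the description above.
RETENTION_DAYS = 30
TOTAL_DOCS = 11_000_000_000
TOTAL_BYTES = 15 * 10**12   # assumption: decimal terabytes
DATA_NODES = 12

# Average ingest rate, assuming an even spread over the retention window.
docs_per_day = TOTAL_DOCS / RETENTION_DAYS
docs_per_second = docs_per_day / 86_400

# Average on-disk footprint per document and per data node.
bytes_per_doc = TOTAL_BYTES / TOTAL_DOCS
bytes_per_node = TOTAL_BYTES / DATA_NODES

print(f"docs/day:    {docs_per_day:,.0f}")
print(f"docs/second: {docs_per_second:,.0f}")
print(f"bytes/doc:   {bytes_per_doc:,.0f}")
print(f"TB/node:     {bytes_per_node / 10**12:.2f}")
```

That works out to roughly 367 million documents a day (about 4,200 per second on average), around 1.4 KB per document, and 1.25 TB per data node.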
I'm getting errors like this in Logstash:
[2018-02-01T10:41:48,087][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://elastic:xxxxxx@localhost:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://elastic:xxxxxx@localhost:9200/, :error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@localhost:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-02-01T10:41:48,087][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@localhost:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
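A note on the will_retry_in_seconds=>64 value in the ERROR line: as far as I understand the elasticsearch output, the wait between bulk retries starts at retry_initial_interval and doubles on each retry up to retry_max_interval. The sketch below assumes the documented defaults of 2 and 64 seconds; the function name retry_intervals is just an illustrative helper, not anything from Logstash itself.

```python
def retry_intervals(initial=2, maximum=64, attempts=8):
    """Yield the wait (seconds) before each retry, doubling until capped.

    Defaults mirror the assumed plugin defaults:
    retry_initial_interval=2, retry_max_interval=64.
    """
    interval = initial
    for _ in range(attempts):
        yield interval
        interval = min(interval * 2, maximum)


print(list(retry_intervals()))  # [2, 4, 8, 16, 32, 64, 64, 64]
```

If those defaults apply here, a 64 second wait would mean the output had already failed several times in a row by the time that line was logged.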
I don't see any corresponding error in the Elasticsearch log. I was originally shipping directly to the data nodes, but the other day I installed ES on the shippers and started sending to localhost, which is not a data node. That seemed to be working fine, but at 5:50 this morning it stopped shipping and the above errors started showing up again. The ES cluster is green. In case it matters, Curator runs at 5am.
The data nodes have 48GB RAM with 28GB heap.
In the Logstash config I have this set:
pipeline.workers: 6
pipeline.batch.size: 500
pipeline.batch.delay: 5
It is a 4 vCPU VM and doesn't seem taxed.
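For reference, this is what those settings imply. It is a small sketch based on my understanding that the maximum number of in-flight events in a pipeline is pipeline.workers multiplied by pipeline.batch.size; the variable names are only illustrative.

```python
# Values copied from the Logstash settings above.
PIPELINE_WORKERS = 6
PIPELINE_BATCH_SIZE = 500
VCPUS = 4

# Upper bound on events held in memory between the inputs and the outputs.
max_inflight_events = PIPELINE_WORKERS * PIPELINE_BATCH_SIZE

# How far the worker count is oversubscribed relative to the vCPUs.
workers_per_vcpu = PIPELINE_WORKERS / VCPUS

print(max_inflight_events)  # 3000
print(workers_per_vcpu)     # 1.5
```

So each worker hands the output up to 500 events at a time, with at most 3,000 events in flight across the six workers.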
In ES I have this:
cluster.routing.allocation.awareness.attributes: site
cluster.routing.allocation.awareness.force.zone.values: site1, site2
discovery.zen.minimum_master_nodes: 1
With one master.
I'm not sure what additional information would be helpful. It is running on RHEL 7 using the RPM install, configured by the elastic Puppet Forge module.
Any help would be appreciated.
Thanks,
Peter