Have you taken a look at the Elasticsearch log files when this happens? Are they empty? Have you checked the Elasticsearch node stats to see whether your GC times increase during that time? Does this only affect Elasticsearch on those hosts? Are they running on the same VM?
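To make the GC question concrete, here is a minimal sketch of how two node stats snapshots could be compared. It assumes the snapshots are the parsed JSON of `GET /_nodes/stats/jvm`, taken shortly before and after an outage window, and that the per-collector counters live under `jvm.gc.collectors`. The function name `gc_delta` and the sample numbers are made up for illustration, so treat this as a starting point for whoever has access to the cluster, not as a finished tool.

```python
# Hypothetical sample snapshots, shaped like the response of GET /_nodes/stats/jvm.
# The node id, node name and all numbers are invented for this sketch.
SAMPLE_BEFORE = {"nodes": {"n1": {"name": "node-1", "jvm": {"gc": {"collectors": {
    "young": {"collection_count": 100, "collection_time_in_millis": 2000},
    "old": {"collection_count": 3, "collection_time_in_millis": 900},
}}}}}}
SAMPLE_AFTER = {"nodes": {"n1": {"name": "node-1", "jvm": {"gc": {"collectors": {
    "young": {"collection_count": 130, "collection_time_in_millis": 2600},
    "old": {"collection_count": 5, "collection_time_in_millis": 15900},
}}}}}}


def gc_delta(before, after):
    """Return, per node and per collector, how many collections ran and how
    many milliseconds they took between the two snapshots."""
    deltas = {}
    for node_id, node in after["nodes"].items():
        prev = before["nodes"].get(node_id)
        if prev is None:
            continue  # node was not present in the first snapshot
        prev_collectors = prev["jvm"]["gc"]["collectors"]
        per_collector = {}
        for name, stats in node["jvm"]["gc"]["collectors"].items():
            old = prev_collectors.get(name)
            if old is None:
                continue  # collector missing in the first snapshot
            per_collector[name] = {
                "count": stats["collection_count"] - old["collection_count"],
                "millis": stats["collection_time_in_millis"]
                - old["collection_time_in_millis"],
            }
        # Prefer the human-readable node name, fall back to the node id.
        deltas[node.get("name", node_id)] = per_collector
    return deltas
```

With the sample data, `gc_delta(SAMPLE_BEFORE, SAMPLE_AFTER)["node-1"]["old"]` reports 2 old-generation collections taking 15000 ms in total. Long old-generation pauses like that would be consistent with a node not answering for a while.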
What does unreachable mean? You have four nodes. Are all of them unreachable? Does unreachable mean you can open an HTTP connection and send your request but don't get a reply?
Side note: Elasticsearch 1.7 has been End-Of-Life since the beginning of this year and is missing a ton of bugfixes and features. I'd try to upgrade.
Thanks for your answer. Unfortunately I am a developer and I don't have access to the servers (neither the API nor Elasticsearch). The ops team does, but they don't have time for us right now. I only have access to the Kibana dashboard.
What I mean by "unreachable" is that when a query is made to Elasticsearch, we get two kinds of errors:
connection reset by peer
timeout after x time
A migration is planned, but it will take time as there is a big version gap.
Without further details this is IMO impossible to answer.
Sounds like a GC happening, but those usually don't happen down to the minute. It might be that you have a cronjob that sends a crazy query every two hours and makes your cluster get stuck, but this is all just guesswork without logs, detailed info and monitoring data...
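One way to firm up the "every two hours" guess without server access is to check how regular the error bursts really are. The sketch below assumes you can export error timestamps from somewhere (for example from the Kibana dashboard or the application's own logs). The helper names, the 10 minute grouping gap and the sample timestamps are all invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical error timestamps; replace with the real ones you can export.
SAMPLE_ERRORS = [
    datetime(2017, 1, 1, 8, 0, 5),
    datetime(2017, 1, 1, 8, 0, 40),
    datetime(2017, 1, 1, 8, 2, 10),
    datetime(2017, 1, 1, 10, 0, 7),
    datetime(2017, 1, 1, 10, 1, 30),
    datetime(2017, 1, 1, 12, 0, 3),
]


def burst_starts(timestamps, gap=timedelta(minutes=10)):
    """Group errors into bursts: an error starts a new burst when it is more
    than `gap` after the previous error. Returns the start time of each burst."""
    starts = []
    last = None
    for ts in sorted(timestamps):
        if last is None or ts - last > gap:
            starts.append(ts)
        last = ts
    return starts


def intervals_minutes(starts):
    """Minutes between consecutive burst starts, rounded to the nearest minute."""
    return [
        round((later - earlier).total_seconds() / 60)
        for earlier, later in zip(starts, starts[1:])
    ]
```

For the sample data, `intervals_minutes(burst_starts(SAMPLE_ERRORS))` gives `[120, 120]`. Intervals that are this regular would point towards something scheduled, such as a cronjob, rather than GC alone, which tends to drift with load.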