Monitoring cluster brings down production

alepuccetti · March 9, 2018, 2:29pm

I have a monitoring cluster that is separate from the production one. However, when the monitoring cluster does not respond (e.g. maintenance, network problems, internal errors), the production cluster goes RED and stops working. The Kibana instance connected to production gets 500 errors.

Is there a way to make the production cluster behave less disruptively when monitoring is unreachable? Am I doing something wrong?

On each Elasticsearch production node, I have these settings for X-Pack monitoring:

xpack.monitoring.enabled: true
xpack.monitoring.exporters:
  id1:
    type: http
    host: ["elasticsearch_domain"]
    auth.username: "monitoring"
    auth.password: "monitoring_password"

cjcenizal · March 10, 2018, 12:47am

Hi Alessandro, which version of Kibana are you using? I believe this has been fixed in 6.2.

Thanks,
CJ

alepuccetti · March 11, 2018, 7:55pm

Hi CJ,

I run 6.2 for monitoring and 5.6.4 for production. However, I do not think it is a Kibana problem but an Elasticsearch one: I did not set any xpack.monitoring variables on the production Kibana; I set xpack.monitoring only on the ES nodes.

cjcenizal · March 12, 2018, 10:05pm

Hi Alessandro, I'm afraid I don't understand what you mean by "6.2 for Monitoring and 5.6.4 for production." Do you mean you're using X-Pack 6.2 with Kibana 5.6.4?

Thanks,
CJ

alepuccetti · March 12, 2018, 11:08pm

Hi CJ,

I mean that the monitoring cluster runs on a 6.2 stack (Logstash, Elasticsearch, Kibana, X-Pack) and the production cluster runs on a 5.6.4 stack.

cjcenizal · March 13, 2018, 1:56am

Hey Alessandro, thanks for clarifying. I spoke with an engineer who works on Monitoring, and he thinks you could try temporarily disabling monitoring collection on your production cluster when the monitoring cluster needs some downtime. You can do this by setting the collection interval to -1 through a dynamic cluster setting:

PUT /_cluster/settings
{
    "transient" : {
        "xpack.monitoring.collection.interval" : "-1"
    }
}

The transient setting will be reset if your cluster restarts. If you need it to survive restarts, just change "transient" to "persistent".
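If you want to script this toggle around planned maintenance, here is a minimal Python sketch using only the standard library. It assumes you supply the production cluster's URL (the `http://localhost:9200` below is a placeholder) and that basic auth is optional; the helper names `monitoring_interval_body`, `build_request`, and `send` are hypothetical. Passing `None` as the interval sends JSON `null`, which Elasticsearch treats as a request to reset the setting to its default, re-enabling collection afterwards.

```python
import base64
import json
import urllib.request

SETTING = "xpack.monitoring.collection.interval"


def monitoring_interval_body(interval, persistent=False):
    """Build the _cluster/settings body.

    interval="-1" pauses collection; interval=None serializes to JSON null,
    which resets the setting to its default.
    """
    scope = "persistent" if persistent else "transient"
    return {scope: {SETTING: interval}}


def build_request(es_url, body, auth=None):
    """Prepare (but do not send) a PUT /_cluster/settings request.

    auth is an optional (username, password) tuple for basic auth; this is
    an assumption about how the production cluster is secured.
    """
    headers = {"Content-Type": "application/json"}
    if auth is not None:
        raw = f"{auth[0]}:{auth[1]}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        es_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="PUT",
    )


def send(request, timeout=10):
    """Perform the real HTTP call and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Before maintenance you would call `send(build_request("http://localhost:9200", monitoring_interval_body("-1")))`, and afterwards the same with `monitoring_interval_body(None)`. Keeping request construction separate from `send` makes it easy to inspect the exact body before touching a live cluster.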

He was also wondering whether you could provide some Elasticsearch logs from when the "production cluster goes RED and stops working". That will help us figure out whether this is a known issue or one that has already been fixed.

Thanks,
CJ

alepuccetti · March 13, 2018, 8:53am

Hi CJ,

Thanks for the tip. I do not have the logs to post, but I remember that there were a lot of errors from the xpack.monitoring module about refused connections, which makes sense because the monitoring cluster was down.

The collection interval is a good workaround for scheduled downtime, but it will not help with unplanned outages.

It seems odd to me that a production environment stops working only because the monitoring one is not reachable. Maybe there is a good reason for that.

Cheers,

system · April 10, 2018, 8:53am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
