Hi,
so I tried to set up multiple Kibana instances in our self-managed environment, hoping it would improve alerting and detection performance. In total there are 4 Kibana instances: 2 were set up earlier (as services from the Debian package) on separate nodes, and I added one more instance on each of those nodes (installed from the zip archive with its own service), so currently there are 2 virtual machines, each running 2 Kibana services. Each service starts fine; the 2 instances on the same host run on different ports and have different UUIDs and server.name values. The X-Pack settings are the same on every instance:
elasticsearch.username: "xxx"
elasticsearch.password: "xxxxx"
elasticsearch.ssl.verificationMode: certificate
elasticsearch.ssl.certificateAuthorities: [ "/etc/kibana/certs/ca.crt" ]
xpack.security.enabled: true
xpack.spaces.enabled: true
xpack.security.encryptionKey: xxx
xpack.reporting.encryptionKey: xxx
xpack.encryptedSavedObjects.encryptionKey: xxx
xpack.task_manager.max_workers: 100
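For reference, the part of kibana.yml that differs between the two services on the same host looks roughly like this (the values below are illustrative, not the real ones):
# instance-specific settings; the second service on each host differs only in these
server.name: "kibana-vm1-a"
server.port: 5601
# each instance also has its own UUID (auto-generated and kept in its own data directory)
Everything else, including the encryption keys above and the Elasticsearch connection settings, is identical across all 4 instances.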
All of the instances connect to the designated controller node, which should (I think) load balance the tasks for alerting and security. From time to time I can see in the logs of every instance that they index some new signals, so the problem seems limited to alerts and monitoring. The log that bothers me is this one (it shows up randomly on any of the instances):
{"type":"log","@timestamp":"2021-03-20T16:32:53+01:00","tags":["error","plugins","alerts","plugins","alerting"],"pid":55264,"message":"Executing Alert \"7458fa4b-0d4c-4e6b-a41d-4f534336a83c\" has resulted in Error: Unauthorized to get a \"monitoring_alert_missing_monitoring_data\" alert for \"monitoring\""}
If I understand it correctly, by default the instances use the .kibana-xx index for scheduling tasks, alerts and so on, and it seems to me like they are fighting over the tasks? Or what else could be the issue here? Sometimes I also get this log from the instances that produced the previous one:
{"type":"log","@timestamp":"2021-03-20T16:57:43+01:00","tags":["info","plugins","actions","actions"],"pid":75390,"message":"Server log: Disk usage alert is firing for 3 node(s) in cluster: XXX. Verify disk usage levels across affected nodes."}
What else can I provide to troubleshoot this issue?
I also get this log when starting the instances (but not every time); I don't know whether it's something I should worry about:
{"type":"log","@timestamp":"2021-03-19T17:28:57+01:00","tags":["warning","plugins","securitySolution"],"pid":12552,"message":"Unable to verify endpoint policies in line with license change: failed to fetch package policies: missing authentication credentials for REST request [/.kibana/_search?size=100&from=0&rest_total_hits_as_int=true]: security_exception"}
Another problem that bothers me (I don't know whether it's connected to the previous issue): when one of the Elasticsearch nodes leaves the cluster (restart, upgrade, anything), I get spammed with the logs below and can't see the Kibana instances in Stack Monitoring until I restart the Kibana services (NOTE: we still use legacy monitoring).
"type":"log","@timestamp":"2021-03-19T19:50:41+01:00","tags":["warning","plugins","monitoring","monitoring","kibana-monitoring"],"pid":12552,"message":"Error: Cluster client cannot be used after it has been closed.\n at LegacyClusterClient.assertIsNotClosed (/usr/share/kibana/src/core/server/elasticsearch/legacy/cluster_client.js:195:13)\n at LegacyClusterClient.callAsInternalUser (/usr/share/kibana/src/core/server/elasticsearch/legacy/cluster_client.js:115:12)\n at sendBulkPayload (/usr/share/kibana/x-pack/plugins/monitoring/server/kibana_monitoring/lib/send_bulk_payload.js:22:18)\n at BulkUploader._onPayload (/usr/share/kibana/x-pack/plugins/monitoring/server/kibana_monitoring/bulk_uploader.js:209:43)\n at BulkUploader._fetchAndUpload (/usr/share/kibana/x-pack/plugins/monitoring/server/kibana_monitoring/bulk_uploader.js:195:20)\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (internal/process/task_queues.js:93:5)"}
{"type":"log","@timestamp":"2021-03-19T19:50:41+01:00","tags":["warning","plugins","monitoring","monitoring","kibana-monitoring"],"pid":12552,"message":"Unable to bulk upload the stats payload to the local cluster"}