Executing Alert has resulted in Error

Hi,

So I tried to set up multiple Kibana instances in our self-managed environment, hoping it would improve alerting and detection performance. In total there are 4 Kibana instances: 2 were set up earlier (as services from the Debian package) on separate nodes, and I added 2 more (installed from the zip archive, each with its own service), one on each node. So currently there are 2 virtual machines, each running 2 Kibana services. Each service starts fine; the 2 instances on the same host listen on different ports and have different UUIDs and server.name values. X-Pack is also set up with the same configuration on every instance:

elasticsearch.username: "xxx"
elasticsearch.password: "xxxxx"
elasticsearch.ssl.verificationMode: certificate
elasticsearch.ssl.certificateAuthorities: [ "/etc/kibana/certs/ca.crt" ]
xpack.security.enabled: true
xpack.spaces.enabled: true
xpack.security.encryptionKey: xxx
xpack.reporting.encryptionKey: xxx
xpack.encryptedSavedObjects.encryptionKey: xxx
xpack.task_manager.max_workers: 100
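
For completeness, the settings that differ between the two instances on each host look roughly like this (placeholder values, not our real ones):

```yaml
# kibana.yml overrides for the second instance on a host
# (placeholder values; each instance gets its own port, name, UUID,
#  data directory and log file)
server.port: 5602
server.name: "kibana-vm1-b"
server.uuid: "6e3b1d6e-0000-0000-0000-000000000002"
path.data: /var/lib/kibana2
pid.file: /run/kibana/kibana2.pid
logging.dest: /var/log/kibana2/kibana.log
```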

All of the instances connect to the designated coordinating node, which should (I think) load balance the tasks for alerting and security? From time to time I can see in the logs of every instance that they index some new signals, so the problem is probably only with alerts and monitoring. The log that bothers me is this one (it shows up randomly on any of the instances):

{"type":"log","@timestamp":"2021-03-20T16:32:53+01:00","tags":["error","plugins","alerts","plugins","alerting"],"pid":55264,"message":"Executing Alert \"7458fa4b-0d4c-4e6b-a41d-4f534336a83c\" has resulted in Error: Unauthorized to get a \"monitoring_alert_missing_monitoring_data\" alert for \"monitoring\""}

If I understand it correctly, by default the instances use the .kibana-xx index for scheduling tasks, alerts and so on, and it seems to me like they are fighting over the tasks? Or what else could be the issue here? Sometimes I also get this log from the instances that produced the previous one:

{"type":"log","@timestamp":"2021-03-20T16:57:43+01:00","tags":["info","plugins","actions","actions"],"pid":75390,"message":"Server log: Disk usage alert is firing for 3 node(s) in cluster: XXX. Verify disk usage levels across affected nodes."}

What else can I provide to troubleshoot this issue?

I also get this log when starting the instances (though not every time); I don't know if it's something I should worry about:

{"type":"log","@timestamp":"2021-03-19T17:28:57+01:00","tags":["warning","plugins","securitySolution"],"pid":12552,"message":"Unable to verify endpoint policies in line with license change: failed to fetch package policies: missing authentication credentials for REST request [/.kibana/_search?size=100&from=0&rest_total_hits_as_int=true]: security_exception"}

Another problem that bothers me (I don't know if it's connected to the previous issue): when one of the Elasticsearch nodes leaves the cluster (restart, upgrade, anything), I get spammed with the log below and can't see the instances in Stack Monitoring until I restart the Kibana services (NOTE: we still use legacy monitoring).

{"type":"log","@timestamp":"2021-03-19T19:50:41+01:00","tags":["warning","plugins","monitoring","monitoring","kibana-monitoring"],"pid":12552,"message":"Error: Cluster client cannot be used after it has been closed.\n    at LegacyClusterClient.assertIsNotClosed (/usr/share/kibana/src/core/server/elasticsearch/legacy/cluster_client.js:195:13)\n    at LegacyClusterClient.callAsInternalUser (/usr/share/kibana/src/core/server/elasticsearch/legacy/cluster_client.js:115:12)\n    at sendBulkPayload (/usr/share/kibana/x-pack/plugins/monitoring/server/kibana_monitoring/lib/send_bulk_payload.js:22:18)\n    at BulkUploader._onPayload (/usr/share/kibana/x-pack/plugins/monitoring/server/kibana_monitoring/bulk_uploader.js:209:43)\n    at BulkUploader._fetchAndUpload (/usr/share/kibana/x-pack/plugins/monitoring/server/kibana_monitoring/bulk_uploader.js:195:20)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (internal/process/task_queues.js:93:5)"}
{"type":"log","@timestamp":"2021-03-19T19:50:41+01:00","tags":["warning","plugins","monitoring","monitoring","kibana-monitoring"],"pid":12552,"message":"Unable to bulk upload the stats payload to the local cluster"}

@gmmorris can you please shed some light on this?

Thanks for flagging @rashmi
Looks like this is specifically an issue relating to the Stack Monitoring alerts, which I'm not super familiar with.
I'll ping the team over at Stack Monitoring and hopefully they can help.

So I found out that the main issue was caused by a space in which I had deleted the monitoring alerts, but I don't know why the task still remained in Task Manager. I removed that space, moved the important objects to a new one, and haven't seen the alert since (it's only been an hour), so hopefully it won't show up again. The only thing was:

{"type":"log","@timestamp":"2021-03-21T15:03:44+01:00","tags":["error","plugins","taskManager"],"pid":55264,"message":"Task alerting:monitoring_alert_missing_monitoring_data \"4686ae80-6723-11eb-bb1e-cd29d5281742\" failed: Error: Saved object [alert/7458fa4b-0d4c-4e6b-a41d-4f534336a83c] not found"}

but that's understandable (and it was triggered just once).
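
In case it helps anyone else: a Dev Tools query along these lines (the index name and field paths are my best guess for 7.x, adjust for your version) should list the alerting tasks Task Manager still has registered, so you can spot ones that point at deleted alerts:

```
GET .kibana_task_manager/_search
{
  "size": 100,
  "_source": ["task.taskType", "task.status", "task.params"],
  "query": {
    "prefix": { "task.taskType": "alerting:" }
  }
}
```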

About the other issue, maybe I'll try to set up Metricbeat. Is it possible to set up a monitoring cluster even with only a basic license? What do I need to know before I start? Reading through Monitoring in a production environment didn't shed much light on it for me. If it is possible with a basic license, would I need at least 2 Elasticsearch nodes acting as ingest nodes that also have the data_content role, and maybe another Kibana instance used just for stack monitoring?

{"type":"log","@timestamp":"2021-03-20T16:57:43+01:00","tags":["info","plugins","actions","actions"],"pid":75390,"message":"Server log: Disk usage alert is firing for 3 node(s) in cluster: XXX. Verify disk usage levels across affected nodes."}

About the other issue

Is this the issue you are referring to?

Nope, the second issue is:

{"type":"log","@timestamp":"2021-03-19T19:50:41+01:00","tags":["warning","plugins","monitoring","monitoring","kibana-monitoring"],"pid":12552,"message":"Error: Cluster client cannot be used after it has been closed.\n    at LegacyClusterClient.assertIsNotClosed (/usr/share/kibana/src/core/server/elasticsearch/legacy/cluster_client.js:195:13)\n    at LegacyClusterClient.callAsInternalUser (/usr/share/kibana/src/core/server/elasticsearch/legacy/cluster_client.js:115:12)\n    at sendBulkPayload (/usr/share/kibana/x-pack/plugins/monitoring/server/kibana_monitoring/lib/send_bulk_payload.js:22:18)\n    at BulkUploader._onPayload (/usr/share/kibana/x-pack/plugins/monitoring/server/kibana_monitoring/bulk_uploader.js:209:43)\n    at BulkUploader._fetchAndUpload (/usr/share/kibana/x-pack/plugins/monitoring/server/kibana_monitoring/bulk_uploader.js:195:20)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (internal/process/task_queues.js:93:5)"}
{"type":"log","@timestamp":"2021-03-19T19:50:41+01:00","tags":["warning","plugins","monitoring","monitoring","kibana-monitoring"],"pid":12552,"message":"Unable to bulk upload the stats payload to the local cluster"}

But my finding is that it was probably only triggered when the nodes that held the monitoring indices were restarted (until now we didn't have dedicated nodes with the "data_content" role, so those indices were scattered all around).

FYI, we found a recent issue with the legacy Kibana monitoring not properly restarting when/if the ES node is unavailable: Self monitoring stops uploading if connection to Elasticsearch is lost · Issue #94900 · elastic/kibana · GitHub


Ah, thanks! Now I'm leaning even more toward upgrading to Metricbeat monitoring.

About the other issue, maybe I'll try to set up Metricbeat. Is it possible to set up a monitoring cluster even with only a basic license? What do I need to know before I start? Reading through Monitoring in a production environment didn't shed much light on it for me. If it is possible with a basic license, would I need at least 2 Elasticsearch nodes acting as ingest nodes that also have the data_content role, and maybe another Kibana instance used just for stack monitoring?

It'll work on the basic license for sure.

You don't necessarily need 2 nodes to get it working - you'll need to ensure the single node functions as the master and ingest node though.

In an ideal world, you want to separate your monitoring data from your "production" data so that you have access to important metrics and logs if your production cluster goes down (which is what that article alludes to) but you don't need this setup to start using Metricbeat to monitor the stack.
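
As a minimal sketch of the Metricbeat side (the host names and credentials below are placeholders), you'd enable the elasticsearch module with xpack-style collection and point the output at the monitoring cluster rather than the production one:

```yaml
# metricbeat.yml (sketch; hosts and credentials are placeholders)
metricbeat.modules:
  - module: elasticsearch
    xpack.enabled: true        # collect Stack Monitoring-compatible metrics
    period: 10s
    hosts: ["https://es-prod-1:9200"]
    username: "remote_monitoring_user"
    password: "changeme"

# Ship the data to the dedicated monitoring cluster
output.elasticsearch:
  hosts: ["https://es-monitoring-1:9200"]
  username: "remote_monitoring_user"
  password: "changeme"
```

The kibana module works the same way, and once Metricbeat is shipping the data you'd disable legacy self-monitoring (xpack.monitoring.elasticsearch.collection.enabled: false on the production cluster).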


Thanks for the information! We'd rather follow the recommendations, so I'll set up a dedicated monitoring cluster for this purpose.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.