How to create Alerts for cluster health (green/yellow/red) and Circuit Breaker errors?

Craig_Rodrigues · August 29, 2023, 12:20am

In Kibana 8.9.0, I managed to successfully create an alert for Cluster Health,
so that if cluster health transitions from green to yellow or red,
I receive an alert.

I did the following:

Go to Stack Monitoring
In the top right, click on Enter setup mode
In the top right, click on Alerts and rules
Go to kbn:/app/management/insightsAndAlerting/triggersActions/rules
Search for Cluster health

This works great.

How can I create alerts for Circuit Breaker errors,
which seem to throw es_rejected_execution_exception errors?

Even though my cluster health was green, I recently encountered two types of circuit breaker errors that I wish I received alerts for:

Circuit Breaker error 1:

failed to publish events: 429 Too Many Requests: {"error":{"root_cause":[{"type":"es_rejected_execution_exception",
"reason":"rejected execution of coordinating operation 
[coordinating_and_primary_bytes=216741495, replica_bytes=0, all_bytes=216741495, coordinating_operation_bytes=41925,
 max_coordinating_and_primary_bytes=214748364]"}],"type":"es_rejected_execution_exception","reason":"rejected execution of coordinating operation
[coordinating_and_primary_bytes=216741495, replica_bytes=0, all_bytes=216741495, coordinating_operation_bytes=41925,
max_coordinating_and_primary_bytes=214748364]"},"status":429}

Circuit Breaker error 2:

failed to index document (es_rejected_execution_exception): rejected execution of
 TimedRunnable{original=org.elasticsearch.action.support.replication.TransportWriteAction$1/WrappedActionListener{org.elasticsearch.action.support.replication.ReplicationOperation$$Lambda$9217/0x00007f1ce96ca250@2fe4a9c4}
{org.elasticsearch.action.support.replication.ReplicationOperation$$Lambda$9218/0x00007f1ce96ca468@6d6c9beb},
 creationTimeNanos=775669912784792, startTimeNanos=0, finishTimeNanos=-1, failedOrRejected=false} on
 TaskExecutionTimeTrackingEsThreadPoolExecutor[name = quicknode-elastic-es-data-hot-zone-3-1/write,
queue capacity = 10000, task execution EWMA = 3.4ms, total task execution time = 58.2d,
 org.elasticsearch.common.util.concurrent.TaskExecutionTimeTrackingEsThreadPoolExecutor@2e4d252[Running, pool size = 8, active threads = 8, queued tasks = 10001, completed tasks = 639326371]]
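
To make the two messages above easier to compare, here is a minimal Python sketch that pulls the relevant counters out of each one. The function names and regular expressions are illustrative assumptions based only on the two samples quoted above (abbreviated here); this is not an official parser. Read this way, the first rejection appears to come from in-flight indexing bytes exceeding their limit, and the second from a full write thread pool queue (10001 queued tasks against a capacity of 10000).

```python
import re

# Abbreviated copies of the two messages quoted above.
ERROR_1 = (
    "rejected execution of coordinating operation "
    "[coordinating_and_primary_bytes=216741495, replica_bytes=0, "
    "all_bytes=216741495, coordinating_operation_bytes=41925, "
    "max_coordinating_and_primary_bytes=214748364]"
)
ERROR_2 = (
    "TaskExecutionTimeTrackingEsThreadPoolExecutor"
    "[name = quicknode-elastic-es-data-hot-zone-3-1/write, "
    "queue capacity = 10000, task execution EWMA = 3.4ms, "
    "[Running, pool size = 8, active threads = 8, "
    "queued tasks = 10001, completed tasks = 639326371]]"
)


def parse_indexing_pressure(message):
    """Return (current_bytes, max_bytes) from a coordinating rejection, or None."""
    # The lookbehind skips the "max_..." field, which contains the same name.
    current = re.search(r"(?<!max_)coordinating_and_primary_bytes=(\d+)", message)
    limit = re.search(r"max_coordinating_and_primary_bytes=(\d+)", message)
    if current and limit:
        return int(current.group(1)), int(limit.group(1))
    return None


def parse_write_queue(message):
    """Return (pool_name, queue_capacity, queued_tasks) from a thread pool rejection, or None."""
    name = re.search(r"name = ([^,\s]+),", message)
    capacity = re.search(r"queue capacity = (\d+)", message)
    queued = re.search(r"queued tasks = (\d+)", message)
    if name and capacity and queued:
        return name.group(1), int(capacity.group(1)), int(queued.group(1))
    return None


if __name__ == "__main__":
    current, limit = parse_indexing_pressure(ERROR_1)
    print(f"in-flight bytes over limit by {current - limit}")
    pool, capacity, queued = parse_write_queue(ERROR_2)
    print(f"{pool}: {queued} queued tasks, capacity {capacity}")
```

The same regular expressions could be used to tell the two kinds of rejection apart once the log lines are ingested, assuming the message text keeps this shape.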

Is there a pre-canned Alert that I can use to detect those types of errors?

If not, is there a general technique for creating alerts for Circuit Breaker errors?

jsanz · September 5, 2023, 1:12pm

As of 8.9.1 these are the rules available:

If you get those log messages ingested into your cluster properly, then you need to create a custom rule based on an Elasticsearch query, as documented here
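
As a rough sketch of that approach, the Python snippet below builds a query DSL body that could be pasted into an Elasticsearch query rule. It assumes the rejection messages are ingested into an index with a text field named `message`; that field name and the helper name `build_rejection_query` are assumptions for illustration, not something stated in this thread. The time field, time window and match-count threshold are left out because they would be set in the rule form itself.

```python
import json


def build_rejection_query(field="message"):
    """Build a query DSL body matching ingested rejection log lines.

    `field` is an assumed name for the text field holding the log line.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Both sample errors contain this exception type.
                    {"match_phrase": {field: "es_rejected_execution_exception"}}
                ]
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON so it can be copied into the rule's query editor.
    print(json.dumps(build_rejection_query(), indent=2))
```

With a threshold such as "number of matches is above 0" over the chosen window, a rule built on this query should fire whenever a new rejection is logged, provided the logs really do land in the index the rule reads.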

