In Kibana 8.9.0, I managed to successfully create an alert for Cluster Health,
so that if cluster health transitions from green to yellow or red,
I receive an alert.
I did the following:
- Go to Stack Monitoring
- In the top right, click on Enter setup mode
- In the top right, click on Alerts and rules
- Go to kbn:/app/management/insightsAndAlerting/triggersActions/rules
- Search for Cluster health
This works great.
How can I create alerts for Circuit Breaker errors,
which seem to throw es_rejected_execution_exception
errors?
Even though my cluster health was green, I recently encountered two types of circuit breaker errors that I wish I received alerts for:
- Circuit Breaker error 1:
failed to publish events: 429 Too Many Requests: {"error":{"root_cause":[{"type":"es_rejected_execution_exception",
"reason":"rejected execution of coordinating operation
[coordinating_and_primary_bytes=216741495, replica_bytes=0, all_bytes=216741495, coordinating_operation_bytes=41925,
max_coordinating_and_primary_bytes=214748364]"}],"type":"es_rejected_execution_exception","reason":"rejected execution of coordinating operation
[coordinating_and_primary_bytes=216741495, replica_bytes=0, all_bytes=216741495, coordinating_operation_bytes=41925,
max_coordinating_and_primary_bytes=214748364]"},"status":429}
- Circuit Breaker error 2:
failed to index document (es_rejected_execution_exception): rejected execution of
TimedRunnable{original=org.elasticsearch.action.support.replication.TransportWriteAction$1/WrappedActionListener{org.elasticsearch.action.support.replication.ReplicationOperation$$Lambda$9217/0x00007f1ce96ca250@2fe4a9c4}
{org.elasticsearch.action.support.replication.ReplicationOperation$$Lambda$9218/0x00007f1ce96ca468@6d6c9beb},
creationTimeNanos=775669912784792, startTimeNanos=0, finishTimeNanos=-1, failedOrRejected=false} on
TaskExecutionTimeTrackingEsThreadPoolExecutor[name = quicknode-elastic-es-data-hot-zone-3-1/write,
queue capacity = 10000, task execution EWMA = 3.4ms, total task execution time = 58.2d,
org.elasticsearch.common.util.concurrent.TaskExecutionTimeTrackingEsThreadPoolExecutor@2e4d252[Running, pool size = 8, active threads = 8, queued tasks = 10001, completed tasks = 639326371]]
Is there a pre-canned Alert that I can use to detect those types of errors?
If not, is there a general technique for creating alerts for Circuit Breaker errors?
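
One general technique I am considering, as a rough and unverified sketch: poll the node stats API (`GET _nodes/stats/thread_pool,indexing_pressure`) and check the rejection counters for the write thread pool and for indexing pressure, which look like they correspond to the two errors above. The function name `find_rejections` and the shape of the returned records are my own invention for illustration; the field names are the ones I believe the node stats response uses, so please correct me if they differ in 8.9.0.

```python
from typing import Any, Dict, List


def find_rejections(node_stats: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one record per node that reports any write-path rejections.

    `node_stats` is assumed to be the parsed JSON body of
    GET _nodes/stats/thread_pool,indexing_pressure
    """
    findings = []
    for node_id, node in node_stats.get("nodes", {}).items():
        # Rejections from a full write thread pool queue (error 2 above).
        write_pool = node.get("thread_pool", {}).get("write", {})
        # Rejections from the indexing pressure byte limit (error 1 above).
        pressure = (
            node.get("indexing_pressure", {}).get("memory", {}).get("total", {})
        )
        record = {
            "node": node.get("name", node_id),
            "write_rejected": write_pool.get("rejected", 0),
            "coordinating_rejections": pressure.get("coordinating_rejections", 0),
            "primary_rejections": pressure.get("primary_rejections", 0),
        }
        # Keep only nodes where at least one counter is non-zero.
        if any(value for key, value in record.items() if key != "node"):
            findings.append(record)
    return findings
```

As far as I understand, these counters are cumulative since node start, so a real alert would need to compare each sample against the previous one and fire only when a counter increases. I would still prefer a built-in rule if one exists.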