Hello,
I am running a web application called Automation Anywhere A360 on my own Windows Server machine. This web application uses a local Elasticsearch instance to handle its Audit Logs.
The cluster health endpoint shows the following:
{
"cluster_name": "aa_cr_elasticsearch",
"status": "yellow",
"timed_out": false,
"number_of_nodes": 1,
"number_of_data_nodes": 1,
"active_primary_shards": 493,
"active_shards": 493,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 499,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 49.69758064516129
}
The cluster allocation explain endpoint shows the following:
{
"index": "bilegacyutility",
"shard": 2,
"primary": false,
"current_state": "unassigned",
"unassigned_info": {
"reason": "CLUSTER_RECOVERED",
"at": "2023-01-11T19:29:57.256Z",
"last_allocation_status": "no_attempt"
},
"can_allocate": "no",
"allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisions": [
{
"node_id": "Y2wDy49CSgqkleEfKeQShQ",
"node_name": "localhost",
"transport_address": "127.0.0.1:47600",
"node_decision": "no",
"deciders": [
{
"decider": "same_shard",
"decision": "NO",
"explanation": "a copy of this shard is already allocated to this node [[bilegacyutility][2], node[Y2wDy49CSgqkleEfKeQShQ], [P], s[STARTED], a[id=xP-_uMOwSfCxSIVVSIW9vQ]]"
}
]
}
]
}
PROBLEM: Recently some Audit Logs did not show up in the app. According to the application logs, the cause is the cluster's shard limit:
2023-Jan-09 Mon 15:44:55.539 **ERROR - com.automationanywhere.durablemessaging.DurableMessageTransactionalPublisher - {} - run(DurableMessageTransactionalPublisher.java:460) - Error: com.automationanywhere.es_client.ESRestClientException: Failed to save to index: audit_logs_20230101**
at com.automationanywhere.es_client.ESRestClient.insertJsonDoc(ESRestClient.java:706) ~[kernel.jar:?]
at com.automationanywhere.es_client.ESRestClient.insertJsonDoc(ESRestClient.java:765) ~[kernel.jar:?]
at com.automationanywhere.es_client.ESRestClient.insertJsonDoc(ESRestClient.java:757) ~[kernel.jar:?]
at com.automationanywhere.audit.model.AuditESPublisher$BatchPublisher.publish(AuditESPublisher.java:36) ~[kernel.jar:?]
at com.automationanywhere.durablemessaging.DurableMessageTopicPublisher$BatchPublisher.publish(DurableMessageTopicPublisher.java:19) ~[kernel.jar:?]
at com.automationanywhere.durablemessaging.DurableMessageTransactionalPublisher.lambda$processTopicMessage$1(DurableMessageTransactionalPublisher.java:677) ~[kernel.jar:?]
at com.automationanywhere.durablemessaging.DurableMessagingBase.lambda$runWithContext$0(DurableMessagingBase.java:64) ~[kernel.jar:?]
at com.automationanywhere.common.security.context.SecurityContextHelper.runAsUser(SecurityContextHelper.java:253) ~[kernel.jar:?]
at com.automationanywhere.common.security.context.SecurityContextHelper.runAsUser(SecurityContextHelper.java:238) ~[kernel.jar:?]
at com.automationanywhere.durablemessaging.DurableMessagingBase.runWithContext(DurableMessagingBase.java:78) ~[kernel.jar:?]
at com.automationanywhere.durablemessaging.DurableMessageTransactionalPublisher.processTopicMessage(DurableMessageTransactionalPublisher.java:671) ~[kernel.jar:?]
at com.automationanywhere.durablemessaging.DurableMessageTransactionalPublisher.waitAndProcessMessage(DurableMessageTransactionalPublisher.java:587) ~[kernel.jar:?]
at com.automationanywhere.durablemessaging.DurableMessageTransactionalPublisher.access$400(DurableMessageTransactionalPublisher.java:116) ~[kernel.jar:?]
at com.automationanywhere.durablemessaging.DurableMessageTransactionalPublisher$2.run(DurableMessageTransactionalPublisher.java:425) [kernel.jar:?]
**Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=validation_exception, reason=Validation Failed: 1: this action would add [10] total shards, but this cluster currently has [992]/[1000] maximum shards open;]**
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187) ~[kernel.jar:?]
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911) ~[kernel.jar:?]
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1888) ~[kernel.jar:?]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1645) ~[kernel.jar:?]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1602) ~[kernel.jar:?]
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1572) ~[kernel.jar:?]
at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:989) ~[kernel.jar:?]
at com.automationanywhere.es_client.ESRestClient.insertJsonDoc(ESRestClient.java:700) ~[kernel.jar:?]
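For reference, the 992 in the error seems to line up with the cluster health output above: 493 active shards plus 499 unassigned shards. Below is a minimal Python sketch of that arithmetic, with the numbers copied by hand from the outputs in this post. The function name `shard_budget` is only my illustration (it is not part of A360 or Elasticsearch), and the limit of 1000 is simply the value quoted in the error message.

```python
def shard_budget(active, unassigned, limit, requested):
    """Return (open_shards, headroom, fits) for a request that adds shards.

    Assumption: shards counted against the limit = active + unassigned,
    which is what the numbers in this post suggest.
    """
    open_shards = active + unassigned
    headroom = limit - open_shards
    return open_shards, headroom, requested <= headroom


# Values taken from the health output and the validation_exception above.
open_shards, headroom, fits = shard_budget(
    active=493, unassigned=499, limit=1000, requested=10
)
print(open_shards, headroom, fits)  # 992 8 False

# Same ratio the health endpoint reports as active_shards_percent_as_number.
active_percent = 100 * 493 / open_shards
print(round(active_percent, 2))  # 49.7
```

So only 8 shards of headroom remain, while creating the index audit_logs_20230101 would need 10, which would explain why the save fails.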
For what it's worth, the drive where all of this is stored has 234 GB free, so disk space does not appear to be the issue.
We know that we can raise the shard limit above 1000, but we have not done so because it is not recommended. We would like to know of a more sustainable mid- to long-term solution. Thank you!