Failed to save to index due to maximum shard overlimit

aagirre92 · January 18, 2023, 3:57pm

Hello,

I am running a web application called Automation Anywhere A360 on my own Windows Server machine. This web application uses a local Elasticsearch instance to handle its Audit Logs.

Cluster health endpoint shows the following:

{
    "cluster_name": "aa_cr_elasticsearch",
    "status": "yellow",
    "timed_out": false,
    "number_of_nodes": 1,
    "number_of_data_nodes": 1,
    "active_primary_shards": 493,
    "active_shards": 493,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 499,
    "delayed_unassigned_shards": 0,
    "number_of_pending_tasks": 0,
    "number_of_in_flight_fetch": 0,
    "task_max_waiting_in_queue_millis": 0,
    "active_shards_percent_as_number": 49.69758064516129
}

The cluster allocation explain endpoint shows the following:

{
    "index": "bilegacyutility",
    "shard": 2,
    "primary": false,
    "current_state": "unassigned",
    "unassigned_info": {
        "reason": "CLUSTER_RECOVERED",
        "at": "2023-01-11T19:29:57.256Z",
        "last_allocation_status": "no_attempt"
    },
    "can_allocate": "no",
    "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
    "node_allocation_decisions": [
        {
            "node_id": "Y2wDy49CSgqkleEfKeQShQ",
            "node_name": "localhost",
            "transport_address": "127.0.0.1:47600",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "same_shard",
                    "decision": "NO",
                    "explanation": "a copy of this shard is already allocated to this node [[bilegacyutility][2], node[Y2wDy49CSgqkleEfKeQShQ], [P], s[STARTED], a[id=xP-_uMOwSfCxSIVVSIW9vQ]]"
                }
            ]
        }
    ]
}

PROBLEM: Recently some Audit Logs did not show up in the app, and the reason (judging by the logs) is related to the shard limit:

2023-Jan-09 Mon 15:44:55.539 ERROR - com.automationanywhere.durablemessaging.DurableMessageTransactionalPublisher - {} - run(DurableMessageTransactionalPublisher.java:460) - Error: com.automationanywhere.es_client.ESRestClientException: Failed to save to index: audit_logs_20230101
    at com.automationanywhere.es_client.ESRestClient.insertJsonDoc(ESRestClient.java:706) ~[kernel.jar:?]
    at com.automationanywhere.es_client.ESRestClient.insertJsonDoc(ESRestClient.java:765) ~[kernel.jar:?]
    at com.automationanywhere.es_client.ESRestClient.insertJsonDoc(ESRestClient.java:757) ~[kernel.jar:?]
    at com.automationanywhere.audit.model.AuditESPublisher$BatchPublisher.publish(AuditESPublisher.java:36) ~[kernel.jar:?]
    at com.automationanywhere.durablemessaging.DurableMessageTopicPublisher$BatchPublisher.publish(DurableMessageTopicPublisher.java:19) ~[kernel.jar:?]
    at com.automationanywhere.durablemessaging.DurableMessageTransactionalPublisher.lambda$processTopicMessage$1(DurableMessageTransactionalPublisher.java:677) ~[kernel.jar:?]
    at com.automationanywhere.durablemessaging.DurableMessagingBase.lambda$runWithContext$0(DurableMessagingBase.java:64) ~[kernel.jar:?]
    at com.automationanywhere.common.security.context.SecurityContextHelper.runAsUser(SecurityContextHelper.java:253) ~[kernel.jar:?]
    at com.automationanywhere.common.security.context.SecurityContextHelper.runAsUser(SecurityContextHelper.java:238) ~[kernel.jar:?]
    at com.automationanywhere.durablemessaging.DurableMessagingBase.runWithContext(DurableMessagingBase.java:78) ~[kernel.jar:?]
    at com.automationanywhere.durablemessaging.DurableMessageTransactionalPublisher.processTopicMessage(DurableMessageTransactionalPublisher.java:671) ~[kernel.jar:?]
    at com.automationanywhere.durablemessaging.DurableMessageTransactionalPublisher.waitAndProcessMessage(DurableMessageTransactionalPublisher.java:587) ~[kernel.jar:?]
    at com.automationanywhere.durablemessaging.DurableMessageTransactionalPublisher.access$400(DurableMessageTransactionalPublisher.java:116) ~[kernel.jar:?]
    at com.automationanywhere.durablemessaging.DurableMessageTransactionalPublisher$2.run(DurableMessageTransactionalPublisher.java:425) [kernel.jar:?]
Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=validation_exception, reason=Validation Failed: 1: this action would add [10] total shards, but this cluster currently has [992]/[1000] maximum shards open;]
    at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187) ~[kernel.jar:?]
    at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911) ~[kernel.jar:?]
    at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1888) ~[kernel.jar:?]
    at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1645) ~[kernel.jar:?]
    at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1602) ~[kernel.jar:?]
    at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1572) ~[kernel.jar:?]
    at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:989) ~[kernel.jar:?]
    at com.automationanywhere.es_client.ESRestClient.insertJsonDoc(ESRestClient.java:700) ~[kernel.jar:?]

Note that the drive where all this is stored has 234 GB free (just FYI).

We know that we can increase the shard limit beyond 1000 (we have not done this, as it is not recommended at all), but we would like a more sustainable mid/long-term solution. Thank you!
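
For reference, the numbers in the health output and in the error line up. Below is a minimal Python sketch of the check, assuming the limit is the default `cluster.max_shards_per_node` of 1000 multiplied by the number of data nodes, and that unassigned shards count toward it (493 + 499 = 992 matches the error message). The function name is made up for illustration and is not part of any Elasticsearch client.

```python
# Figures copied from the cluster health output and the validation error above.
ACTIVE_SHARDS = 493
UNASSIGNED_SHARDS = 499
DATA_NODES = 1
MAX_SHARDS_PER_NODE = 1000  # assumption: the [1000] in the error is the per-node default
REQUESTED_SHARDS = 10       # "this action would add [10] total shards"


def would_exceed_limit(open_shards: int, requested: int, limit: int) -> bool:
    """Return True when adding `requested` shards would pass the cluster-wide limit."""
    return open_shards + requested > limit


# Unassigned replicas are counted as open shards, which is why the total is 992.
open_shards = ACTIVE_SHARDS + UNASSIGNED_SHARDS
cluster_limit = MAX_SHARDS_PER_NODE * DATA_NODES

print(open_shards, cluster_limit,
      would_exceed_limit(open_shards, REQUESTED_SHARDS, cluster_limit))
```

So creating the next audit index is rejected because 992 + 10 is above 1000, regardless of how much disk space is free.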

warkolm · January 18, 2023, 9:58pm

Given you have a single node you don't need replicas, so I would set everything to 0 replicas and that will help in the short term.
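
To put rough numbers on that suggestion: the cluster is yellow rather than red, which means all primaries are assigned, so the 499 unassigned shards should all be replicas. The small sketch below, assuming the figures from the health output above and the 1000 limit from the error message (the helper name is hypothetical), estimates what would remain after setting replicas to 0.

```python
def plan_zero_replicas(active_primaries: int, active: int, unassigned: int, limit: int) -> dict:
    """Estimate shard counts once every index has number_of_replicas = 0."""
    total_now = active + unassigned
    # With zero replicas only the primary shards stay open.
    remaining = active_primaries
    return {
        "open_now": total_now,
        "freed": total_now - remaining,
        "open_after": remaining,
        "headroom_after": limit - remaining,
    }


# Values taken from the cluster health output earlier in the thread.
plan = plan_zero_replicas(active_primaries=493, active=493, unassigned=499, limit=1000)
print(plan)
```

This only covers existing indices. Whether newly created indices still ask for replicas depends on the index settings or templates the application uses, which the thread does not show.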

aagirre92 · January 19, 2023, 8:47am

Is that safe to do? (Is it safer than raising the cluster's shard limit above 1000?)

warkolm · January 19, 2023, 9:24am

You have a single node, so you are already at risk of data loss because you have no replicas assigned.

aagirre92 · January 19, 2023, 10:13am

And how can I set everything to 0? Whenever I try to make this request:

PUT /*/_settings

{
    "index": {
        "number_of_replicas": 0
    }
}

The response is the following: (http status 403 Forbidden)

{
    "error": {
        "root_cause": [
            {
                "type": "security_exception",
                "reason": "no permissions for [] and User [name=es_client, backend_roles=[], requestedTenant=null]"
            }
        ],
        "type": "security_exception",
        "reason": "no permissions for [] and User [name=es_client, backend_roles=[], requestedTenant=null]"
    },
    "status": 403
}

How can I set replicas to 0?

Thanks in advance

system · February 16, 2023, 10:14am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
