Failed shard on node [bnK31ibrRG6bSpw_pYK2BA]: shard failure, reason [corrupt file (source: [index id[CTR0CY4BdvDW7Z2cSc8C] origin[PRIMARY] seq#[27209007]])], failure org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed

kishor1 · March 14, 2024, 5:20am

Hi @DavidTurner,
I have gone through the docs you gave me, and the solution they suggest is to add data nodes or increase the storage size.
After I increased the number of data nodes, the new data node would run for a while and then fail; describing the data node showed "readiness probe failed". I tried 3 or 4 times and got the same issue every time, so I had to undo the changes. Instead, I have now added extra storage to the data nodes (previously I had 190 GB), like this:

  - name: data
    count: 2
    podTemplate:
      spec:
        nodeSelector:
          namespace: "elastic-system"
        tolerations:
        - key: "namespace"
          # operator: "Exists" is always commented out when applying for the first time
          operator: "Equal"
          value: "elastic-system"
          effect: "NoSchedule"
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          readinessProbe:
            exec:
              command:
              - bash
              - -c
              - /mnt/elastic-internal/scripts/readiness-probe-script.sh
            failureThreshold: 3
            initialDelaySeconds: 20
            periodSeconds: 12
            successThreshold: 1
            timeoutSeconds: 12
          env:
          - name: READINESS_PROBE_TIMEOUT
            value: "40"
          - name: ES_JAVA_OPTS
            value: -Xms2g -Xmx2g
          resources:
            requests:
              memory: "1Gi"
              cpu: "100m"
            limits:
              memory: "3000Mi"
    config:
      # On Elasticsearch versions before 7.9.0, replace the node.roles configuration with the following:
      # node.master: false
      # node.data: true
      # node.ingest: true
      # node.ml: true
      # node.transform: true
      node.roles: ["data", "ingest","remote_cluster_client"]
      # node.roles: ["data", "ingest", "ml", "transform"]
      # node.remote_cluster_client: true
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 256Gi
        storageClassName: elk-azurefile-sc
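
One thing that may be worth checking in this spec is how the JVM heap compares with the container memory limit, since Elastic's general guidance is to keep the heap at or below roughly half of the memory available to the node. The following is only a rough sketch using the values from the config above; the helper names `k8s_bytes` and `max_heap_bytes` are made up for illustration, and the parsers only handle the simple suffixes that appear here.

```python
import re

# Binary suffixes used by Kubernetes quantities and by JVM heap flags.
_K8S_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}
_JVM_UNITS = {"k": 2**10, "m": 2**20, "g": 2**30}


def k8s_bytes(quantity: str) -> int:
    """Convert a Kubernetes quantity such as "3000Mi" or "1Gi" to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi)", quantity)
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * _K8S_UNITS[match.group(2)]


def max_heap_bytes(java_opts: str) -> int:
    """Extract the -Xmx value from an ES_JAVA_OPTS string, in bytes."""
    match = re.search(r"-Xmx(\d+)([kmgKMG])", java_opts)
    if match is None:
        raise ValueError("no -Xmx flag found")
    return int(match.group(1)) * _JVM_UNITS[match.group(2).lower()]


# Values copied from the data nodeSet above.
heap = max_heap_bytes("-Xms2g -Xmx2g")
limit = k8s_bytes("3000Mi")
heap_share = heap / limit
print(f"heap is {heap_share:.0%} of the container memory limit")
```

With these values the heap works out to about two thirds of the limit, which leaves comparatively little room for off-heap memory and the filesystem cache. Whether that is related to the failing readiness probe is not something this check can confirm.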

As you can see, I have 2 data nodes, each with 256 GB of storage.

I am now receiving more than 30 GB of data every day, which I know is very high, but my index is still being deleted every day.
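
As a back-of-the-envelope check on these numbers, the sketch below estimates how many days of data would fit before the nodes reach a disk threshold. It rests on several assumptions that are not confirmed anywhere in this thread: that the 30 GB per day is primary data only, that each index keeps 1 replica (as the `rep` column further down suggests), and that the default low disk watermark of 85% applies. The function name `days_until_threshold` is purely illustrative.

```python
def days_until_threshold(node_count, disk_per_node_gb, daily_primary_gb,
                         replicas, threshold=0.85):
    """Estimate days of ingest that fit before disk usage hits `threshold`.

    Assumes data is spread evenly across nodes and nothing is deleted.
    """
    usable_gb = node_count * disk_per_node_gb * threshold
    # Every replica stores a full copy of the primary data.
    daily_total_gb = daily_primary_gb * (1 + replicas)
    return usable_gb / daily_total_gb


# 2 data nodes x 256 GB, 30 GB/day of primary data, 1 replica (assumed).
days = days_until_threshold(2, 256, 30, 1)
print(f"about {days:.1f} days of data before the threshold")
```

Under these assumptions the cluster would hold roughly a week of data, so a single day's index should be nowhere near filling the disks.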

This is a secured Elasticsearch cluster, and no one other than me has the credentials.

Here is what I observe when I run this:

GET _cat/indices?v&s=health:desc,index&h=health,status,index,docs.count,pri,rep

health status index                                                        docs.count pri rep
red    open   filebeat-2024.03.13                                                       1   1
green  open   .internal.alerts-observability.apm.alerts-default-000001              0   1   1
green  open   .internal.alerts-observability.logs.alerts-default-000001             0   1   1
green  open   .internal.alerts-observability.metrics.alerts-default-000001          0   1   1
green  open   .internal.alerts-observability.slo.alerts-default-000001              0   1   1
green  open   .internal.alerts-observability.uptime.alerts-default-000001           0   1   1
green  open   .internal.alerts-security.alerts-default-000001                       0   1   1
green  open   .internal.alerts-stack.alerts-default-000001                          0   1   1
green  open   .kibana-observability-ai-assistant-conversations-000001               0   1   1
green  open   .kibana-observability-ai-assistant-kb-000001                          0   1   1
green  open   elastalert                                                          277   1   1
green  open   elastalert_error                                                   3021   1   1
green  open   elastalert_past                                                       0   1   1
green  open   elastalert_silence                                                  277   1   1
green  open   elastalert_status                                                   471   1   1
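
To find out why the primary of the red index is not assigned, the cluster allocation explain API (`_cluster/allocation/explain`) can be asked about that specific shard. Below is a minimal sketch that only builds the request; the base URL is a placeholder, the optional credentials are hypothetical parameters, and shard `0` is an assumption based on the index having a single primary.

```python
import base64
import json
import urllib.request


def allocation_explain_request(base_url, index, shard=0, primary=True,
                               user=None, password=None):
    """Build (but do not send) an allocation-explain request for one shard."""
    body = json.dumps(
        {"index": index, "shard": shard, "primary": primary}
    ).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if user is not None and password is not None:
        # HTTP basic auth, since the cluster is secured.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
        headers["Authorization"] = f"Basic {token.decode('ascii')}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/_cluster/allocation/explain",
        data=body,
        headers=headers,
        method="POST",
    )


# Placeholder URL; replace with the real cluster endpoint.
req = allocation_explain_request("https://localhost:9200", "filebeat-2024.03.13")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it. The same JSON body can also be used in Kibana Dev Tools with `GET _cluster/allocation/explain`.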

Regarding disk usage:

GET /_cat/allocation?v&s=disk.avail&h=node,disk.percent,disk.avail,disk.total,disk.used,disk.indices,shards&pretty

node                             disk.percent disk.avail disk.total disk.used disk.indices shards
UNASSIGNED                                                                                      4
elastic-search-cluster-es-data-0            2      250gb      256gb     5.9gb        6.4mb     33
elastic-search-cluster-es-data-1            2      250gb      256gb     5.9gb          7mb     33

Regarding cluster health:

GET _cluster/health

{
  "cluster_name": "elastic-search-cluster",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 2,
  "active_primary_shards": 33,
  "active_shards": 66,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 4,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 94.28571428571428
}
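
The percentage in this output is consistent with the shard counts next to it. As a small sketch, assuming the figure is simply active shards divided by all shards (active, initializing, and unassigned), it can be recomputed like this:

```python
import json

# Shard counts copied from the _cluster/health response above.
health = json.loads("""
{
  "active_shards": 66,
  "initializing_shards": 0,
  "unassigned_shards": 4
}
""")


def active_percent(h):
    """Share of shards that are active, as a percentage."""
    total = h["active_shards"] + h["initializing_shards"] + h["unassigned_shards"]
    return 100.0 * h["active_shards"] / total


percent = active_percent(health)
print(f"{percent:.2f}% of shards are active")
```

So the missing share corresponds to the 4 unassigned shards out of 70 in total.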

Here is what happens:

As you can see, the filebeat-2024.03.13 index shows nothing; its data was deleted automatically.

The filebeat-2024.03.13 index you see here has only just started again: the index was taking up a lot of storage and the Elasticsearch status went red, so I had to delete the index.

So please help me solve this issue.
