Index Recovery failed

Hi there,

I have a big problem with my Elasticsearch installation (7.2.0). I'm running a single node, and for some weeks now (I only noticed it today) index recovery has been failing. The error message in the log:

failed shard on node [-Mkg3PZgQka0-07_xr6ynQ]: failed recovery, failure RecoveryFailedException[[apclient_log_entry_v1][0]: Recovery failed on {elasticsearch}{-Mkg3PZgQka0-07_xr6ynQ}{mF_Fd09VSq-jtyKTj3ystA}{172.16.20.11}{172.16.20.11:9300}{xpack.installed=true}]; nested: IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[Checkpoint file translog-169.ckp already exists but has corrupted content expected: Checkpoint{offset=55, numOps=0, generation=169, minSeqNo=-1, maxSeqNo=-1, globalCheckpoint=9875, minTranslogGeneration=168, trimmedAboveSeqNo=-2} but got: Checkpoint{offset=24003, numOps=33, generation=169, minSeqNo=9876, maxSeqNo=9908, globalCheckpoint=9908, minTranslogGeneration=168,
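
To see exactly where the two checkpoints in that message disagree, a small Python sketch can parse both dumps and print only the differing fields. The two strings are copied from the error text in this thread (the complete "got" value appears in the `_cat/shards` output); the helper names `parse_checkpoint` and `diff_checkpoints` are made up for illustration and are not part of Elasticsearch.

```python
import re

# Both checkpoint dumps are copied from the error message in this thread.
EXPECTED = ("Checkpoint{offset=55, numOps=0, generation=169, minSeqNo=-1, "
            "maxSeqNo=-1, globalCheckpoint=9875, minTranslogGeneration=168, "
            "trimmedAboveSeqNo=-2}")
GOT = ("Checkpoint{offset=24003, numOps=33, generation=169, minSeqNo=9876, "
       "maxSeqNo=9908, globalCheckpoint=9908, minTranslogGeneration=168, "
       "trimmedAboveSeqNo=-2}")


def parse_checkpoint(text):
    """Turn 'Checkpoint{a=1, b=-2}' into {'a': 1, 'b': -2}."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(-?\d+)", text)}


def diff_checkpoints(expected, got):
    """Return {field: (expected_value, got_value)} for fields that differ."""
    exp, act = parse_checkpoint(expected), parse_checkpoint(got)
    return {key: (exp[key], act.get(key)) for key in exp if exp[key] != act.get(key)}


if __name__ == "__main__":
    for field, (want, have) in diff_checkpoints(EXPECTED, GOT).items():
        print(f"{field}: expected {want}, got {have}")
```

The generation (169) and minTranslogGeneration (168) match; the file on disk differs in offset, numOps, the sequence number range and the global checkpoint. One possible reading, not confirmed in this thread, is that the existing translog-169.ckp already describes 33 operations while recovery expected to create an empty generation 169.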

So I made some API calls to check:
http://localhost:9200/_cluster/health?pretty

cluster_name	"elasticsearch"
status	"red"
timed_out	false
number_of_nodes	1
number_of_data_nodes	1
active_primary_shards	20
active_shards	20
relocating_shards	0
initializing_shards	0
unassigned_shards	22
delayed_unassigned_shards	0
number_of_pending_tasks	0
number_of_in_flight_fetch	0
task_max_waiting_in_queue_millis	0
active_shards_percent_as_number	47.61904761904761
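
As a sanity check on those numbers, here is a rough Python sketch of how the percentage relates to the shard counts. The formula (active shards divided by active plus initializing plus unassigned) is my assumption about how the value is derived, not something taken from the Elasticsearch source; the counts themselves are copied from the response above.

```python
# Shard counts copied from the _cluster/health response in this thread.
HEALTH = {
    "active_shards": 20,
    "initializing_shards": 0,
    "unassigned_shards": 22,
}


def total_shards(health):
    """All shard copies the cluster wants to have (assumed formula)."""
    return (health["active_shards"]
            + health["initializing_shards"]
            + health["unassigned_shards"])


def active_shards_percent(health):
    """Share of shard copies that are currently active, in percent."""
    return 100.0 * health["active_shards"] / total_shards(health)


if __name__ == "__main__":
    print(total_shards(HEALTH), active_shards_percent(HEALTH))
```

With 20 active out of 42 shard copies this comes to roughly 47.6 percent, which agrees with the reported value. The 42 copies fit the shard listing: 21 indices, each with one primary and one replica.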

http://localhost:9200/_cat/shards?v&h=n,index,shard,prirep,state,sto,sc,unassigned.reason,unassigned.details&s=sto,index

n             index                                    shard prirep state          sto sc unassigned.reason unassigned.details
              admin_notification                       0     r      UNASSIGNED            CLUSTER_RECOVERED 
              apclient_log_entry_v1                    0     p      UNASSIGNED            ALLOCATION_FAILED failed shard on node [-Mkg3PZgQka0-07_xr6ynQ]: failed recovery, failure RecoveryFailedException[[apclient_log_entry_v1][0]: Recovery failed on {elasticsearch}{-Mkg3PZgQka0-07_xr6ynQ}{mF_Fd09VSq-jtyKTj3ystA}{172.16.20.11}{172.16.20.11:9300}{xpack.installed=true}]; nested: IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[Checkpoint file translog-169.ckp already exists but has corrupted content expected: Checkpoint{offset=55, numOps=0, generation=169, minSeqNo=-1, maxSeqNo=-1, globalCheckpoint=9875, minTranslogGeneration=168, trimmedAboveSeqNo=-2} but got: Checkpoint{offset=24003, numOps=33, generation=169, minSeqNo=9876, maxSeqNo=9908, globalCheckpoint=9908, minTranslogGeneration=168, trimmedAboveSeqNo=-2}]; 
              apclient_log_entry_v1                    0     r      UNASSIGNED            CLUSTER_RECOVERED 
              audit_entry_v1                           0     r      UNASSIGNED            CLUSTER_RECOVERED 
              bonita_log_entry_v1                      0     r      UNASSIGNED            CLUSTER_RECOVERED 
              check_definition                         0     r      UNASSIGNED            CLUSTER_RECOVERED 
              configuration_application_profile        0     r      UNASSIGNED            CLUSTER_RECOVERED 
              configuration_assignment_policy          0     r      UNASSIGNED            CLUSTER_RECOVERED 
              configuration_assignment_target          0     r      UNASSIGNED            CLUSTER_RECOVERED 
              configuration_document_content           0     r      UNASSIGNED            CLUSTER_RECOVERED 
              configuration_process_definition         0     r      UNASSIGNED            CLUSTER_RECOVERED 
              entity_organisation_v1                   0     r      UNASSIGNED            CLUSTER_RECOVERED 
              global_configuration_application_profile 0     r      UNASSIGNED            CLUSTER_RECOVERED 
              identity_group_v1                        0     r      UNASSIGNED            CLUSTER_RECOVERED 
              identity_user_v1                         0     r      UNASSIGNED            CLUSTER_RECOVERED 
              kibana_config                            0     r      UNASSIGNED            CLUSTER_RECOVERED 
              kibana_dashboard                         0     r      UNASSIGNED            CLUSTER_RECOVERED 
              kibana_eessi                             0     r      UNASSIGNED            CLUSTER_RECOVERED 
              kibana_index_pattern                     0     r      UNASSIGNED            CLUSTER_RECOVERED 
              kibana_visualization                     0     r      UNASSIGNED            CLUSTER_RECOVERED 
              rest_log_entry_v1                        0     r      UNASSIGNED            CLUSTER_RECOVERED 
              vocabulary_concept                       0     r      UNASSIGNED            CLUSTER_RECOVERED 
elasticsearch audit_entry_v1                           0     p      STARTED       283b  0                   
elasticsearch kibana_config                            0     p      STARTED      3.8kb  1                   
elasticsearch identity_group_v1                        0     p      STARTED      5.2kb  1                   
elasticsearch check_definition                         0     p      STARTED      5.6kb  1                   
elasticsearch kibana_dashboard                         0     p      STARTED      6.1kb  1                   
elasticsearch configuration_assignment_target          0     p      STARTED      6.4kb  1                   
elasticsearch kibana_eessi                             0     p      STARTED      6.9kb  1                   
elasticsearch admin_notification                       0     p      STARTED      9.6kb  1                   
elasticsearch kibana_visualization                     0     p      STARTED      9.6kb  1                   
elasticsearch identity_user_v1                         0     p      STARTED      9.8kb  1                   
elasticsearch configuration_assignment_policy          0     p      STARTED     12.9kb  1                   
elasticsearch global_configuration_application_profile 0     p      STARTED     14.9kb  1                   
elasticsearch kibana_index_pattern                     0     p      STARTED     18.5kb  1                   
elasticsearch vocabulary_concept                       0     p      STARTED     20.7kb  1                   
elasticsearch entity_organisation_v1                   0     p      STARTED     39.2kb  1                   
elasticsearch configuration_process_definition         0     p      STARTED     58.3kb  1                   
elasticsearch configuration_application_profile        0     p      STARTED     60.8kb  1                   
elasticsearch configuration_document_content           0     p      STARTED    549.6kb  5                   
elasticsearch bonita_log_entry_v1                      0     p      STARTED     20.8mb  1                   
elasticsearch rest_log_entry_v1                        0     p      STARTED    934.8mb 23
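
For a more detailed reason than the `_cat/shards` columns give, the cluster allocation explain API can be asked about the one unassigned primary. The following Python sketch only builds the request; the index name, shard number and base URL are taken from this thread, while the function name `explain_request` is invented for the example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # same host and port as the calls above


def explain_request(index, shard, primary=True, base_url=BASE_URL):
    """Build (but do not send) a cluster allocation explain request."""
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    return urllib.request.Request(
        f"{base_url}/_cluster/allocation/explain?pretty",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The failed primary from the shard listing.
req = explain_request("apclient_log_entry_v1", 0)

# To actually send it against a running node:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

The 21 unassigned replicas need no such investigation: on a single node a replica cannot be placed next to its own primary, so they stay unassigned regardless of the translog problem.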

I have read many posts on various sites, but none of them helped. Many suggest setting index.shard.check_on_startup: true in the config, but that is no longer possible in 7.2.0. I don't have a backup or snapshot. So how can I fix this error?

I also ran Lucene's CheckIndex to look for errors, but it reports that all data is OK:
java -cp lucene-core-8.0.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex "C:\Elasticsearch\data\nodes\0\indices\m0hlrhY9TsiyhVYDNkHWdg\0\index"

This is the index (apclient_log_entry_v1, m0hlrhY9TsiyhVYDNkHWdg) that is UNASSIGNED because of ALLOCATION_FAILED.

Thanks a lot!

Welcome to our community! :smiley:
7.2 is well past EOL, please upgrade ASAP.

You might be stuck here, because that suggests filesystem-level corruption, and on a single node none of your replicas is assigned, so there is no other copy to recover from. Is the data important?

Yeah, I see now that this version has reached EOL. I use an application that only ships with that old version, which is why I'm still using it today :wink:

No, the data here is not that important; I can live with data loss. I don't have snapshots of the data, and my full backup is older, because I noticed this problem too late. But if I use a newer version of Elasticsearch, I will use the built-in snapshot and SLM policy features to create automatic snapshots. From which version is this fully supported?
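
For reference, a snapshot lifecycle (SLM) policy is created with a PUT to `_slm/policy/<policy_id>`. The Python sketch below only builds such a request. The policy id `nightly-snapshots`, the repository name `my_repository`, the schedule and the snapshot name pattern are all placeholders; the repository has to be registered beforehand, and which release first includes SLM is not settled in this thread, so check the release notes of the target version.

```python
import json
import urllib.request


def slm_policy_request(policy_id, repository, base_url="http://localhost:9200"):
    """Build (but do not send) a request that creates or updates an SLM policy."""
    policy = {
        "schedule": "0 30 1 * * ?",         # cron expression: every day at 01:30
        "name": "<nightly-snap-{now/d}>",   # date math in the snapshot name
        "repository": repository,           # must already be registered
        "config": {"indices": ["*"]},       # snapshot all indices
    }
    return urllib.request.Request(
        f"{base_url}/_slm/policy/{policy_id}",
        data=json.dumps(policy).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder names, chosen only for this example.
slm_req = slm_policy_request("nightly-snapshots", "my_repository")
```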

I solved this error now by deleting the affected translog checkpoint (.ckp) file. After that, ES started again.
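
For anyone repeating this, a slightly safer variant is to move the checkpoint file aside instead of deleting it, with the node stopped, so it can be put back if the shard still does not start. This is a minimal Python sketch under assumptions: that the file in question is translog-169.ckp (the one named in the error), and that the caller passes the shard's translog directory, which I assume sits next to the `index` directory used for CheckIndex. The function name and the backup location are made up for the example. Elasticsearch also ships an `elasticsearch-shard` tool with a `remove-corrupted-data` command for corrupted shards; the sketch does not use it.

```python
import shutil
from pathlib import Path


def move_checkpoint_aside(translog_dir, generation, backup_dir):
    """Move translog-<generation>.ckp into backup_dir and return its new path.

    Run this only while the Elasticsearch node is stopped.
    """
    source = Path(translog_dir) / f"translog-{generation}.ckp"
    if not source.is_file():
        raise FileNotFoundError(source)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    if target.exists():
        # Refuse to overwrite an earlier backup of the same file.
        raise FileExistsError(target)
    shutil.move(str(source), str(target))
    return target
```

Since the poster accepted data loss, any operations recorded only in that translog generation may be gone either way; the backup just keeps the option of undoing the change.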