Index Recovery failed

Hi there,

I have a big problem with my Elasticsearch installation (7.2.0). I run a single node, and for some weeks now (I only noticed it today) index recovery has been failing. The error message from the log:

failed shard on node [-Mkg3PZgQka0-07_xr6ynQ]: failed recovery, failure RecoveryFailedException[[apclient_log_entry_v1][0]: Recovery failed on {elasticsearch}{-Mkg3PZgQka0-07_xr6ynQ}{mF_Fd09VSq-jtyKTj3ystA}{172.16.20.11}{172.16.20.11:9300}{xpack.installed=true}]; nested: IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[Checkpoint file translog-169.ckp already exists but has corrupted content expected: Checkpoint{offset=55, numOps=0, generation=169, minSeqNo=-1, maxSeqNo=-1, globalCheckpoint=9875, minTranslogGeneration=168, trimmedAboveSeqNo=-2} but got: Checkpoint{offset=24003, numOps=33, generation=169, minSeqNo=9876, maxSeqNo=9908, globalCheckpoint=9908, minTranslogGeneration=168,

So I made some API calls to check:
http://localhost:9200/_cluster/health?pretty

cluster_name	"elasticsearch"
status	"red"
timed_out	false
number_of_nodes	1
number_of_data_nodes	1
active_primary_shards	20
active_shards	20
relocating_shards	0
initializing_shards	0
unassigned_shards	22
delayed_unassigned_shards	0
number_of_pending_tasks	0
number_of_in_flight_fetch	0
task_max_waiting_in_queue_millis	0
active_shards_percent_as_number	47.61904761904761
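
The percentage on the last line follows from the shard counts above it: 20 active shards out of 42 in total. A quick check, assuming the figure is simply active shards divided by active plus initializing plus unassigned shards (an assumption that matches these numbers, not something taken from the Elasticsearch source; the function name is made up for illustration):

```python
def active_shards_percent(active: int, initializing: int, unassigned: int) -> float:
    """Share of shards that are active, in percent.

    Assumption: total = active + initializing + unassigned.
    """
    total = active + initializing + unassigned
    if total == 0:
        # No shards at all: treat the cluster as fully active.
        return 100.0
    return 100.0 * active / total


# Values copied from the _cluster/health output above.
percent = active_shards_percent(active=20, initializing=0, unassigned=22)
```

With one data node, the 21 replica shards can never be assigned, so this figure stays below 50% even once the broken primary is recovered.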

http://localhost:9200/_cat/shards?v&h=n,index,shard,prirep,state,sto,sc,unassigned.reason,unassigned.details&s=sto,index

n             index                                    shard prirep state          sto sc unassigned.reason unassigned.details
              admin_notification                       0     r      UNASSIGNED            CLUSTER_RECOVERED 
              apclient_log_entry_v1                    0     p      UNASSIGNED            ALLOCATION_FAILED failed shard on node [-Mkg3PZgQka0-07_xr6ynQ]: failed recovery, failure RecoveryFailedException[[apclient_log_entry_v1][0]: Recovery failed on {elasticsearch}{-Mkg3PZgQka0-07_xr6ynQ}{mF_Fd09VSq-jtyKTj3ystA}{172.16.20.11}{172.16.20.11:9300}{xpack.installed=true}]; nested: IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[Checkpoint file translog-169.ckp already exists but has corrupted content expected: Checkpoint{offset=55, numOps=0, generation=169, minSeqNo=-1, maxSeqNo=-1, globalCheckpoint=9875, minTranslogGeneration=168, trimmedAboveSeqNo=-2} but got: Checkpoint{offset=24003, numOps=33, generation=169, minSeqNo=9876, maxSeqNo=9908, globalCheckpoint=9908, minTranslogGeneration=168, trimmedAboveSeqNo=-2}]; 
              apclient_log_entry_v1                    0     r      UNASSIGNED            CLUSTER_RECOVERED 
              audit_entry_v1                           0     r      UNASSIGNED            CLUSTER_RECOVERED 
              bonita_log_entry_v1                      0     r      UNASSIGNED            CLUSTER_RECOVERED 
              check_definition                         0     r      UNASSIGNED            CLUSTER_RECOVERED 
              configuration_application_profile        0     r      UNASSIGNED            CLUSTER_RECOVERED 
              configuration_assignment_policy          0     r      UNASSIGNED            CLUSTER_RECOVERED 
              configuration_assignment_target          0     r      UNASSIGNED            CLUSTER_RECOVERED 
              configuration_document_content           0     r      UNASSIGNED            CLUSTER_RECOVERED 
              configuration_process_definition         0     r      UNASSIGNED            CLUSTER_RECOVERED 
              entity_organisation_v1                   0     r      UNASSIGNED            CLUSTER_RECOVERED 
              global_configuration_application_profile 0     r      UNASSIGNED            CLUSTER_RECOVERED 
              identity_group_v1                        0     r      UNASSIGNED            CLUSTER_RECOVERED 
              identity_user_v1                         0     r      UNASSIGNED            CLUSTER_RECOVERED 
              kibana_config                            0     r      UNASSIGNED            CLUSTER_RECOVERED 
              kibana_dashboard                         0     r      UNASSIGNED            CLUSTER_RECOVERED 
              kibana_eessi                             0     r      UNASSIGNED            CLUSTER_RECOVERED 
              kibana_index_pattern                     0     r      UNASSIGNED            CLUSTER_RECOVERED 
              kibana_visualization                     0     r      UNASSIGNED            CLUSTER_RECOVERED 
              rest_log_entry_v1                        0     r      UNASSIGNED            CLUSTER_RECOVERED 
              vocabulary_concept                       0     r      UNASSIGNED            CLUSTER_RECOVERED 
elasticsearch audit_entry_v1                           0     p      STARTED       283b  0                   
elasticsearch kibana_config                            0     p      STARTED      3.8kb  1                   
elasticsearch identity_group_v1                        0     p      STARTED      5.2kb  1                   
elasticsearch check_definition                         0     p      STARTED      5.6kb  1                   
elasticsearch kibana_dashboard                         0     p      STARTED      6.1kb  1                   
elasticsearch configuration_assignment_target          0     p      STARTED      6.4kb  1                   
elasticsearch kibana_eessi                             0     p      STARTED      6.9kb  1                   
elasticsearch admin_notification                       0     p      STARTED      9.6kb  1                   
elasticsearch kibana_visualization                     0     p      STARTED      9.6kb  1                   
elasticsearch identity_user_v1                         0     p      STARTED      9.8kb  1                   
elasticsearch configuration_assignment_policy          0     p      STARTED     12.9kb  1                   
elasticsearch global_configuration_application_profile 0     p      STARTED     14.9kb  1                   
elasticsearch kibana_index_pattern                     0     p      STARTED     18.5kb  1                   
elasticsearch vocabulary_concept                       0     p      STARTED     20.7kb  1                   
elasticsearch entity_organisation_v1                   0     p      STARTED     39.2kb  1                   
elasticsearch configuration_process_definition         0     p      STARTED     58.3kb  1                   
elasticsearch configuration_application_profile        0     p      STARTED     60.8kb  1                   
elasticsearch configuration_document_content           0     p      STARTED    549.6kb  5                   
elasticsearch bonita_log_entry_v1                      0     p      STARTED     20.8mb  1                   
elasticsearch rest_log_entry_v1                        0     p      STARTED    934.8mb 23
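
In a listing like this, the unassigned replicas are only noise on a single-node cluster; the row that matters is the unassigned primary. A minimal sketch for filtering it out, assuming the same `_cat/shards` endpoint is called with `format=json` and full column names so that each row comes back as an object keyed by those names (the helper names are hypothetical):

```python
import json
from urllib.request import urlopen

# Same endpoint as above, but asking for JSON and full column names.
CAT_SHARDS_URL = (
    "http://localhost:9200/_cat/shards"
    "?format=json&h=index,shard,prirep,state,unassigned.reason"
)


def fetch_shards(url: str = CAT_SHARDS_URL) -> list:
    """Download the shard listing as a list of dicts (needs a running node)."""
    with urlopen(url) as resp:
        return json.load(resp)


def unassigned_primaries(rows: list) -> list:
    """Keep only primary shards ("p") that are currently UNASSIGNED."""
    return [
        row for row in rows
        if row.get("prirep") == "p" and row.get("state") == "UNASSIGNED"
    ]


# Usage against a live node:
#   for row in unassigned_primaries(fetch_shards()):
#       print(row["index"], row["shard"], row["unassigned.reason"])
```

Applied to the table above, this should leave only the apclient_log_entry_v1 primary with reason ALLOCATION_FAILED.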

I have read many posts on various sites, but none of them helped. Many suggest setting index.shard.check_on_startup: true in the config, but this is no longer possible in 7.2.0. I don't have a backup or snapshot. So how can I fix this error?

I also ran Lucene's CheckIndex to look for errors, but it reports that all the data is okay:
java -cp lucene-core-8.0.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex "C:\Elasticsearch\data\nodes\0\indices\m0hlrhY9TsiyhVYDNkHWdg\0\index"

This is the index (apclient_log_entry_v1, m0hlrhY9TsiyhVYDNkHWdg) that is UNASSIGNED because of ALLOCATION_FAILED.

Thanks a lot!

Welcome to our community! :smiley:
7.2 is well past EOL, please upgrade ASAP.

You might be stuck here, because that suggests filesystem-level corruption, and although you have replicas set, none of them can be assigned on a single node. Is the data important?

Yeah, I see now that this version has reached EOL. I use an application that only ships with that old version, which is why I'm still using it today :wink:

No, the data here is not that important; I can live with data loss. I don't have snapshots of the data, and my full backup is older because I noticed this problem too late. But once I move to a newer version of Elasticsearch, I will use the built-in snapshot and SLM policy features to create automatic snapshots. From which version is this fully supported?
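
For reference, an SLM policy is a small JSON document sent with `PUT /_slm/policy/<policy_id>` once a snapshot repository has been registered. A rough sketch of what such a body could look like, built as a Python dict; the repository name, the schedule and the helper name are placeholders chosen for illustration, and the exact set of supported fields should be checked against the documentation of the version you end up on:

```python
import json


def build_slm_policy(repository: str,
                     schedule: str = "0 30 1 * * ?",
                     indices: tuple = ("*",)) -> dict:
    """Build the request body for PUT /_slm/policy/<policy_id>.

    repository: name of an already registered snapshot repository (placeholder).
    schedule:   cron expression; this default means 01:30 every day.
    indices:    index patterns to include in each snapshot.
    """
    return {
        "schedule": schedule,
        # Date math in the name gives each snapshot a dated name.
        "name": "<nightly-snap-{now/d}>",
        "repository": repository,
        "config": {"indices": list(indices)},
    }


# Serialised form, ready to be sent as the request body.
body = json.dumps(build_slm_policy("my_backup_repo"), indent=2)
```

The policy only schedules snapshots; the repository itself (for example a shared filesystem path) has to be created separately beforehand.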

I have now solved this error by deleting the affected translog .ckp file. After that, ES started again.
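
For anyone repeating this, a slightly more cautious variant is to move the checkpoint file aside instead of deleting it, so it can be put back if the shard still fails to start. A minimal sketch, assuming the node is stopped first and that the translog directory sits next to the index directory inside the shard folder (`...\indices\<index-uuid>\0\translog`); the function name and the backup location are made up for illustration, and operations recorded in that translog generation may be lost either way:

```python
import shutil
from pathlib import Path


def quarantine_checkpoint(shard_dir: Path, ckp_name: str, backup_dir: Path) -> Path:
    """Move one translog checkpoint file out of the shard's translog directory.

    shard_dir:  the shard folder, e.g. ...\\indices\\<index-uuid>\\0
    ckp_name:   the file named in the error, e.g. "translog-169.ckp"
    backup_dir: any folder outside the Elasticsearch data path
    Returns the new location of the file.
    """
    src = shard_dir / "translog" / ckp_name
    if not src.is_file():
        raise FileNotFoundError(f"no such checkpoint file: {src}")
    backup_dir.mkdir(parents=True, exist_ok=True)
    dst = backup_dir / ckp_name
    if dst.exists():
        # Refuse to overwrite an earlier backup of the same file.
        raise FileExistsError(f"backup already exists: {dst}")
    shutil.move(str(src), str(dst))
    return dst


# Usage (run only while Elasticsearch is stopped):
#   shard = Path(r"C:\Elasticsearch\data\nodes\0\indices\m0hlrhY9TsiyhVYDNkHWdg\0")
#   quarantine_checkpoint(shard, "translog-169.ckp", Path(r"C:\es-translog-backup"))
```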
