Data indices keep recovering

I would like to check with the community about the recovery state of our indices. After upgrading the ELK stack to 8.4.1, I can see that the data indices are recovering very often. Is there any reason for this behavior?

GET _cat/recovery?v&h=t,ty,i,f,tnode,start,to&s=start:desc

t ty i f tnode start to
745ms peer data_catalogue_dashboard 0 quickstart-es-data-nodes-1 2022-10-18T12:20:47.044Z 0
841ms peer catalogue_for_field_level 0 quickstart-es-data-nodes-1 2022-10-18T12:20:46.914Z 0
1s peer ecommerce_source 0 quickstart-es-data-nodes-1 2022-10-18T12:20:44.445Z 0
2.1s peer audit_logs_itg_data 0 quickstart-es-data-nodes-1 2022-10-18T12:20:42.444Z 0
1.9s peer calculations-itg-todate-01 0 quickstart-es-data-nodes-1 2022-10-18T12:20:38.654Z 0
1.7s peer api_performance_itgdiagnostic_logs 0 quickstart-es-data-nodes-1 2022-10-18T12:20:35.305Z 0
7.8s peer ccs_count_data_index 0 quickstart-es-data-nodes-1 2022-10-18T12:20:35.202Z 0
1s peer ccs-hive-count-index 0 quickstart-es-data-nodes-1 2022-10-18T12:20:32.326Z 0
774ms peer cr_service_data_src_redis_check 0 quickstart-es-data-nodes-1 2022-10-18T12:20:24.072Z 0
1s peer measurements-itg-aug 0 quickstart-es-data-nodes-1 2022-10-18T12:20:23.929Z 0
1.4s peer cr_service_completeness_result_redis_check 0 quickstart-es-data-nodes-1 2022-10-18T12:20:20.440Z 0
448ms peer data_catalogue_dashboard 0 quickstart-es-data-nodes-2 2022-10-18T11:36:36.516Z 0
723ms peer catalogue_for_field_level 0 quickstart-es-data-nodes-2 2022-10-18T11:36:33.195Z 0
610ms peer ecommerce_source 0 quickstart-es-data-nodes-2 2022-10-18T11:36:33.098Z 0
1s peer audit_logs_itg_data 0 quickstart-es-data-nodes-2 2022-10-18T11:36:29.828Z 0
640ms peer cr_service_data_src 0 quickstart-es-data-nodes-2 2022-10-18T11:36:29.743Z 0
1.2s peer api_performance_itgdiagnostic_logs 0 quickstart-es-data-nodes-2 2022-10-18T11:36:26.484Z 0
721ms peer ccsdataingestionapi-itg-2022.05.27 0 quickstart-es-data-nodes-2 2022-10-18T11:36:26.363Z 0
394ms peer metrics-endpoint.metadata_current_default 0 quickstart-es-data-nodes-2 2022-10-18T11:36:18.946Z 0
578ms peer measurements-itg-aug 0 quickstart-es-data-nodes-2 2022-10-18T11:36:15.884Z 0
472ms peer cr_service_completeness_result_redis_check 0 quickstart-es-data-nodes-2 2022-10-18T11:36:13.554Z 0
1s peer catalogue_for_field_level 0 quickstart-es-data-nodes-0 2022-10-18T11:21:08.918Z 0
454ms peer data_catalogue_dashboard 0 quickstart-es-data-nodes-0 2022-10-18T11:21:05.625Z 0
491ms peer ccsdataingestionapi-itg-2022.05.27 0 quickstart-es-data-nodes-0 2022-10-18T11:21:02.460Z 0
688ms peer cr_service_data_src 0 quickstart-es-data-nodes-0 2022-10-18T11:21:02.374Z 0
802ms peer calculations-itg-todate-01 0 quickstart-es-data-nodes-0 2022-10-18T11:20:59.193Z 0
458ms peer cr_service_data_src_redis_check 0 quickstart-es-data-nodes-0 2022-10-18T11:20:53.921Z 0
392ms peer metrics-endpoint.metadata_current_default 0 quickstart-es-data-nodes-0 2022-10-18T11:20:53.831Z 0
569ms existing_store ccs_count_data_index 0 quickstart-es-data-nodes-0 2022-10-18T11:20:39.218Z 0
516ms existing_store ccs-hive-count-index 0 quickstart-es-data-nodes-0 2022-10-18T11:20:39.120Z 0
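
A rough way to see the pattern in this output is to group the rows by target node and look at the first and last start time per node. In the rows above, each node's recoveries start within roughly half a minute of each other. The sketch below is only an illustration: `summarize_recoveries` and `SAMPLE` are hypothetical names, the sample holds a few rows copied from the output above, and the parser assumes the whitespace-separated column order requested with h=t,ty,i,f,tnode,start,to.

```python
from collections import defaultdict
from datetime import datetime

# A few rows copied from the _cat/recovery output above.
# Column order is assumed to match h=t,ty,i,f,tnode,start,to.
SAMPLE = """\
t ty i f tnode start to
745ms peer data_catalogue_dashboard 0 quickstart-es-data-nodes-1 2022-10-18T12:20:47.044Z 0
1.4s peer cr_service_completeness_result_redis_check 0 quickstart-es-data-nodes-1 2022-10-18T12:20:20.440Z 0
448ms peer data_catalogue_dashboard 0 quickstart-es-data-nodes-2 2022-10-18T11:36:36.516Z 0
472ms peer cr_service_completeness_result_redis_check 0 quickstart-es-data-nodes-2 2022-10-18T11:36:13.554Z 0
1s peer catalogue_for_field_level 0 quickstart-es-data-nodes-0 2022-10-18T11:21:08.918Z 0
516ms existing_store ccs-hive-count-index 0 quickstart-es-data-nodes-0 2022-10-18T11:20:39.120Z 0
"""


def summarize_recoveries(text):
    """Group rows by target node.

    Returns {node: (row_count, earliest_start, latest_start)}.
    """
    starts = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and anything with an unexpected shape.
        if len(parts) != 7 or parts[0] == "t":
            continue
        _t, _ty, _i, _f, tnode, start, _to = parts
        starts[tnode].append(datetime.strptime(start, "%Y-%m-%dT%H:%M:%S.%fZ"))
    return {node: (len(ts), min(ts), max(ts)) for node, ts in starts.items()}


if __name__ == "__main__":
    for node, (count, first, last) in sorted(summarize_recoveries(SAMPLE).items()):
        print(f"{node}: {count} recoveries between {first.time()} and {last.time()}")
```

Pasting the full output into `SAMPLE` gives one line per data node, which makes it easier to compare the burst times against the Elasticsearch logs for the same window.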

One important piece of information: this cluster is built on a Kubernetes environment, and I did not see a single restart at the container level, yet recoveries are still happening on each data node.

quickstart-es-data-nodes-0 2/2 Running 0 4d3h
quickstart-es-data-nodes-1 2/2 Running 0 4d3h
quickstart-es-data-nodes-2 2/2 Running 0 4d3h
quickstart-es-master-nodes-0 2/2 Running 0 4d3h
quickstart-es-master-nodes-1 2/2 Running 0 4d3h
quickstart-es-master-nodes-2 2/2 Running 0 4d3h

Hi @Ravi_S1 . From the logs it looks like this is something to do with your Elasticsearch nodes.

Is there a reason you think this is related to Kibana?

@Andrew_Tate, I don't think so. I am trying to apply some mitigation steps and will post the results.

@Andrew_Tate
I applied the following recovery settings, but I can still see recoveries happening on the data indices:
PUT _cluster/settings
{"transient":{"cluster.routing.allocation.node_concurrent_recoveries":3}}

PUT _all/_settings
{"settings":{"index.unassigned.node_left.delayed_timeout":"6m"}}

PUT _cluster/settings
{"transient":{"indices.recovery.max_bytes_per_sec":"100mb"}}

All pods are also stable, as shown below:

quickstart-es-data-nodes-0 2/2 Running 0 17d
quickstart-es-data-nodes-1 2/2 Running 0 17d
quickstart-es-data-nodes-2 2/2 Running 0 17d
quickstart-es-master-nodes-0 2/2 Running 0 17d
quickstart-es-master-nodes-1 2/2 Running 0 17d
quickstart-es-master-nodes-2 2/2 Running 0 17d

Why are the indices still going through recovery, as shown below?

t ty i
1.7s peer ecommerce_data_src_redis_check
1.5s peer calculations-itg-todate-01
1.3s peer myacct_orders_service_data_src_redis_check
1.3s peer sc_service_data_src_redis_check
1.3s peer sr_service_data_src_redis_check
1.2s existing_store ccs_count_data_index
1.2s peer cr_service_data_src_redis_check
1.2s peer api_performance_itgdiagnostic_logs

Do we need any specific recovery configuration to stop these recoveries on the data indices?

If any further data would be useful for this issue, I am ready to provide it. Any input would help us a lot.
