Need some help understanding why retrieving all snapshots is taking forever

kushalOtter · June 13, 2024, 9:44pm

Hi all,
We are using Elasticsearch to store our system logs via Filebeat. Our cluster runs Elasticsearch 6.8, and its health output is below:

{
  "cluster_name" : "elk",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 24,
  "number_of_data_nodes" : 21,
  "active_primary_shards" : 4284,
  "active_shards" : 8568,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

I have set up an S3 snapshot repository. A Python cron job snapshots indices older than 30 days to S3 and then deletes them from the cluster. Lately, when I list the snapshots with the command
curl http://localhost:9200/_cat/snapshots/repo_name?pretty
It takes forever to return a result (>10m). Most of the time I just give up, and my cron job is failing because it times out when it calls to get the status of the snapshot. Any suggestions or help would be much appreciated. Thanks.

DavidTurner · June 14, 2024, 7:18am

Elasticsearch will reply eventually, but with large snapshot repositories it may indeed take minutes (sometimes hours) to list them all. Especially on such an old version - 6.8 hasn't seen any enhancements for over 5 years now and went EOL ages ago. You need to upgrade as a matter of urgency.

kushalOtter · June 14, 2024, 5:31pm

Thanks a lot, David. We have plans to upgrade. I was wondering if there is anything I can do in the meantime to reduce the latency of retrieving the snapshots. Perhaps the only way is to upgrade.

DavidTurner · June 14, 2024, 6:02pm

I don't remember any workarounds but it's been so long since I've even looked at the 6.8 code, sorry.
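
Not a confirmed fix, but here is a minimal sketch of two things that might be worth testing from the cron job while still on 6.8. First, ask for one snapshot by name (`GET /_snapshot/<repo>/<snapshot>`) instead of listing the whole repository, which may avoid reading metadata for every snapshot from S3. Second, if a full listing is needed, try `verbose=false` on the `_snapshot` API, which as far as I know is available in 6.x and returns only basic information per snapshot. The helper names, the `snap-2024.05.01` snapshot name, and the 30-minute timeout are assumptions for illustration; `repo_name` and `localhost:9200` come from the command above. None of this has been measured on this cluster.

```python
import json
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # host and port from the curl command in the thread
REPO = "repo_name"                # placeholder repository name from the thread


def snapshot_url(repo, snapshot="_all", verbose=True, base=ES_URL):
    """Build a GET /_snapshot/<repo>/<snapshot> URL.

    verbose=False appends ?verbose=false, which is assumed here to be
    supported by the 6.8 get-snapshot API.
    """
    path = "/_snapshot/{}/{}".format(
        urllib.parse.quote(repo, safe=""),
        urllib.parse.quote(snapshot, safe="*,"),
    )
    query = "" if verbose else "?verbose=false"
    return base.rstrip("/") + path + query


def get_json(url, timeout_s=1800):
    """GET a URL with an explicit, generous timeout and parse the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.load(resp)


def snapshot_state(repo, snapshot, timeout_s=1800):
    """Return the state (e.g. SUCCESS, IN_PROGRESS) of a single named snapshot."""
    body = get_json(snapshot_url(repo, snapshot), timeout_s)
    return body["snapshots"][0]["state"]
```

In the cron job, a call such as `snapshot_state(REPO, "snap-2024.05.01")` would replace the full `_cat/snapshots` listing when only one snapshot's status is needed, and `get_json(snapshot_url(REPO, verbose=False))` would be the lighter listing to compare against. Whether either is meaningfully faster on a repository this size is something to check rather than assume.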