Trying to take Indices Backup for full cluster restart

Hi, I am trying to take a backup of the Prod indices using:

curl -XPUT 'elk.***.com:9200/_snapshot/my_backup' -d '
{
  "type": "fs",
  "settings": {
    "location": "/elkprod_bak/elkback",
    "compress": true
  }
}'

But I am getting this error:

{
   "error": {
      "root_cause": [
         {
            "type": "process_cluster_event_timeout_exception",
            "reason": "failed to process cluster event (put_repository [my_backup]) within 30s"
         }
      ],
      "type": "process_cluster_event_timeout_exception",
      "reason": "failed to process cluster event (put_repository [my_backup]) within 30s"
   },
   "status": 503
}

The returned message shows that your request could not be processed within 30 seconds. Creating a repository is not a heavy operation, so I don't think this particular request is causing the issue, but your cluster may be pretty busy.
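The 30s in the message is the default master_timeout for this kind of cluster-state update. As a quick test you could raise it when registering the repository, although if the master's queue is badly backed up that only hides the real problem:

curl -XPUT 'elk.***.com:9200/_snapshot/my_backup?master_timeout=120s' -d '
{
  "type": "fs",
  "settings": {
    "location": "/elkprod_bak/elkback",
    "compress": true
  }
}'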

Can you check the pending tasks and the cluster health and paste them here?

GET _cluster/health

GET _cat/pending_tasks
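Or the same checks with curl, against the endpoint you already used:

curl -XGET 'elk.***.com:9200/_cluster/health?pretty'

curl -XGET 'elk.***.com:9200/_cat/pending_tasks?v'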

Cluster health:
{
"cluster_name": "csm_elk_es_whq_1",
"status": "red",
"timed_out": false,
"number_of_nodes": 3,
"number_of_data_nodes": 3,
"active_primary_shards": 2313,
"active_shards": 2469,
"relocating_shards": 0,
"initializing_shards": 8,
"unassigned_shards": 8195,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 236,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 353214,
"active_shards_percent_as_number": 23.135307346326837
}

Also, the cluster is not stable and the pending tasks keep increasing, which results in the ELK stack going down completely.

We have a cluster of three nodes (00, 01 and 02). If node 02 comes back up, the backup completes successfully, but it does not back up the complete data. Part of the snapshot status output:
"logstash-2015.11.11",
".kibana",
"logstash-2015.11.12",
"logstash-2015.11.10",
"wsiasp-2017.03.30",
"wsiasp-2017.03.31",
"logstash-2015.11.01"
],
"state": "FAILED",
"reason": "Indices don't have primary shards [logstash-2017.07.02, wsiasp-2017.08.08]",
"start_time": "2017-09-01T07:31:33.325Z",
"start_time_in_millis": 1504251093325,
"end_time": "2017-09-01T07:31:42.748Z",
"end_time_in_millis": 1504251102748,
"duration_in_millis": 9423,
"failures": [
{
"index": "logstash-2017.07.02",
"shard_id": 0,
"reason": "primary shard is not allocated",
"status": "INTERNAL_SERVER_ERROR"
},

You seem to have far too many shards for a cluster that size, which most likely is why you are having problems with performance and stability. You need to either scale the cluster up/out or dramatically reduce the number of shards, e.g. by deleting data or reindexing into consolidated indices with fewer shards.
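As a rough sketch of the reindex route (it needs a version with the reindex API, i.e. 2.3 or later; the destination index name below is only an illustration and should be created first, or matched by a template, with the shard count you actually want, otherwise it gets the default 5 again):

curl -XPOST 'elk.***.com:9200/_reindex' -d '
{
  "source": { "index": ["logstash-2015.11.10", "logstash-2015.11.11", "logstash-2015.11.12"] },
  "dest":   { "index": "logstash-2015.11" }
}'

Once a reindexed month has been verified, the old daily indices can be deleted, which is what actually frees the shards.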

Thanks for the response. I have a 3-node ES cluster which has around 1 TB of data and just 2 kinds of indices.
So you are saying to either add one more node to the cluster or have multiple indices so that there are fewer shards per index. Correct me if I am wrong.
In addition to this, I have 4.4 billion documents and 1100 indices with 2.1 TB of data across all the nodes put together.

You have over 10,000 shards, so if you are using the default 5 primary shards and 1 replica, that is over 1,000 indices. If you have 2 TB of data in the cluster, your average shard size is around 200 MB, which is quite small. If you have a long retention period (it looks like that based on the number of indices), I would suggest switching to monthly indices with 3-5 primary shards. That would give an average shard size roughly between 5 GB and 10 GB, which is quite a common size.
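For the monthly indices, a template along these lines would apply the shard count automatically as each new month is created (the template name, pattern and exact shard count are only an example, using the legacy 2.x/5.x template syntax):

curl -XPUT 'elk.***.com:9200/_template/logstash_monthly' -d '
{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'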

So how do I go about taking snapshots of the indices?
Do I need to reindex with new shard settings?

You may need to get the cluster into better shape by reducing the number of shards before you take a snapshot. For ways to reduce the number of shards, have a look at the reindex API and the shrink index API.
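As a rough sketch of the shrink route (it needs 5.x or later, the source index has to be made read-only, and a copy of every shard must first sit on a single node; the node name and target index name below are hypothetical):

curl -XPUT 'elk.***.com:9200/logstash-2015.11.11/_settings' -d '
{
  "settings": {
    "index.routing.allocation.require._name": "node-02",
    "index.blocks.write": true
  }
}'

curl -XPOST 'elk.***.com:9200/logstash-2015.11.11/_shrink/logstash-2015.11.11-shrunk' -d '
{
  "settings": { "index.number_of_shards": 1 }
}'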

