Snapshot restore is very slow to get started

I have been observing really slow start-up times when restoring snapshots in Elasticsearch. I am using Grafana to monitor the cluster, and according to the dashboards the restore was started around 17:15 but only showed significant network usage about 3 hours later.

Is this normal? How does the restore process work, and why does it not start immediately?

It is worth mentioning that this snapshot did not finish successfully (reasons unknown), but similar behaviour has been observed with previous snapshots.


I am restoring one index of 13.2 billion documents across 470 shards, about 25 TB of data, restoring 1 replica and adding 1 replica as part of the restore process. I'm looking to accelerate this, as there is a huge 3-hour lag before the restore actually starts, and it then only reaches about 20 MB/s, which is slow considering all of these services are hosted in GCP, so we should be able to get much higher internal bandwidth. In an attempt to speed up the process I have made the following changes:

{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "250mb",
    "indices.recovery.max_concurrent_file_chunks": 5,
    "indices.recovery.concurrent_streams": 10,
    "cluster.routing.allocation.node_concurrent_recoveries": 10,
    "cluster.routing.allocation.node_initial_primaries_recoveries": 20
  }
}
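
(A block like the one above is applied through the cluster settings update API; roughly this form of request, shown here just for completeness:)

PUT _cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "250mb"
  }
}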

Any recommendations or rules of thumb for these settings would be appreciated.

Thank you in advance.

EDIT:

There don't seem to be any pending tasks when I query GET _cluster/pending_tasks:

{
    "tasks": []
}

and the cluster health is:

{
  "cluster_name": "elasticsearch",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 7,
  "number_of_data_nodes": 7,
  "active_primary_shards": 5,
  "active_shards": 10,
  "relocating_shards": 0,
  "initializing_shards": 140,
  "unassigned_shards": 1270,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 0.7042253521126761
}

The network bandwidth continues at only a few Kb/s.

[Screenshots: network bandwidth, operations/s, CPU usage, document count and indexing rate]

It seems to be writing documents, but at a really slow pace. I suspect this will speed up soon, but this stage is very slow.

Doesn't sound normal, no. What version are you using?

Elasticsearch should start to copy data from the repository pretty much as soon as the new shards are assigned to nodes. Did you have unassigned shards during the 3 hour waiting period?

I am using version 6.8.3.

Yes, it's quite possible; I also have 170 unassigned shards in the currently running recovery. I thought these were being created by the restore process before feeding in the data. Is there a way to mitigate this? Would you be able to give a more in-depth explanation of how the restore process works?

When you restore a snapshot Elasticsearch creates all the shards up-front and then they are assigned to nodes just like any other shard; when the primary is assigned it copies the data from the repository to initialise itself, and then any replicas are initialised by making a copy of the primary just like for any other shard.

Assigning shards to nodes is done by the master; if the master is otherwise busy then it may take some time to do that, and if it's dealing with lots of other unassigned shards then it may not assign the restore's primaries straight away either.
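
If you want to see where each shard is in this process during the next attempt, the cat recovery API shows both phases; a minimal sketch (recoveries of type snapshot are primaries pulling data from the repository, type peer are replicas copying from their primary):

GET _cat/recovery?v&active_only=true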

Thanks for the explanation and the quick response !

According to your explanation, my snapshot restore creates the shards up front but takes 3 hours to do so. Is there any explanation for this behaviour? How would I be able to mitigate this in my next restore?

My guess is that it creates the shards quickly but takes 3 hours to assign them for some reason. I would be surprised if creating the shards took 3 hours. That's up to you to confirm -- you'll need to try another restore to gather more information. I suggest watching the cluster pending tasks API and the cluster health API, maybe GET _nodes/_master/hot_threads?threads=99999 to see low-level details of what's occupying the master, and using the allocation explain API to get more information on why shards aren't being allocated if that does turn out to be the problem.

Until we know what exactly is not happening as quickly as expected (and why) I can't offer any advice on how to mitigate it.
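
For completeness, the checks suggested above correspond to requests along these lines, run while the restore appears to be stalled:

GET _cluster/pending_tasks
GET _cluster/health
GET _nodes/_master/hot_threads?threads=99999
GET _cluster/allocation/explain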

The threads don't seem to be busy at all:

{elasticsearch-master-0.elasticsearch-master-headless.elasticsearch.svc.cluster.local}{10.32.67.3:9300}{ml.machine_memory=54874828800, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true, nodepool_id=diuvjg-elastic-local-storage}
   Hot threads at 2020-03-31T12:57:45.200Z, interval=500ms, busiestThreads=99999, ignoreIdleThreads=true:

 7.5% (37.3ms out of 500ms) cpu usage by thread 'elasticsearch[W7LJGMn][generic][T#18]'
     2/10 snapshots sharing following 31 elements
       java.base@11.0.6/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
       java.base@11.0.6/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
       java.base@11.0.6/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
       java.base@11.0.6/sun.nio.ch.IOUtil.write(IOUtil.java:79)
       java.base@11.0.6/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
       java.base@11.0.6/java.nio.channels.Channels.writeFullyImpl(Channels.java:74)
       java.base@11.0.6/java.nio.channels.Channels.writeFully(Channels.java:97)
       java.base@11.0.6/java.nio.channels.Channels$1.write(Channels.java:172)
       app//org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:417)
       java.base@11.0.6/java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:74)
       java.base@11.0.6/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
       java.base@11.0.6/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
       app//org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
       app//org.elasticsearch.common.lucene.store.FilterIndexOutput.writeBytes(FilterIndexOutput.java:59)
       app//org.elasticsearch.index.store.Store$LuceneVerifyingIndexOutput.writeBytes(Store.java:1223)
       app//org.elasticsearch.repositories.blobstore.FileRestoreContext.restoreFile(FileRestoreContext.java:262)
       app//org.elasticsearch.repositories.blobstore.FileRestoreContext.restoreFiles(FileRestoreContext.java:238)
       app//org.elasticsearch.repositories.blobstore.FileRestoreContext.restore(FileRestoreContext.java:199)
       app//org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:863)
       app//org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:486)
       app//org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:287)
       app//org.elasticsearch.index.shard.StoreRecovery$$Lambda$4390/0x0000000840e41c40.run(Unknown Source)
       app//org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310)
       app//org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:285)
       app//org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1747)
       app//org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$10(IndexShard.java:2417)
       app//org.elasticsearch.index.shard.IndexShard$$Lambda$4389/0x0000000840e41840.run(Unknown Source)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681)
       java.base@11.0.6/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
       java.base@11.0.6/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
       java.base@11.0.6/java.lang.Thread.run(Thread.java:834)
     8/10 snapshots sharing following 22 elements
       java.base@11.0.6/java.lang.Thread.sleep(Native Method)
       java.base@11.0.6/java.lang.Thread.sleep(Thread.java:339)
       app//org.apache.lucene.store.RateLimiter$SimpleRateLimiter.pause(RateLimiter.java:153)
       app//org.elasticsearch.index.snapshots.blobstore.RateLimitingInputStream.maybePause(RateLimitingInputStream.java:52)
       app//org.elasticsearch.index.snapshots.blobstore.RateLimitingInputStream.read(RateLimitingInputStream.java:71)
       java.base@11.0.6/java.io.FilterInputStream.read(FilterInputStream.java:107)
       app//org.elasticsearch.repositories.blobstore.FileRestoreContext.restoreFile(FileRestoreContext.java:261)
       app//org.elasticsearch.repositories.blobstore.FileRestoreContext.restoreFiles(FileRestoreContext.java:238)
       app//org.elasticsearch.repositories.blobstore.FileRestoreContext.restore(FileRestoreContext.java:199)
       app//org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:863)
       app//org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:486)
       app//org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:287)
       app//org.elasticsearch.index.shard.StoreRecovery$$Lambda$4390/0x0000000840e41c40.run(Unknown Source)
       app//org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310)
       app//org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:285)
       app//org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1747)
       app//org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$10(IndexShard.java:2417)
       app//org.elasticsearch.index.shard.IndexShard$$Lambda$4389/0x0000000840e41840.run(Unknown Source)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681)
       java.base@11.0.6/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
       java.base@11.0.6/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
       java.base@11.0.6/java.lang.Thread.run(Thread.java:834)
   
    6.1% (30.3ms out of 500ms) cpu usage by thread 'elasticsearch[W7LJGMn][generic][T#8]'
     10/10 snapshots sharing following 22 elements
       java.base@11.0.6/java.lang.Thread.sleep(Native Method)
       java.base@11.0.6/java.lang.Thread.sleep(Thread.java:339)
       app//org.apache.lucene.store.RateLimiter$SimpleRateLimiter.pause(RateLimiter.java:153)
       app//org.elasticsearch.index.snapshots.blobstore.RateLimitingInputStream.maybePause(RateLimitingInputStream.java:52)
       app//org.elasticsearch.index.snapshots.blobstore.RateLimitingInputStream.read(RateLimitingInputStream.java:71)
       java.base@11.0.6/java.io.FilterInputStream.read(FilterInputStream.java:107)
       app//org.elasticsearch.repositories.blobstore.FileRestoreContext.restoreFile(FileRestoreContext.java:261)
       app//org.elasticsearch.repositories.blobstore.FileRestoreContext.restoreFiles(FileRestoreContext.java:238)
       app//org.elasticsearch.repositories.blobstore.FileRestoreContext.restore(FileRestoreContext.java:199)
       app//org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:863)
       app//org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:486)
       app//org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:287)
       app//org.elasticsearch.index.shard.StoreRecovery$$Lambda$4390/0x0000000840e41c40.run(Unknown Source)
       app//org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310)
       app//org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:285)
       app//org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1747)
       app//org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$10(IndexShard.java:2417)
       app//org.elasticsearch.index.shard.IndexShard$$Lambda$4389/0x0000000840e41840.run(Unknown Source)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681)
       java.base@11.0.6/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
       java.base@11.0.6/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
       java.base@11.0.6/java.lang.Thread.run(Thread.java:834)
   
    5.0% (24.9ms out of 500ms) cpu usage by thread 'elasticsearch[W7LJGMn][generic][T#19]'
     10/10 snapshots sharing following 22 elements
       java.base@11.0.6/java.lang.Thread.sleep(Native Method)
       java.base@11.0.6/java.lang.Thread.sleep(Thread.java:339)
       app//org.apache.lucene.store.RateLimiter$SimpleRateLimiter.pause(RateLimiter.java:153)
       app//org.elasticsearch.index.snapshots.blobstore.RateLimitingInputStream.maybePause(RateLimitingInputStream.java:52)
       app//org.elasticsearch.index.snapshots.blobstore.RateLimitingInputStream.read(RateLimitingInputStream.java:71)
       java.base@11.0.6/java.io.FilterInputStream.read(FilterInputStream.java:107)
       app//org.elasticsearch.repositories.blobstore.FileRestoreContext.restoreFile(FileRestoreContext.java:261)
       app//org.elasticsearch.repositories.blobstore.FileRestoreContext.restoreFiles(FileRestoreContext.java:238)
       app//org.elasticsearch.repositories.blobstore.FileRestoreContext.restore(FileRestoreContext.java:199)
       app//org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:863)
       app//org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:486)
       app//org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:287)
       app//org.elasticsearch.index.shard.StoreRecovery$$Lambda$4390/0x0000000840e41c40.run(Unknown Source)
       app//org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310)
       app//org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:285)
       app//org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1747)
       app//org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$10(IndexShard.java:2417)
       app//org.elasticsearch.index.shard.IndexShard$$Lambda$4389/0x0000000840e41840.run(Unknown Source)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681)
       java.base@11.0.6/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
       java.base@11.0.6/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
       java.base@11.0.6/java.lang.Thread.run(Thread.java:834)

I couldn't post everything here, so I edited the question above with more information.

So I have found out that there was actually an erroneous metric coming through the dashboard, and the recovery was indeed using 40 MB/s.

However, I had set this to 250mb and believe there is plenty of bandwidth to sustain such speeds. The recovery seems to be capped at the default indices.recovery.max_bytes_per_sec value of 40mb, even though my settings are the following:

{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "node_concurrent_recoveries": "10",
          "node_initial_primaries_recoveries": "20"
        }
      }
    },
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "250mb",
        "max_concurrent_file_chunks": "5"
      }
    },
    "xpack": {
      "monitoring": {
        "collection": {
          "enabled": "true"
        }
      }
    }
  },
  "transient": {}
}

It's as though the new settings have not been applied.
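
One way I know of to double-check the value actually in effect is to ask for the defaults alongside the persistent settings (a sketch, assuming include_defaults is supported on this version) and look for indices.recovery.max_bytes_per_sec in the output:

GET _cluster/settings?include_defaults=true&flat_settings=true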

I have read that compression significantly affects network usage and throughput speed: Elasticsearch 6.3.0 shard recovery is slow.
Furthermore, I have noticed that there used to be a setting called indices.recovery.compress which could also be affecting the network speed: Transport.tcp.compress slowing down shard relocation

However, I cannot find this setting anymore in version 6.8. Has it been deprecated or replaced with some other parameter?

How would I go about disabling compression in Elasticsearch 6.8?

Thank you in advance

This is being limited by the repository setting max_restore_bytes_per_sec, not indices.recovery.max_bytes_per_sec.

However this does not explain why there's a 3-hour delay in between requesting the restore and seeing any network traffic at all.
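
You can confirm what the repository is currently configured with via the snapshot repository API, for example:

GET _snapshot/_all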

There was a problem in the way we were querying the metrics for Elasticsearch, and we were indeed using 40 MB/s.

That sounds like exactly what my limit might be. I attempted to change the existing repo settings, but this resulted in:

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[W7LJGMn][127.0.0.1:9300][cluster:admin/repository/put]"
      }
    ],
    "type": "illegal_state_exception",
    "reason": "trying to modify or unregister repository that is currently used "
  },
  "status": 500
}

Is there a way of transferring an existing repo to another one while changing the settings?

As long as there aren't currently any snapshots/restores ongoing, you can remove and re-add the repository with different settings. Removing the repo from Elasticsearch doesn't remove its contents.

I stand corrected: as long as there are no snapshots/restores currently ongoing, you can PUT the same repository with updated settings. There's no need to remove it from Elasticsearch first.
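
A minimal sketch of such an update, assuming a GCS repository; the repository name, bucket, and throttle values here are placeholders, and the bucket/base_path must match the existing repository so the existing snapshots stay visible:

PUT _snapshot/my_gcs_repo
{
  "type": "gcs",
  "settings": {
    "bucket": "my-snapshot-bucket",
    "max_restore_bytes_per_sec": "250mb",
    "max_snapshot_bytes_per_sec": "250mb"
  }
}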

I removed it before reading this message, but it's very easy anyway. Thank you so much for your help, I can now reach speeds of up to 200 MB/s!
