Why is snapshot to HDFS 20 times slower than restore from HDFS?


I am trying to snapshot and restore a big index with the repository-hdfs-5.4.2 plugin. I get 33.3 mb/s on snapshot and 640 mb/s on restore. Is it normal for snapshot to be about 20 times slower than restore?

Is there any way to speed up the snapshot operation or explain why it is so slow?

I've already set max_snapshot_bytes_per_sec = 500mb and max_restore_bytes_per_sec = 500mb in the repository settings, so the throttle shouldn't be the limit.
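For reference, here is a minimal Python sketch of what the repository settings described above look like as a request body. The HDFS URI, the path, and the helper name `hdfs_repository_body` are placeholders for illustration, not values from this cluster; only the two throttle settings come from this post.

```python
import json


def hdfs_repository_body(uri, path, snapshot_rate="500mb", restore_rate="500mb"):
    """Build the JSON body for registering an HDFS snapshot repository.

    uri and path are hypothetical placeholders; the two rate limits are the
    throttle settings mentioned in the post.
    """
    return {
        "type": "hdfs",
        "settings": {
            "uri": uri,
            "path": path,
            "max_snapshot_bytes_per_sec": snapshot_rate,
            "max_restore_bytes_per_sec": restore_rate,
        },
    }


# Placeholder namenode address and repository path (assumptions).
body = hdfs_repository_body("hdfs://namenode:8020", "elasticsearch/repositories/my_backup")
print(json.dumps(body, indent=2))
```

A body like this would be sent with a PUT to `_snapshot/<repository name>`; the sketch only builds and prints it, so nothing is changed on a cluster.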

ES version: 5.4.2
We have 3 physical machines, each running 1 master node and 4 data nodes
(3 master nodes and 12 data nodes in total).

(Hai Le) #2

It is my (basic) understanding that a write operation in HDFS requires the client to wait for acknowledgement that the replica copies of the data (3 by default) have been written. Conversely, reading from HDFS is much more efficient, as the data is streamed from the nearest data node.

This has been my experience with HDFS in general as well as using Elastic's HDFS Repository plugin.

Here is a good overview of HDFS read/write operations: https://data-flair.training/blogs/hadoop-hdfs-data-read-and-write-operations/


@haifidelity Thanks for your help.
Our HDFS repo does use the default replication factor (which is 3, as you said). Even so, the write speed should be at least about one third of the read speed: 640 mb/s / 3, which is roughly 210 mb/s.
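To make the arithmetic explicit, here is a small Python sketch using only the figures quoted in this thread. Dividing the read speed by the replication factor is a naive estimate that ignores any other HDFS overhead, and the variable names are just for illustration.

```python
# Throughput figures reported earlier in the thread (mb/s).
snapshot_mb_s = 33.3
restore_mb_s = 640.0
replication_factor = 3  # HDFS default, as used by this repository

# How many times slower snapshot is than restore.
slowdown = restore_mb_s / snapshot_mb_s

# Naive write estimate: read speed divided by the number of replicas.
naive_write_mb_s = restore_mb_s / replication_factor

# Gap between the naive estimate and the observed snapshot speed.
shortfall = naive_write_mb_s / snapshot_mb_s

print(f"snapshot is {slowdown:.1f}x slower than restore")
print(f"naive write estimate: {naive_write_mb_s:.1f} mb/s")
print(f"observed snapshot is {shortfall:.1f}x below that estimate")
```

Even under this simple estimate, the observed snapshot speed is several times lower than replication alone would suggest.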

Could you share your write and read speeds when using Elastic's HDFS Repository plugin to snapshot and restore an index?

(Hai Le) #4

I don't think it's as simple as dividing the throughput by the number of replicas. There's the HDFS overhead, the overhead of each node/shard writing to the replica set, etc. Our snapshots don't occur during normal operating hours, so I'd have to monitor them to get you any data.