Snapshot performance with ES 7.10

Hi all:
We have upgraded to ES 7.10 from 6.8. After the upgrade, we are observing that snapshots on 7.10 take around 3 times longer than snapshots on 6.8. These are snapshots of the ES cluster to an NFS-mounted share. There was no change in configuration between 6.8 and 7.10. I was wondering whether anyone else has seen this behavior and, if so, whether any settings need to be configured in ES 7.10 to get back the previous performance.

Thanks.

Is it still slow if you move to a fresh repository?

Hi Dave,
The slowness also happens with a fresh repository, on a new (greenfield) installation of ES 7.10.

Thanks,
-Karun

Ok, thanks for checking that.

I'm not aware of any performance regression in this area. Can you quantify it in more absolute terms? How large/complex a snapshot are you taking (data volume, number of shards, any other pertinent info) and how long does it take?

Here are the details that you are asking for:

=6.8 ES=

curl -X GET "xxxxx:9200/_cat/snapshots/SnapshotRepo_1?v&s=id"
id                                   status  start_epoch start_time end_epoch  end_time duration indices successful_shards failed_shards total_shards
eca95dae-1774-40be-921e-9311e5206263 SUCCESS 1609997090  05:24:50   1609997354 05:29:14 4.3m     3       12                0             12

=7.10 ES=
curl -X GET "xxxx:9200/_cat/snapshots/SnapshotRepo_2?v&s=id"
id                                   status  start_epoch start_time end_epoch  end_time duration indices successful_shards failed_shards total_shards
eddfcb8c-02ab-4dc6-94e9-ab654a60ba51 SUCCESS 1610002940  07:02:20   1610003581 07:13:01 10.6m    3       12                0             12
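For a quick sanity check of the two runs, the durations can be recomputed from the start_epoch and end_epoch columns. This is just a sketch; snapshot_stats is a hypothetical helper name, and small differences from the duration column are expected because the epoch columns are whole seconds.

```shell
#!/bin/sh
# Hypothetical helper: print each snapshot's duration, computed from the
# start_epoch/end_epoch columns, plus the 7.10 / 6.8 ratio.
# Arguments: start_68 end_68 start_710 end_710 (epoch seconds)
snapshot_stats() {
  awk -v s1="$1" -v e1="$2" -v s2="$3" -v e2="$4" 'BEGIN {
    d1 = e1 - s1
    d2 = e2 - s2
    printf "6.8:  %d s (%.1f min)\n", d1, d1 / 60
    printf "7.10: %d s (%.1f min)\n", d2, d2 / 60
    printf "ratio: %.2fx\n", d2 / d1
  }'
}

# Values copied from the two _cat/snapshots rows above.
snapshot_stats 1609997090 1609997354 1610002940 1610003581
```

This prints 264 s for 6.8 and 641 s for 7.10, a ratio of about 2.4x.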

How large were these indices?

Also note that snapshots will re-use previously-snapshotted data where possible, so if the repository wasn't empty then that will confound your measurements too.
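Both points can be checked from the cluster itself. The requests below are a sketch reusing the elided host and repository/snapshot names from this thread; the exact response fields may vary by version, but 7.x snapshot status responses include "incremental" and "total" file counts and sizes, which for a snapshot into an empty repository should be the same because nothing could be reused.

```shell
# Per-index primary and total store size, reported in MB.
curl -X GET "xxxx:9200/_cat/indices?v&h=index,pri,rep,pri.store.size,store.size&bytes=mb"

# Per-snapshot statistics, including how much data was actually copied.
curl -X GET "xxxx:9200/_snapshot/SnapshotRepo_2/eddfcb8c-02ab-4dc6-94e9-ab654a60ba51/_status?pretty"
```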

The repository was empty, as we were trying to compare the performance of 6.8 vs 7.10.
Also, there are 3 indices and only one of them is big, around 6.2 GB. The others are a few MB or even smaller.

Thanks,
Karun
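As a rough back-of-the-envelope check, the average throughput of each run can be compared against the 40mb default of max_snapshot_bytes_per_sec. This sketch assumes the big index is about 6.2 GB (treated here as 6.2 × 1024 = 6348.8 MB), ignores the small indices, and uses the 264 s and 641 s durations from the epoch columns; those durations cover the whole snapshot, not only the file copy. throughput_check is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: average throughput for a given data size and duration,
# expressed as a share of a per-second limit.
# Arguments: size_mb duration_s limit_mb_per_s
throughput_check() {
  awk -v size="$1" -v secs="$2" -v limit="$3" 'BEGIN {
    rate = size / secs
    printf "%.1f MB/s (%.0f%% of the %d MB/s limit)\n", rate, 100 * rate / limit, limit
  }'
}

throughput_check 6348.8 264 40   # 6.8 run
throughput_check 6348.8 641 40   # 7.10 run

# Shortest possible copy time if the throttle were the only bottleneck.
awk 'BEGIN { printf "floor at 40 MB/s: %.0f s\n", 6348.8 / 40 }'
```

Under these assumptions both runs stay well below the limit (about 24 MB/s vs about 10 MB/s), so the throttle alone would not explain a 10-minute snapshot.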

Thanks. Yes, 10 minutes does seem longer than expected to make a ~6GB snapshot. The only relevant setting I can think of is max_snapshot_bytes_per_sec which defaults to 40mb, i.e. 40MB/s, but of course that applies to both versions. There have been quite a lot of changes to how snapshots work between 6.8 and 7.10 but (as I said) I'm not aware of any that would cause such a performance drop.
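For reference, max_snapshot_bytes_per_sec is a per-repository setting on a shared filesystem ("fs") repository, so it is set when registering (or re-registering) the repository. The request below is a sketch: the location path is hypothetical, must match the existing repository's location, and must be listed in path.repo on every node; 100mb is only an example value.

```shell
# Inspect the current repository settings.
curl -X GET "xxxx:9200/_snapshot/SnapshotRepo_2?pretty"

# Re-register the repository with a higher snapshot throttle (example value).
curl -X PUT "xxxx:9200/_snapshot/SnapshotRepo_2" -H 'Content-Type: application/json' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/nfs/es_snapshots",
    "max_snapshot_bytes_per_sec": "100mb"
  }
}'
```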

Unless anyone else has better ideas, I think you'll need to share some logs from every node, with these settings:

logger.org.elasticsearch.repositories: TRACE
logger.org.elasticsearch.snapshots: TRACE
logger.org.elasticsearch.cluster.service.MasterService: DEBUG

That will show a lot more detail on when things are happening and how long everything's taking.
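These logger settings can go in elasticsearch.yml on each node, or, as a sketch, be applied to the running cluster through the cluster settings API (host elided as elsewhere in the thread). Transient settings are lost on a full cluster restart, and assigning null afterwards returns each logger to its default level.

```shell
# Raise the snapshot-related loggers on all nodes at runtime.
curl -X PUT "xxxx:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{
  "transient": {
    "logger.org.elasticsearch.repositories": "TRACE",
    "logger.org.elasticsearch.snapshots": "TRACE",
    "logger.org.elasticsearch.cluster.service.MasterService": "DEBUG"
  }
}'

# After collecting the logs, reset the loggers to their defaults.
curl -X PUT "xxxx:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{
  "transient": {
    "logger.org.elasticsearch.repositories": null,
    "logger.org.elasticsearch.snapshots": null,
    "logger.org.elasticsearch.cluster.service.MasterService": null
  }
}'
```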

Hi Dave, sorry for the late response. We will try the settings you suggested and send you the logs.
