Does creating snapshot make cluster slow?

Jianzhou_Z · October 12, 2018, 4:27am

I use Elasticsearch 5.6.

While a snapshot is running, I call http://localhost:9200/_cluster/health and get no response for more than 10 seconds. I can also see that the machines have heavy disk and network IO while the snapshot runs.

The delay does not happen when no snapshot is running.

I check _cluster/health with a timeout to make sure that creating a snapshot does not slow down queries. Is this the correct way to check? In practice, does creating a snapshot slow queries down?
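
For reference, the check described above looks roughly like the following. This is a minimal sketch, not the exact script: the function name timed_get is made up for illustration, the URL is the one from this post, and the 10-second timeout is an assumption matching the delay mentioned above. Note that the timeout in urllib applies to each blocking socket operation, not to the request as a whole.

```python
import http.client
import time
import urllib.request


def timed_get(url, timeout=10.0):
    """Return (ok, elapsed_seconds) for a single GET request.

    ok is False if the request fails, returns a non-2xx status,
    or a socket operation exceeds `timeout`.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # include the time needed to receive the body
            ok = 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # OSError covers URLError, HTTPError, timeouts and refused connections.
        ok = False
    return ok, time.monotonic() - start


if __name__ == "__main__":
    # URL taken from the post; adjust host and port for your cluster.
    ok, elapsed = timed_get("http://localhost:9200/_cluster/health", timeout=10.0)
    print(f"ok={ok} elapsed={elapsed:.3f}s")
```

The same function could be pointed at a search URL instead of the health endpoint, which would time an actual query rather than a cluster-state call.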

loren · October 12, 2018, 10:38pm

I imagine creating snapshots could slow down your queries. Snapshots are going to read lots of blocks off disk, and those blocks will go into the filesystem cache. Queries that might have been served from FS cache may need to go to disk instead, and then they will be competing with the snapshot process for access to disk IO. The effect would be magnified for larger documents, large result sets, and scan/scrolls.

Jianzhou_Z · October 12, 2018, 10:52pm

@loren, is there any good practice to follow about when to run a snapshot?

I run snapshots on a live Elasticsearch cluster that serves users. Ideally we do not want to affect users' search experience.

One option is to run the snapshot when traffic is low. But traffic can still change suddenly...

loren · October 12, 2018, 11:15pm

I don't know of anything other than spreading the data onto more nodes, or increasing the RAM of each node. Both of those will improve the ratio of RAM to disk, which would lessen the impact of FS cache purges and disk contention. Better compression could help in the same way.

But first I'd want to benchmark some more to be sure that snapshot is indeed interfering with query performance. And then I'd try to make the queries less reliant on disk reads, perhaps by storing fewer fields or breaking the data up into multiple indices.
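
One way to do that comparison is to collect query latencies with and without a snapshot running and compare the tail rather than the average. A rough sketch, assuming you already have two lists of latencies in milliseconds (how you collect them is up to you; the function names here are made up for illustration):

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it. `p` is an integer from 1 to 100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def compare_latencies(baseline_ms, snapshot_ms):
    """Summarize query latencies measured without and with a snapshot running."""
    summary = {}
    for label, samples in (("baseline", baseline_ms), ("snapshot", snapshot_ms)):
        summary[label] = {
            "median": statistics.median(samples),
            "p95": percentile(samples, 95),
        }
    # Ratio above 1.0 means the slowest 5% of queries got slower during the snapshot.
    summary["p95_ratio"] = summary["snapshot"]["p95"] / summary["baseline"]["p95"]
    return summary
```

If the median barely moves but the p95 ratio jumps, that would fit the picture of most queries still being served from cache while the unlucky ones wait on disk.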

Good luck!

Jianzhou_Z · October 14, 2018, 4:40pm

Could old-generation GC be the cause of the slowdown? I made another post about it: "Does GC at snapshot affect performance? can users force GC?"

loren · October 15, 2018, 6:48pm

I doubt it, unless your JVM is under so much memory pressure that it has to stop the world. Doesn't seem to be the case for you.
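
If you want numbers to rule it out, you could grab the output of GET /_nodes/stats/jvm before and after the snapshot and diff the old-generation collector counters. A small sketch, assuming the two responses have already been fetched and parsed as JSON (the helper name old_gc_delta is made up; the field names are the ones the node stats API reports under jvm.gc.collectors):

```python
def old_gc_delta(before, after):
    """Per-node change in old-generation GC count and time between two
    parsed responses of GET /_nodes/stats/jvm."""
    deltas = {}
    for node_id, node_after in after["nodes"].items():
        node_before = before["nodes"].get(node_id)
        if node_before is None:
            continue  # node joined between the two samples
        old_before = node_before["jvm"]["gc"]["collectors"]["old"]
        old_after = node_after["jvm"]["gc"]["collectors"]["old"]
        deltas[node_after.get("name", node_id)] = {
            "collections": old_after["collection_count"] - old_before["collection_count"],
            "millis": old_after["collection_time_in_millis"]
            - old_before["collection_time_in_millis"],
        }
    return deltas
```

A handful of old collections adding up to a few hundred milliseconds over the whole snapshot would not explain a 10-second stall.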

I'd suggest running something like iostat before/during/after your snapshot. If you see IO Wait times spike during the snapshot, then any queries not served 100% from memory are going to take longer. If you regularly see IO Wait when snapshot isn't running, then you already have an overburdened storage system and you have bigger problems than snapshot.

Make sure you run iostat on the volumes that contain your shards. For example, I might run iostat -mx nvme0n1 11 to monitor how a mounted NVMe drive on my EC2 i3 instance is performing.
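
If you'd rather log this from a script than watch iostat, the same counters are exposed in /proc/diskstats on Linux. Here's a rough sketch of how utilization and average wait can be derived from two samples of that file. It is an approximation of what iostat's extended columns report, not a replacement for it. The function names are made up, and the field positions assume the standard /proc/diskstats layout (device name in the third column, at least 14 columns).

```python
def parse_diskstats(text, device):
    """Extract the cumulative counters for `device` from /proc/diskstats content."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 14 and fields[2] == device:
            return {
                "reads": int(fields[3]),      # reads completed
                "read_ms": int(fields[6]),    # time spent reading (ms)
                "writes": int(fields[7]),     # writes completed
                "write_ms": int(fields[10]),  # time spent writing (ms)
                "io_ms": int(fields[12]),     # time the device was busy (ms)
            }
    raise ValueError(f"device {device!r} not found")


def io_pressure(first, second, interval_s):
    """Utilization (%) and average wait per IO (ms) between two samples
    taken `interval_s` seconds apart."""
    ios = (second["reads"] - first["reads"]) + (second["writes"] - first["writes"])
    wait_ms = (second["read_ms"] - first["read_ms"]) + (second["write_ms"] - first["write_ms"])
    busy_ms = second["io_ms"] - first["io_ms"]
    return {
        "util_pct": 100.0 * busy_ms / (interval_s * 1000.0),
        "await_ms": wait_ms / ios if ios else 0.0,
    }
```

To use it, read /proc/diskstats, sleep for the interval, read it again, and pass both parsed samples to io_pressure. Compare the numbers you get before, during, and after the snapshot.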

