While a snapshot is running, I request http://localhost:9200/_cluster/health and get no response for more than 10 seconds. I can also see that the machines incur heavy disk and network IO while the snapshot runs.
The delay does not occur when no snapshot is running.
I check _cluster/health with a timeout to make sure that creating a snapshot does not slow down queries. Is this the correct way to check? In practice, will creating snapshots slow queries down?
I imagine creating snapshots could slow down your queries. Snapshots are going to read lots of blocks off disk, and those blocks will go into the filesystem cache. Queries that might have been served from FS cache may need to go to disk instead, and then they will be competing with the snapshot process for access to disk IO. The effect would be magnified for larger documents, large result sets, and scan/scrolls.
I don't know of anything other than spreading the data onto more nodes, or increasing the RAM of each node. Both of those will improve the ratio of RAM to disk, which would lessen the impact of FS cache purges and disk contention. Better compression could help in the same way.
But first I'd want to benchmark some more to be sure that snapshot is indeed interfering with query performance. And then I'd try to make the queries less reliant on disk reads, perhaps by storing fewer fields or breaking the data up into multiple indices.
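To make that benchmark concrete, here is a rough Python sketch that times the same search repeatedly and summarizes the latencies. The index name my-index, the match-all query, and the repeat count are all placeholders I made up; swap in a query that looks like your real traffic.

```python
import math
import statistics
import time
import urllib.request


def run_search(base_url="http://localhost:9200", index="my-index"):
    """Issue one search request and discard the response body.

    'my-index' and the match-all query are placeholders (assumptions);
    use a query that is representative of real traffic.
    """
    url = f"{base_url}/{index}/_search?q=*:*&size=10"
    with urllib.request.urlopen(url, timeout=30) as response:
        response.read()


def measure(fn, repeats=50, pause_seconds=0.2):
    """Call fn repeatedly and return the wall-clock latency of each call."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
        time.sleep(pause_seconds)
    return latencies


def summarize(latencies):
    """Return median, nearest-rank 95th percentile, and max latency in seconds."""
    ordered = sorted(latencies)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


# Example: run once with no snapshot and once while a snapshot is in progress.
# print(summarize(measure(run_search)))
```

Run it once while idle and once during a snapshot, then compare the two summaries. If the median and p95 barely move, the snapshot is probably not what is hurting your queries.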
I doubt it, unless your JVM is under so much memory pressure that it has to stop the world. Doesn't seem to be the case for you.
I'd suggest running something like iostat before/during/after your snapshot. If you see IO Wait times spike during the snapshot, then any queries not served 100% from memory are going to take longer. If you regularly see IO Wait when snapshot isn't running, then you already have an overburdened storage system and you have bigger problems than snapshot.
Make sure you run iostat on the volumes that contain your shards. For example, I might run iostat -mx nvme0n1 11 to monitor how a mounted NVMe drive on my EC2 i3 instance is performing.
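If you would rather log this from a script than watch iostat by hand, the sketch below computes the CPU-wide IO wait percentage from two readings of /proc/stat on Linux. This is my own illustration, not something iostat or Elasticsearch ships; it only gives the aggregate CPU figure, so you still need iostat for per-device numbers.

```python
import time


def parse_cpu_line(stat_text):
    """Return (iowait_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Columns: user nice system idle iowait irq softirq steal.
            # guest and guest_nice are already included in user and nice, so skip them.
            ticks = [int(value) for value in fields[1:9]]
            return ticks[4], sum(ticks)
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in IO wait between two /proc/stat snapshots."""
    io_before, total_before = parse_cpu_line(before)
    io_after, total_after = parse_cpu_line(after)
    elapsed = total_after - total_before
    if elapsed <= 0:
        return 0.0
    return 100.0 * (io_after - io_before) / elapsed


def sample_iowait(interval_seconds=5.0, path="/proc/stat"):
    """Read /proc/stat twice, interval_seconds apart, and return the IO wait percentage."""
    with open(path) as stat_file:
        before = stat_file.read()
    time.sleep(interval_seconds)
    with open(path) as stat_file:
        after = stat_file.read()
    return iowait_percent(before, after)
```

Calling sample_iowait() in a loop before, during, and after the snapshot gives you a simple time series to compare against your query latencies.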