We are running a 6-node Elasticsearch cluster with a single actively used index (other indices exist, but only this one receives traffic) that is 3.5TB in size (51 shards, 25 of them primary).
We have an NFS share mounted on all nodes and registered as a snapshot repository.
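For reference, the repository was registered along these lines (the repository name and mount point below are placeholders, not our actual values):

```
# elasticsearch.yml on every node: the shared mount must be whitelisted
path.repo: ["/mnt/es_snapshots"]

# Register the shared filesystem repository
curl -X PUT "localhost:9200/_snapshot/nfs_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_snapshots"
  }
}'
```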
We triggered a snapshot (our first one), but query performance degraded severely while it was running, so we canceled it.
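Roughly how we started and then aborted it, using the standard snapshot APIs (repository and snapshot names are placeholders):

```
# Start the snapshot in the background
curl -X PUT "localhost:9200/_snapshot/nfs_backup/snapshot_1?wait_for_completion=false"

# Cancel it: deleting an in-progress snapshot aborts it
curl -X DELETE "localhost:9200/_snapshot/nfs_backup/snapshot_1"
```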
My guess is that this is expected, but what can we do to avoid or at least reduce this search performance drop?
The snapshot process should be throttled enough not to affect querying that much. Unless the throttling settings or the number of threads in the snapshot thread pool were changed, there shouldn't be a visible impact.

That said, even with throttling a snapshot adds some disk I/O and network load, plus a little memory overhead, so if any nodes in the cluster were already running very close to their resource limits, a snapshot could be the last straw that tips them into overload. I have seen that a few times with S3 repositories, which used to buffer in memory, but it's not common for a shared file system repository. So I am a bit puzzled here. @tlrx, any other thoughts?
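As a quick sketch of what to double-check (again with a placeholder repository name), you can inspect the repository settings and the snapshot thread pool, and lower the per-node throttle by re-registering the repository with an explicit `max_snapshot_bytes_per_sec`:

```
# Show the repository config; if unset, the snapshot throttle typically defaults to 40mb/s per node
curl -X GET "localhost:9200/_snapshot/nfs_backup"

# Show the snapshot thread pool on each node
curl -X GET "localhost:9200/_cat/thread_pool/snapshot?v&h=node_name,name,size,active,queue"

# Re-register the repository with a lower throttle (example value only)
curl -X PUT "localhost:9200/_snapshot/nfs_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_snapshots",
    "max_snapshot_bytes_per_sec": "20mb"
  }
}'
```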
@Christian_Dahlqvist, the IO wait on the nodes looks reasonable during the snapshot - mostly below 20%, with occasional peaks that stayed below 70%.
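(Measured at the OS level on each node, with something along the lines of:)

```
# Per-device utilisation and CPU iowait, sampled every 5 seconds
iostat -x 5
```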
@Igor_Motov, that's what I expected, thanks for the clarification. The cluster was far from its limits at the time of the snapshot, so I'm not sure why this happened.
Anyway, a snapshot is in progress right now and the performance looks normal.
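We're keeping an eye on its progress with the status API (names are placeholders again):

```
# Per-shard progress of the running snapshot
curl -X GET "localhost:9200/_snapshot/nfs_backup/snapshot_2/_status"

# High-level state of all snapshots in the repository
curl -X GET "localhost:9200/_snapshot/nfs_backup/_all"
```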