How do I gracefully and quickly make minor updates to my cluster config?

If I reboot a node it takes a long time to get its shards initialized and come back up.

I now know I should disable shard allocation before rebooting which would solve that problem, but what if I just need to restart the service?

Say, for example, I just want to tweak the refresh interval. If I change that setting and restart the Elasticsearch service (without disabling allocation or anything), will the node come up fast enough to rejoin the cluster immediately?

That is a dynamic setting on each index, so you don't need a restart at all. You can change it on a live index using the update index settings API. For the default refresh interval I'd make an index template so that new indices get the refresh interval you want. Or look at whatever tool you have creating the index and make it specify the refresh interval you want.
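To make that concrete, here is a minimal sketch of the settings call in Python. The host `http://localhost:9200`, the index pattern `logstash-*`, the `30s` value, and the helper names are all assumptions for illustration, not anything from your setup. The builder just produces the request; `send` is only needed if you want to fire it from Python rather than curl.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a node reachable on the default HTTP port


def refresh_interval_request(index, interval):
    """Build (method, path, body) for a dynamic refresh_interval update."""
    body = {"index": {"refresh_interval": interval}}
    return "PUT", "/{}/_settings".format(index), body


def send(method, path, body=None, base_url=ES_URL):
    """Send one request to the cluster and return the decoded JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the request instead of sending it, so this is safe to run anywhere.
    method, path, body = refresh_interval_request("logstash-*", "30s")
    print(method, path, json.dumps(body))
```

Calling `send(*refresh_interval_request("logstash-*", "30s"))` against a live cluster applies the change to the matching existing indices without restarting anything.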

At this point the only thing we have to speed up restarts is synced flush and that only works if the index isn't being changed. That is useful but doesn't apply in all situations. We're working to make restarts faster even for indices being written to but that work isn't going to be ready until at least 6.0.
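For the cases where a restart really is needed and writes can be paused, the usual sequence is roughly: disable allocation, synced flush, restart, re-enable allocation. A rough sketch of the requests involved, where the function names are made up for illustration and the endpoints are the ones available in the Elasticsearch versions this thread is about:

```python
import json

ALLOCATION_SETTING = "cluster.routing.allocation.enable"


def restart_prep_requests():
    """Requests to issue, in order, before stopping the node."""
    return [
        # Stop the cluster from reallocating this node's shards while it is down.
        ("PUT", "/_cluster/settings", {"transient": {ALLOCATION_SETTING: "none"}}),
        # Synced flush: only helps for indices that are not being written to.
        ("POST", "/_flush/synced", None),
    ]


def restart_finish_requests():
    """Requests to issue once the node has rejoined the cluster."""
    return [
        ("PUT", "/_cluster/settings", {"transient": {ALLOCATION_SETTING: "all"}}),
    ]


if __name__ == "__main__":
    for method, path, body in restart_prep_requests() + restart_finish_requests():
        print(method, path, json.dumps(body) if body is not None else "")
```

Each tuple maps directly onto a curl call or whatever HTTP client you already use.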

Ah I guess refresh interval was a bad example.

I have a basic ELK stack.

So this is tricky then because I can't use synced flush as the node I want to reboot is still receiving and indexing docs.

I can't really stop sending it docs because editing my Logstash config would require restarting that service (I only have one Logstash server).

I'm sending it a pretty brisk volume of docs (each node indexes about 600 docs per second).

Should I just run synced flush a bunch of times and restart the service as fast as I can?

I don't think that is likely to help, really. You are sending docs way faster than you can sneak in a restart.

There is another option that is more graceful but slower. You can use the allocation filtering feature to move the shards that you are actively writing to off of the node that you are going to restart. I did many upgrades that way when I maintained a cluster. We didn't have synced flush and I couldn't pause writes, so it was the way that made the upgrades least impactful. It is slow. At least, it is much slower than synced flush.
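Here is a minimal sketch of what the allocation filtering requests could look like for one index. The index name `logstash-2017.01.01`, the node name `node-1`, and the helper names are placeholders, not values from this thread. It uses the index-level `index.routing.allocation.exclude._name` setting; setting it to null afterwards removes the filter.

```python
import json

EXCLUDE_SETTING = "index.routing.allocation.exclude._name"


def exclude_node_request(index, node_name):
    """Build the request that moves this index's shards off the named node."""
    return "PUT", "/{}/_settings".format(index), {EXCLUDE_SETTING: node_name}


def clear_exclusion_request(index):
    """Build the request that removes the filter once the node is back."""
    # None is serialized as JSON null, which resets the setting.
    return "PUT", "/{}/_settings".format(index), {EXCLUDE_SETTING: None}


if __name__ == "__main__":
    for request in (
        exclude_node_request("logstash-2017.01.01", "node-1"),
        clear_exclusion_request("logstash-2017.01.01"),
    ):
        method, path, body = request
        print(method, path, json.dumps(body))
```

After sending the first request, wait until no shards of that index remain on the node before restarting it. One way to check is `GET /_cat/shards`.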

Beyond that, you can wait for sequence-number-based replication to arrive in the 6.x branch. That should make the process of nodes catching up on documents that they missed much, much faster.
