Possible to temporarily lock cluster from re-allocating?


(ppearcy) #1

Hey,
Don't think this is possible, but wanted to check. There is a
performance hit when we remove a machine from the cluster due to shard
reallocation and cold shards coming online. I currently have shard
allocations throttled, with concurrent_recoveries: 1 and
concurrent_streams: 1.

Our response times in a steady state are 25-30ms with maxes at 1-2
seconds. During shard reallocation, this increases by 10x, which isn't
bad, but my clients have gotten spoiled.

What I'd like to do is send a command to the cluster telling it not to
re-allocate any shards for some amount of time while a server is taken
offline for maintenance, since I know that any re-allocations will
pretty much be wasted work. Is this doable in 0.16.2, and if not, would
others consider this a viable feature?

This would mitigate the performance hit when the machine is removed,
but I'd still expect some hit when it is added back.
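
A minimal sketch of the workflow described above: pause allocation, take the node down, then resume allocation once it is back. Nothing like this is available in 0.16.2. The sketch assumes the `cluster.routing.allocation.enable` cluster setting that later Elasticsearch releases (1.0 and up) expose through `PUT /_cluster/settings`. The helper names and the host value are hypothetical.

```python
import json
import urllib.request

# Assumption: this setting key exists only in later Elasticsearch
# releases (1.0+); it is not part of the 0.16.x line discussed here.
ALLOCATION_KEY = "cluster.routing.allocation.enable"


def allocation_settings_body(enabled: bool) -> str:
    """Build the JSON body that turns shard allocation on or off."""
    value = "all" if enabled else "none"
    # "transient" settings do not survive a full cluster restart,
    # which suits a short maintenance window.
    return json.dumps({"transient": {ALLOCATION_KEY: value}}, sort_keys=True)


def maintenance_plan(host: str) -> list:
    """Return the (method, url, body) steps around a node restart."""
    url = f"http://{host}/_cluster/settings"
    return [
        ("PUT", url, allocation_settings_body(False)),  # before shutdown
        ("PUT", url, allocation_settings_body(True)),   # after the node rejoins
    ]


def send(method: str, url: str, body: str) -> int:
    """Send one step to the cluster and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

The first step would be sent before stopping the node and the second once it has rejoined. As noted above, some hit from cold shards is still likely when the node comes back.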

Thanks,
Paul


(Shay Banon) #2

Agreed! I've heard this from several users as well. It's tricky business, but there should be an option for that. Please open an issue (I don't think there is one). It requires some thinking, but it's doable.

(In reply to Paul's message of Friday, July 8, 2011, 8:45 PM.)

