Hey all,
I was thinking of writing an ES plugin that could use defined off-peak
times to analyse indices in the cluster and automatically optimize them
based on a policy configuration (# segments, segment size etc). This way
an operator could defer the cost of optimizing to a defined quiet time of
the day to minimise the IO/CPU impact.
Firstly, I wonder whether anyone else thinks this would be useful, or whether
I've missed an existing ES feature that already does something similar. For
our part, we leave the merge policy as is, review the _segments API, and
periodically fire off an _optimize call with max_num_segments=1, trying to
do this off-peak to reduce the impact. This seemed like a feature that would
be generally useful to all ES users, so I thought about writing a plugin that
everyone could use.
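To make the idea a bit more concrete, here is a rough sketch of the policy
check such a plugin might perform. It is plain Java with no ES dependencies;
the class name, the fields, and the single "max segments" threshold are all
hypothetical placeholders for whatever the real policy configuration would
hold. It only decides whether an index is worth optimizing right now and
builds the _optimize path we currently call by hand.

```java
import java.time.LocalTime;

// Hypothetical policy object: not part of ES, purely illustrative.
final class OffPeakOptimizePolicy {
    private final LocalTime windowStart; // start of the quiet period (inclusive)
    private final LocalTime windowEnd;   // end of the quiet period (exclusive)
    private final int maxSegments;       // target segment count per index

    OffPeakOptimizePolicy(LocalTime windowStart, LocalTime windowEnd, int maxSegments) {
        this.windowStart = windowStart;
        this.windowEnd = windowEnd;
        this.maxSegments = maxSegments;
    }

    // True when 'now' falls inside the off-peak window.
    // Handles windows that wrap past midnight (e.g. 23:00 to 04:00).
    // If start equals end, the window is treated as the whole day.
    boolean isOffPeak(LocalTime now) {
        if (windowStart.isBefore(windowEnd)) {
            return !now.isBefore(windowStart) && now.isBefore(windowEnd);
        }
        return !now.isBefore(windowStart) || now.isBefore(windowEnd);
    }

    // Optimize only during the window, and only if the index currently
    // has more segments than the policy allows.
    boolean shouldOptimize(int currentSegments, LocalTime now) {
        return isOffPeak(now) && currentSegments > maxSegments;
    }

    // Path of the _optimize call for one index, as we issue it today.
    String optimizePath(String index) {
        return "/" + index + "/_optimize?max_num_segments=" + maxSegments;
    }
}
```

With a 01:00 to 05:00 window and a target of one segment, an index with
several segments would be picked up at 02:30 but left alone at midday. The
segment count itself would come from the _segments API, which is left out
here.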
This whole topic then brought up the problem of a plugin that really should
only be run 'once' - it's a cluster-wide singleton service. This previous
mail topic was of interest here:
This is basically the same sort of thing I need. I've started delving into
the River code and I'm a bit confused about where exactly it enforces the
cluster-wide singleton behaviour. Can someone point to it exactly? I've been
trying to grok the RiversService and the way ClusterStateChange events are
handed about, and my gut feel is that this is how a node is 'elected' to be
the river controller. If I read things right, the River runs on a node only
when that local node is the master; is that true?
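To illustrate what I mean by gating on master status, here is a minimal
sketch of the pattern I have in mind. It is only my guess at the shape of
the thing, not a description of how the River code actually works. It is
plain Java with no ES classes; MasterOnlyService, Task, and
onClusterChanged are hypothetical names, and the boolean argument stands in
for whatever "is the local node the master?" information a cluster state
event provides.

```java
// Hypothetical wrapper: starts a task only while the local node is master.
final class MasterOnlyService {

    // The work that must run on at most one node in the cluster.
    interface Task {
        void start();
        void stop();
    }

    private final Task task;
    private boolean running = false;

    MasterOnlyService(Task task) {
        this.task = task;
    }

    // Intended to be called from a cluster state change callback.
    // Starts the task when this node becomes master, stops it when the
    // node loses mastership, and ignores repeated events in between.
    synchronized void onClusterChanged(boolean localNodeIsMaster) {
        if (localNodeIsMaster && !running) {
            task.start();
            running = true;
        } else if (!localNodeIsMaster && running) {
            task.stop();
            running = false;
        }
    }

    synchronized boolean isRunning() {
        return running;
    }
}
```

Since a cluster has at most one elected master at a time, tying the task to
mastership is one simple way to get 'run once' semantics, at the price of
putting the extra work on the master node.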
Secondly, while this could be implemented 'as a River', it doesn't really
sound like what a River does. Maybe ES needs a higher-level abstraction, a
Cluster-Wide Singleton Plugin, of which the River Service would be one
example. That way other plugins (like mine) could leverage the same
infrastructure.
Again, I could have missed something in the code. I'm hoping people with
experience can point me to the right way to do this, and tell me whether
it's even worth embarking on.
regards,
Paul