Shard routing

Hello,

TL;DR:
How can I route all indices to one group of nodes, except for a few selected indices that should go to other, dedicated nodes?
Or... explain to me why I'm on the wrong track :slight_smile:


We have been using ELK for a few months, mostly for centralized logging, and we are very happy with it.

We recently added metrics from various modules, and since then we've seen a significant increase in response times and search durations. Even navigating within Kibana is noticeably slower (pages take a bit longer to load).

It is still acceptable and our end users haven't complained yet, but we plan to add more and more logs and metrics (and even ML...) in the near future.

Therefore we want to make sure our infrastructure will handle the load as it increases.

Our guess is that the load our Metricbeat instances generate on the data nodes slows them down.
We would like to split our data nodes into 2 groups:

  • group 1 (default): all indices (including system indices) go on these nodes
  • group 2: only the indices we choose (Metricbeat for now, and later other indices that are as intensive as Metricbeat)

For testing purposes, I set up a small cluster with 4 nodes:

  • 2 nodes with this attribute: node.attr.indexLoad: heavy
  • 2 nodes without any attribute

Then I set the following cluster setting:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude.indexLoad": "heavy"
  }
}

Then, in the Metricbeat index template, I added this setting (because I thought there might be an "override" mechanism at the index level that takes precedence over what is set at the cluster level):

{ "index.routing.allocation.include.indexLoad": "heavy" }

But now my Metricbeat indices are no longer allocated, because both of these apply to them:

{ "persistent": { "cluster.routing.allocation.exclude.indexLoad": "heavy" } }
and
{ "index.routing.allocation.include.indexLoad": "heavy" }

I'm now fairly convinced that I'm on the wrong track, but I don't really understand why.
Could you please help me understand?

Thanks
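The reason the two settings cancel each other out can be illustrated with a small Python model. This is a simplified sketch, not Elasticsearch's actual allocation code: it assumes that every allocation filter in effect for an index, cluster-level and index-level alike, must pass for a node to be eligible, with include matching only nodes that carry one of the listed values and exclude letting untagged nodes through. The node names (node-1 to node-4) are hypothetical stand-ins for the 4-node test cluster.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for the 4-node test cluster:
# two nodes with node.attr.indexLoad: heavy, two without any attribute.
NODES: Dict[str, Dict[str, str]] = {
    "node-1": {"indexLoad": "heavy"},
    "node-2": {"indexLoad": "heavy"},
    "node-3": {},
    "node-4": {},
}

# A filter is (kind, attribute name, set of values).
Filter = Tuple[str, str, Set[str]]


def passes(attrs: Dict[str, str], flt: Filter) -> bool:
    kind, key, values = flt
    if kind == "include":
        # include: the node's attribute must match one of the values;
        # a node without the attribute does not match.
        return attrs.get(key) in values
    if kind == "exclude":
        # exclude: the node must not match any of the values;
        # a node without the attribute passes.
        return attrs.get(key) not in values
    raise ValueError(f"unknown filter kind: {kind}")


def eligible_nodes(filters: List[Filter]) -> List[str]:
    # Simplified model: ALL filters apply at once; an index-level filter
    # does not override a cluster-level one.
    return [
        name
        for name, attrs in NODES.items()
        if all(passes(attrs, f) for f in filters)
    ]


# cluster.routing.allocation.exclude.indexLoad: heavy
cluster_filters: List[Filter] = [("exclude", "indexLoad", {"heavy"})]
# index.routing.allocation.include.indexLoad: heavy (from the Metricbeat template)
metricbeat_filters: List[Filter] = [("include", "indexLoad", {"heavy"})]

print(eligible_nodes(cluster_filters))                       # ['node-3', 'node-4']
print(eligible_nodes(cluster_filters + metricbeat_filters))  # []
```

With no eligible node left, the Metricbeat shards remain unassigned. On a real cluster, the cluster allocation explain API (GET _cluster/allocation/explain) shows which filter rejected each node.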

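Basically, this is what we call hot/warm, but with your own tier definitions.

With the cluster-level cluster.routing.allocation.exclude.indexLoad setting, you are telling the entire cluster that no indices should be on those nodes. That's not what you want.

The index-level index.routing.allocation.include.indexLoad setting is what you need to define in the template, not the cluster-level one.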
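A simplified Python model of this fix, assuming the cluster-level exclude has been removed and only the Metricbeat template sets the include filter. It is a sketch of the filter semantics, not Elasticsearch code, and the node names are hypothetical. Metricbeat shards are then confined to the heavy nodes, while an index with no filter may still be placed on any node.

```python
from typing import Dict, List, Optional, Set

# Hypothetical 4-node test cluster (two nodes tagged indexLoad=heavy).
NODES: Dict[str, Dict[str, str]] = {
    "node-1": {"indexLoad": "heavy"},
    "node-2": {"indexLoad": "heavy"},
    "node-3": {},
    "node-4": {},
}


def eligible_nodes(include: Optional[Set[str]] = None,
                   exclude: Optional[Set[str]] = None,
                   key: str = "indexLoad") -> List[str]:
    """Simplified model of index-level allocation filtering on one attribute.

    include: node must carry one of these values (untagged nodes fail).
    exclude: node must carry none of these values (untagged nodes pass).
    No cluster-level filter is modeled, i.e. it is assumed to be removed.
    """
    result = []
    for name, attrs in NODES.items():
        value = attrs.get(key)
        if include is not None and value not in include:
            continue
        if exclude is not None and value in exclude:
            continue
        result.append(name)
    return result


# Metricbeat template sets index.routing.allocation.include.indexLoad: heavy
print(eligible_nodes(include={"heavy"}))  # ['node-1', 'node-2']
# Any index whose template sets no allocation filter at all
print(eligible_nodes())                   # ['node-1', 'node-2', 'node-3', 'node-4']
```

On the real cluster, a persistent setting is removed by setting it to null, e.g. PUT _cluster/settings with { "persistent": { "cluster.routing.allocation.exclude.indexLoad": null } }.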

I had already read a bit about the hot-warm-cold architecture, but my understanding is that it does not address the same problem (then again, I may be misunderstanding, so feel free to explain where I'm wrong).

Before answering, I re-read this page: Hot-warm-cold architecture with Elasticsearch
My understanding is still that the hot-warm-cold architecture helps split indices across 3 groups of nodes:

  • hot nodes: intensive reads and writes
  • warm nodes: moderate reads (and writes?)
  • cold nodes: read-only? (or also writes?) with very little activity

I also understand that whether an index lives on hot, warm, or cold nodes is decided by its lifecycle (how old the index is).

But that is not exactly what I want. I want the system indices (Kibana, Elasticsearch, etc.) and a few other small indices to always live on nodes with very little load, so that they respond as fast as if there were no load on the cluster at all.

If I do that, some other indices will also be placed on these nodes, so they will still be affected by the load from the Metricbeat ones, and that is what I am trying to avoid.

Then you will need to tag every index as either hot or warm; otherwise Elasticsearch will assume it can place them on any node.
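A minimal Python sketch of tagging every index, under stated assumptions: the formerly untagged nodes are assumed to be tagged node.attr.indexLoad: light, every index is assumed to get index.routing.allocation.require.indexLoad with a single value, and the pattern list HEAVY_PATTERNS, the node names, and the example index names are all hypothetical. It models the filter semantics only, not Elasticsearch itself.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Hypothetical cluster where EVERY node carries node.attr.indexLoad:
# the former untagged nodes are assumed to be tagged "light".
NODES: Dict[str, Dict[str, str]] = {
    "node-1": {"indexLoad": "heavy"},
    "node-2": {"indexLoad": "heavy"},
    "node-3": {"indexLoad": "light"},
    "node-4": {"indexLoad": "light"},
}

# Hypothetical patterns of indices that belong on the heavy nodes.
HEAVY_PATTERNS: List[str] = ["metricbeat-*"]


def required_tier(index: str) -> str:
    """Value a template would put in index.routing.allocation.require.indexLoad.

    Every index gets a value, so none is left free to land on any node.
    """
    if any(fnmatch(index, pattern) for pattern in HEAVY_PATTERNS):
        return "heavy"
    return "light"  # catch-all: system indices and everything else


def eligible_nodes(index: str) -> List[str]:
    # require with a single value: the node's attribute must equal it.
    tier = required_tier(index)
    return [name for name, attrs in NODES.items() if attrs.get("indexLoad") == tier]


for idx in ["metricbeat-7.10.0-2021.01.15", ".kibana_1", "filebeat-7.10.0-2021.01.15"]:
    print(idx, eligible_nodes(idx))
```

Using the same setting key in both the catch-all template and the Metricbeat template matters with legacy templates: overlapping templates are merged and the higher order wins only for identical keys, so a catch-all exclude combined with a Metricbeat include would again leave Metricbeat indices with both filters. With composable templates, only the single highest-priority match applies, so each template must carry its own filter. System indices may not pick up a catch-all template, so it is worth checking their settings afterwards.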