Rollovers always done on the same node


I'm working on a small 7.15 cluster (8 data nodes named data-node-X), and for some time, I've observed an abnormal behaviour I can't find the origin of.

All new rollovers of our major indexes (ILM-managed) land on data-node-4, which therefore receives almost all of the cluster's indexing traffic from my 8 inserters (Logstash instances in containers).

The cluster became unable to cope, as 7 nodes are doing almost nothing while one carries the whole load.

I decided to add some allocation routing to my index templates, which made the cluster stable again, but it's clearly a duct-tape fix: the rollovers go back to data-node-4 as soon as I remove the routing.

My question is: how can I debug this?
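
One way to at least quantify the imbalance, as a minimal sketch: tally started primary shards per node from the JSON output of `GET _cat/shards?format=json`. The helper names, the sample rows, and the base URL below are hypothetical and only illustrate the idea; they are not taken from the actual cluster.

```python
import json
import urllib.request
from collections import Counter


def fetch_shards(base_url):
    """Fetch shard rows from the _cat/shards API as a list of dicts.

    base_url is an assumption, e.g. "http://localhost:9200" for an
    unsecured test cluster; add authentication as your setup requires.
    """
    with urllib.request.urlopen(base_url + "/_cat/shards?format=json") as resp:
        return json.load(resp)


def primaries_per_node(shard_rows):
    """Count started primary shards per node.

    Each row is expected to carry the default _cat/shards columns
    "prirep" ("p" or "r"), "state" and "node".
    """
    counts = Counter()
    for row in shard_rows:
        if row.get("prirep") == "p" and row.get("state") == "STARTED":
            counts[row["node"]] += 1
    return counts


# Hypothetical rows, shaped like the _cat/shards JSON output.
sample_rows = [
    {"index": "logs-000042", "shard": "0", "prirep": "p", "state": "STARTED", "node": "data-node-4"},
    {"index": "logs-000042", "shard": "1", "prirep": "p", "state": "STARTED", "node": "data-node-4"},
    {"index": "logs-000042", "shard": "0", "prirep": "r", "state": "STARTED", "node": "data-node-1"},
    {"index": "metrics-000007", "shard": "0", "prirep": "p", "state": "STARTED", "node": "data-node-2"},
]

for node, count in primaries_per_node(sample_rows).most_common():
    print(node, count)
```

Against a live cluster, `primaries_per_node(fetch_shards(base_url))` would give the same tally. The cluster allocation explain API (`GET _cluster/allocation/explain`) can then be queried for a specific shard to see why it was placed on that node.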

Some architecture information:

  • Each data node is paired with another on a physical node: node 1 and 2 are on the same host, 3 and 4 on another host, and so on.
  • Each node has 10 dedicated 1.8 TB disks, exposed through multiple data paths (MDP), i.e. one data directory per mounted disk.
  • Indexes are ILM-managed, and the templates (legacy templates, btw) currently contain routing allocation directives to keep all new rollovers from landing on the same node.
  • All ILM policies rely on a single hot phase, after which the data is removed (after 16 to 30 days, depending on the policy).
  • Even though each routing allocation directive lists 4 nodes, every new rollover still lands entirely on a single node (though no longer data-node-4 for the templates where it is not included).
  • There is a correlation with our 7.15 upgrade, but I can't be sure it's really connected.
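
For reference, the kind of routing directive described in the list above could be sketched as follows. This is an illustration only: the index pattern, the node list and the helper name are hypothetical, and the real templates may use a different filter (`require` or `exclude` instead of `include`, or another attribute than `_name`).

```python
import json


def routing_template(index_pattern, node_names, order=10):
    """Build a legacy index template body (for PUT _template/<name>)
    that restricts allocation of matching indexes to the given nodes."""
    return {
        "index_patterns": [index_pattern],
        "order": order,
        "settings": {
            # Comma-separated node names; a shard may be allocated
            # to any node whose name matches one of them.
            "index.routing.allocation.include._name": ",".join(node_names),
        },
    }


# Hypothetical example: pin "logs-*" indexes to four of the eight nodes.
body = routing_template(
    "logs-*",
    ["data-node-1", "data-node-3", "data-node-5", "data-node-7"],
)
print(json.dumps(body, indent=2))
```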

Thanks in advance for your hints or questions! :slight_smile:

Are you using tiering?
Why not do that and use a template to make sure things are more evenly allocated?

By "tiering", do you mean hot/warm/cold?

I'm not relying on it because with only 8 identical data nodes, we saw no gain when we used it for around 6 months, and also because I want all my nodes to potentially index data.

If you're referring to something else, I'm all ears (or eyes, in this specific case)! :slight_smile:

It was my understanding that Elasticsearch is supposed to even out load and allocation at rollover, deciding which nodes will hold the new index based on the data nodes' current load and free disk space. Am I wrong?
