Can't start any pivot transform - getting allocation explanation error without reason

Hi!

Something strange happened with our Elasticsearch v7.17 cluster: I can't start pivot transforms because of this error:
{"root_cause":[{"type":"status_exception","reason":"Could not start transform, allocation explanation [Not starting transform [test-transform], reasons []]"}],"type":"status_exception","reason":"Could not start transform, allocation explanation [Not starting transform [test-transform], reasons []]"}

I tried creating several pivot transforms with different parameters and different sources, but always got the same error. I suspect this started after I removed the oldest and weakest Elasticsearch node from the cluster. I removed it because, for reasons I don't understand, transforms always ran on that node, even though the cluster has several much more powerful nodes with the "transform" role. Before removing the old node, I migrated its data to other nodes and then used the node shutdown API with the "type": "remove" option. Later, I tried adding the old node back to the cluster, but this didn't help.
I checked the Elasticsearch logs on the nodes but found nothing useful. I also went through the common cluster issues described in "Fix common cluster issues | Elasticsearch Guide [7.17] | Elastic", but found no obvious bottleneck.
I'm stuck and have no idea how to debug this further.

Thanks for your attention!

Hi @Django,

Can you check whether at least one of your nodes has the transform role? You can do this by running the following in the Dev console:

GET _cat/nodes

and checking if a node has the letter "t" in its roles.
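
If there are many nodes, the same check can be scripted. Below is a minimal Python sketch, assuming you have saved the text output of `GET _cat/nodes?v` (the `v` flag adds the header row) and that node names contain no spaces. The function name `transform_nodes` and the sample node names are hypothetical, chosen only for illustration.

```python
def transform_nodes(cat_nodes_text):
    """Map each node name to True/False: does its role string contain 't'?

    Expects the plain-text output of `GET _cat/nodes?v`, i.e. a header row
    followed by one whitespace-separated row per node.
    """
    lines = [ln for ln in cat_nodes_text.splitlines() if ln.strip()]
    header = lines[0].split()
    # Locate the columns by header name so column order does not matter.
    role_idx = header.index("node.role")
    name_idx = header.index("name")
    result = {}
    for line in lines[1:]:
        fields = line.split()
        result[fields[name_idx]] = "t" in fields[role_idx]
    return result


# Hypothetical sample output, not taken from a real cluster.
sample = """\
name   node.role
node-a dimt
node-b dm
"""
print(transform_nodes(sample))
```

Any node mapped to `False` does not advertise the transform role in its role string.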

If that's not the case, you should probably configure a node to be a transform node. Check this page of our documentation to do so.

Hope that helps

Thanks for the response, @greco! They all have this role.

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
139.x.x.x           15          99   0    0.05    0.05     0.07 hirstw    -      elk8
139.x.x.x           58          99  18    4.50    4.85     4.86 chimrstw  *      elk5
138.x.x.x           46          96   5    2.75    2.94     4.75 hirstw    -      elk7
130.x.x.x           50          90   0    0.41    0.23     0.29 himrstvw  -      elk3
139.x.x.x           37         100   7    2.46    2.66     2.83 himrstw   -      elk6

I don't know why, but before I removed the oldest node from the cluster, transforms failed on start with "Could not start transform, allocation explanation [Not starting transform [test-transform], reasons [node_id:not a transform node]]"},"status":429}, and that reason was listed for every node in the cluster that did not yet have the "transform" role. So I had to assign this role to all nodes.

From the transforms code, it seems reasons[] would be empty if it cannot find any nodes in the cluster.
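
As a rough illustration of that point, here is a toy Python model. It is not the actual Elasticsearch code; the function `explain_allocation` and the node dictionaries are hypothetical. It only shows how a message built by looping over candidate nodes ends up with an empty reasons list when the loop has nothing to iterate over.

```python
def explain_allocation(transform_id, nodes):
    """Toy model of an allocation explanation (NOT the real implementation).

    Returns None if some node can run the transform, otherwise a message
    listing one reason per rejected node.
    """
    reasons = []
    for node in nodes:
        if "transform" in node["roles"]:
            return None  # a suitable node exists, nothing to explain
        reasons.append(f"{node['id']}:not a transform node")
    # If `nodes` was empty, the loop never ran and `reasons` stays empty.
    return (
        f"Not starting transform [{transform_id}], "
        f"reasons [{', '.join(reasons)}]"
    )


print(explain_allocation("test-transform", []))
print(explain_allocation("test-transform", [{"id": "node_id", "roles": ["data"]}]))
```

In this model, a non-empty reasons list means nodes were seen but rejected, while an empty list means no candidate nodes were seen at all.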

Going off of this guide, are there any errors or anything else in the logs that indicate the cluster is unstable?

Thank you for the advice, @Patrick_Whelan! Unfortunately, I found no signs of cluster instability: no repeated master elections (except when I restarted the master), no node join failures, and no flapping node connections. Your advice did prompt me to check the "discovery.seed_hosts" lists in the nodes' elasticsearch.yml, and they were indeed partially outdated. I removed the nodes that no longer exist from "discovery.seed_hosts", updated the lists to the current set of nodes, and restarted the nodes accordingly. Nothing changed: the test transform still can't start, and the error is the same, with empty reasons[]. Are there any other ways to troubleshoot this transform failure?