Transform nicing

Hi guys,

We make use of transforms extensively in a production cluster (6 nodes with 20 CPUs and 32 GB RAM each, all nodes have all roles), with approximately 250 transforms running in continuous mode. Some transforms have a frequency of 15m and others 1h (these long intervals are enough for our use case).
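For context, our transforms are defined roughly like the sketch below, using Python's requests library against the transform REST API (the cluster URL, index, field, and transform names here are just placeholders, not our real configuration):

```python
import requests

ES = "http://localhost:9200"  # placeholder: adjust URL and auth for your cluster

# Minimal continuous pivot transform with an explicit 15m frequency.
# All index, field, and transform names below are made up for illustration.
body = {
    "source": {"index": "events-*"},
    "dest": {"index": "events-summary"},
    "frequency": "15m",  # how often the transform checks for new data
    "sync": {"time": {"field": "@timestamp", "delay": "60s"}},
    "pivot": {
        "group_by": {"user": {"terms": {"field": "user.id"}}},
        "aggregations": {"avg_duration": {"avg": {"field": "duration_ms"}}},
    },
}
requests.put(f"{ES}/_transform/events-summary-transform", json=body).raise_for_status()
requests.post(f"{ES}/_transform/events-summary-transform/_start").raise_for_status()
```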

We experience recurring load spikes at the same moment every hour. When checking the transform stats in Kibana during the spikes, I can see many transforms (up to 60) in the "indexing" state at the same time.
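The same picture can be pulled outside Kibana from the transform stats API; here is a minimal sketch in Python with requests (cluster URL and auth are placeholders):

```python
import requests

ES = "http://localhost:9200"  # placeholder: adjust URL and auth for your cluster

# List the transforms that are currently in the "indexing" state, i.e. the
# same figure visible in the Kibana transform stats during a load spike.
stats = requests.get(f"{ES}/_transform/_stats", params={"size": 1000}).json()
indexing = [t["id"] for t in stats["transforms"] if t.get("state") == "indexing"]
print(f"{len(indexing)} transforms indexing right now: {indexing}")
```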

While I fully understand how the periodic scheduler works, I think it would be nice to support a "nice" mode cluster setting for transforms, which would allow Elasticsearch to automatically rearrange transforms so that they don't all run at the same time. The workaround for now would be to stop/start transforms manually to restart the scheduler at another moment. This is cumbersome, and its effect is also completely lost on node restarts, because in that case all transforms are restarted at the same time!
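To illustrate the workaround, here is a rough sketch of re-phasing a group of transforms by stopping and restarting them a couple of minutes apart (Python with requests; the transform ids, cluster URL, and stagger interval are made up):

```python
import time

import requests

ES = "http://localhost:9200"  # placeholder: adjust URL and auth for your cluster
TRANSFORM_IDS = ["transform-a", "transform-b", "transform-c"]  # made-up ids
STAGGER_SECONDS = 120  # arbitrary offset between restarts

for transform_id in TRANSFORM_IDS:
    # Stopping and restarting re-anchors the transform's schedule to the
    # restart time, so staggered restarts spread the runs across the hour.
    requests.post(
        f"{ES}/_transform/{transform_id}/_stop",
        params={"wait_for_completion": "true"},
    ).raise_for_status()
    requests.post(f"{ES}/_transform/{transform_id}/_start").raise_for_status()
    time.sleep(STAGGER_SECONDS)
```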

What do you think of this idea? Is it worth an improvement issue on GitHub?

Do you have any insight about the issue we're experiencing?

Thanks,
David

Hi,
Thanks for bringing this up!

> The workaround for now would be to stop/start transforms manually to restart the scheduler at another moment.

There is another (simpler) workaround, provided you're on version 8.7.0 or later:
You can use the _schedule_now API (see the docs). If you call it, the transform processes new data immediately and its next scheduled run becomes now + frequency.
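For example, spreading the transforms out with _schedule_now could look roughly like this (a sketch in Python with requests; the transform ids, cluster URL, and stagger interval are placeholders):

```python
import time

import requests

ES = "http://localhost:9200"  # placeholder: adjust URL and auth for your cluster
TRANSFORM_IDS = ["transform-a", "transform-b", "transform-c"]  # made-up ids
STAGGER_SECONDS = 120  # arbitrary offset between transforms

for transform_id in TRANSFORM_IDS:
    # _schedule_now (8.7.0+) makes the transform process data immediately and
    # moves its next scheduled run to "now + frequency", so calling it a few
    # minutes apart spreads the hourly transforms across the hour.
    requests.post(f"{ES}/_transform/{transform_id}/_schedule_now").raise_for_status()
    time.sleep(STAGGER_SECONDS)
```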

> and its effect is also completely lost on node restarts, because in that case all transforms are restarted at the same time

Yes, that's still true even with the workaround I described above.

> What do you think of this idea? Is it worth an improvement issue on GitHub?

Sure, feel free to create a GitHub issue. You described the problem well in this post, so you can just copy-paste the text into the issue if you like.

> Do you have any insight about the issue we're experiencing?

We have been aware of this issue for some time but have not yet gotten around to fixing it. Unfortunately, I cannot provide any guarantees about if or when it will be properly fixed.

Thanks for the quick reply. We're currently on v8.5.1, but I'm looking into upgrading soon.
I'll create the GitHub issue.

Cheers,
David
