Transform exclude nodes

I'm looking for a way to exclude nodes from transform searches:

The cluster has a cold/warm architecture.
The transform checks the latest date of data for each client (a terms group_by on the client, then a max aggregation on the date) in order to alert on delays.
In the Logs page of the Elastic Cloud console I noticed queries running on the warm node, even though there is no chance of finding the latest documents there.
Is there a way to exclude the warm storage node from the transform's search?
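For reference, a pivot transform like the one described (terms by client, then max date) could be defined roughly as follows. This is a sketch that only builds the request body for `PUT _transform/<id>`; the index names, the `client` and `timestamp` field names, and the `frequency`/`delay` values are hypothetical placeholders, not taken from the original setup.

```python
import json


def build_last_seen_transform(source_index: str, dest_index: str) -> dict:
    """Build a hypothetical pivot transform body: latest timestamp per client.

    All field and index names are illustrative placeholders.
    """
    return {
        "source": {"index": [source_index]},
        "dest": {"index": dest_index},
        # How often the continuous transform checks the source for changes.
        "frequency": "5m",
        "sync": {
            # Continuous mode: detect new data via the timestamp field,
            # allowing 60s for late-arriving documents.
            "time": {"field": "timestamp", "delay": "60s"}
        },
        "pivot": {
            # One output document per client ("terms by client")...
            "group_by": {"client": {"terms": {"field": "client"}}},
            # ...holding the most recent timestamp seen for it ("max date").
            "aggregations": {"last_seen": {"max": {"field": "timestamp"}}},
        },
    }


if __name__ == "__main__":
    body = build_last_seen_transform("logs-*", "client-last-seen")
    print(json.dumps(body, indent=2))
```

Nothing in this body tells the transform that only recent documents matter, so its searches fan out to every shard of the source indices.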


I guess you mean the cold node? Cold nodes should hold old data; warm nodes should hold more recent data.

The problem you describe is a conceptual one: transform has no way to know that you are only looking for the last document. I guess you are using a scripted_metric similar to the implementation the docs provide? Transform treats a scripted aggregation like any other ordinary aggregation, so it cannot narrow its search down to recent data on its own.

However, we are aware of this problem: latest_doc/state is one of the top requests for transform. We are looking into ways to better support this use case.

There is a workaround, though. After the transform has created its first checkpoint (or even before, if you do not care about historic state), you can update its query and add a range query with date math to filter out old data:

"query": {
    "range": {
      "timestamp": {
        "gte": "now-1d"
      }
    }
}

This example only allows data up to 1 day old. You can tweak this to your needs and align it with your frequency setting; the window should be at least delay + frequency. Note: such a range query can be dangerous. If the window is too short (the lower bound too close to now), transform skips documents and produces wrong data.
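The rule of thumb above (window at least delay + frequency) can be checked before the transform is updated. Below is a minimal sketch that validates the window and builds a body for `POST _transform/<id>/_update`. It assumes the update API expects the source index to be repeated alongside the new query; the index and field names in the usage are hypothetical, and only simple `<number><s|m|h|d>` durations are parsed.

```python
import re

# Seconds per unit for the simple duration strings handled here ("60s", "5m", "1d").
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(duration: str) -> int:
    """Convert a simple duration such as '90s' or '1d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def build_source_update(index: str, field: str, window: str,
                        delay: str, frequency: str) -> dict:
    """Build a hypothetical _update body that limits the source to recent data.

    Rejects windows shorter than delay + frequency, since the transform
    could then skip documents and produce wrong results.
    """
    if to_seconds(window) < to_seconds(delay) + to_seconds(frequency):
        raise ValueError(
            f"window {window} is shorter than delay {delay} + frequency {frequency}"
        )
    return {
        "source": {
            "index": [index],
            # Date math: only documents newer than now minus the window match.
            "query": {"range": {field: {"gte": f"now-{window}"}}},
        }
    }


if __name__ == "__main__":
    print(build_source_update("logs-*", "timestamp", "1d", "60s", "5m"))
```

With a 60s delay and a 5m frequency, a `"1d"` window produces the same range query as above, while a `"5m"` window is rejected because it is shorter than 6 minutes.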

To confirm the approach, you can manually run a query with the suggested range in the Dev Tools console. In the output you should see skipped shards (the `skipped` count under `_shards`), which tells you whether it worked. Shards are skipped in the can_match phase of query execution. In other words, the coordinating node prunes the set of shards according to the range query, and for pruned shards it does not forward the search request to, e.g., a cold node that holds that shard.
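When checking this manually, the number to look at is `_shards.skipped` in the search response. A small sketch that extracts it from a response already parsed into a dict; the response used in the usage is made up for illustration and is not real cluster output.

```python
def skipped_shards(search_response: dict) -> tuple[int, int]:
    """Return (skipped, total) from the _shards section of a search response."""
    shards = search_response["_shards"]
    # Treat a missing "skipped" key as zero shards skipped.
    return shards.get("skipped", 0), shards["total"]


if __name__ == "__main__":
    # Hypothetical response fragment, for illustration only.
    response = {"_shards": {"total": 10, "successful": 10, "skipped": 8, "failed": 0}}
    skipped, total = skipped_shards(response)
    print(f"{skipped} of {total} shards skipped in the can_match phase")
```

A non-zero skipped count indicates that the range query let the coordinating node avoid sending the search to those shards.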

Hope this helps.

Thanks a lot, I updated the query as suggested.
It's a regular group by with a max aggregation on the date field.
Yes, warm node
