How to specify routing for index transform job

Hello,

  1. We have an index transform job, created as below (shortened for brevity):
PUT /_plugins/_transform/my_test_transform
{
  "transform": {
    "enabled": true,
    "schedule": {
      "interval": {
        "period": 1,
        "unit": "Minutes",
        "start_time": 1602100553
      }
    },
    "description": "create index my_test_transform",
    "source_index": "my_test_src_index",
    "target_index": "my_test_target_index",
    "page_size": 1,
    "groups": [
      {
        "terms": {
          "source_field": "message_header.global_tenant_id",
          "target_field": "message_header.global_tenant_id"
        }
      }
    ],
    "aggregations": {
      "usage_data.data_read_kb": {
        "sum": {
          "field": "usage_data.data_read_kb"
        }
      },
      "usage_data": {
        "scripted_metric": {
          "init_script": "state.docs = [];",
          "map_script": """ 
          
            Map span = [
              'occurred_message_timestamp_utc_ms':doc['occurred_message_timestamp_utc_ms'].value
            ];
            state.docs.add(span);
          """,
          "combine_script": "return state.docs;",
          "reduce_script": """ 
            def ret = new HashMap();
            ret['device_friendly_name'] = 'test123';
            return ret;
          """
        }
      }
    }
  }
}
  1. Both the source and target indexes contain multi-tenant documents, so we use the document field "message_header.global_tenant_id" as the routing value.
    Both indexes have an index template whose mapping specifies:
    "_routing": {
      "required": true
    }
  1. When we run the transform job, it fails with a "RoutingMissingException".
  2. To verify that this is the root cause, we removed "_routing": { "required": true } from the target index template, and the transform job then completed successfully.
  3. However, we would like to keep routing required on the target index, so that each tenant's documents in the target index are located on the same shard (better for search performance).

How do we accomplish this?

Thanks,
Assaf

Welcome to our community! :smiley:

The request above isn't Elasticsearch: the Elasticsearch transform API lives under _transform/, not _plugins/_transform/. Which plugin or distribution are you using here?

Thank you!
You are right. We are currently running a few POCs, most of them on Elasticsearch and others on OpenSearch, and I accidentally pasted the OpenSearch example.

How would this be accomplished on Elasticsearch?
I assume we need to somehow update the Painless script above to use the field "message_header.global_tenant_id" as the routing value?
Or maybe update the target index template, which we create in advance, to use "message_header.global_tenant_id" as the default routing field?

Thanks,
Assaf

Elasticsearch transforms are implemented differently from the OpenSearch ones (or rather vice versa). Anyway, you can find the docs for Elasticsearch transforms here. Not knowing your use case, I assume continuous mode might be of interest for your application.

Elasticsearch transforms do not provide a configuration option for _routing. This is not a limitation, though, because you can send the output of a transform through an ingest pipeline (look for the pipeline option under dest). Ingest pipelines are very flexible; for example, you can change _index or _routing using a set processor.
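
As a rough sketch of that approach (not tested against your data): the pipeline name tenant_routing is made up for illustration, and the transform body is a translation of the OpenSearch job above into Elasticsearch pivot syntax with the scripted_metric aggregation left out. It assumes the group_by value ends up in the output document under message_header.global_tenant_id, so that the set processor can read it.

```
# Hypothetical pipeline name; copies the tenant id of each output document into _routing
PUT _ingest/pipeline/tenant_routing
{
  "description": "Route transform output by tenant id",
  "processors": [
    {
      "set": {
        "field": "_routing",
        "value": "{{{message_header.global_tenant_id}}}"
      }
    }
  ]
}

# Index and field names are taken from the original post; the pipeline must exist first
PUT _transform/my_test_transform
{
  "source": { "index": "my_test_src_index" },
  "dest": {
    "index": "my_test_target_index",
    "pipeline": "tenant_routing"
  },
  "pivot": {
    "group_by": {
      "message_header.global_tenant_id": {
        "terms": { "field": "message_header.global_tenant_id" }
      }
    },
    "aggregations": {
      "usage_data.data_read_kb": {
        "sum": { "field": "usage_data.data_read_kb" }
      }
    }
  }
}
```

With the routing value set by the pipeline, the target index template should be able to keep "_routing": { "required": true }. Before starting the transform, you can check what the pipeline does to a sample document with the simulate API (POST _ingest/pipeline/tenant_routing/_simulate).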

Thank you!
I will try using an ingest pipeline as the output of an Elasticsearch transform.
