Split a data stream into separate Indexes - to allow different ILM policies

I'm using the Custom UDP Logs integration and need to split the data stream into two separate indexes (data streams), so I can enable different ILM policies on the two different types of data coming in. Is that possible, or do I need to use two different integrations coming from the same source? Suggestions?

Hi @mgordon

Have you looked at the reroute processor... with a condition? It was made for just such a use case.

First of all, the Custom UDP Logs and Custom TCP Logs integrations do not do a great job of helping the end user set up the proper index template structure.

In the Fleet integration we offer the option to customize the Dataset name, but the integration will not do anything for you automatically:
it will install, by default, the Index Template logs-udp.generic regardless of the Dataset name you've chosen.

Note that the Custom Logs integration is slightly different: if you set the Dataset name to my.dataset, it will automatically create an Index Template matching logs-my.dataset, so that you have a good starting point.
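
You can check what either integration actually installed from Dev Tools; the template names below assume the Dataset names mentioned above:

```
# Installed by Custom UDP Logs by default, regardless of the chosen Dataset name
GET _index_template/logs-udp.generic

# Installed by Custom Logs when the Dataset name is set to my.dataset
GET _index_template/logs-my.dataset
```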

Now, back to the Custom UDP Logs integration.

  1. Is the data the same structure/type, just related to different environments?
  2. Can the data be received on different ports or hosts?

If the data is the same and you cannot receive it on different ports/hosts, then I would go with one integration and route the events using an ingest pipeline.

This is based on 8.14-ish...

Approach A

Create a Custom UDP Integration with the Dataset name set to my.dataset.
I would recommend having an Index Template for a <type>-<dataset> index (e.g. a logs-my.dataset Index Template) completely decoupled from the udp.generic one.

To create the logs-my.dataset Index Template, you can clone logs-udp.generic (and all the associated component templates); a sketch of the result follows the list below.

  • Index Template logs-my.dataset (cloned from logs-udp.generic, with index pattern logs-my.dataset-*)
    • Component Templates:
      • logs@mappings
      • logs@settings
      • logs-my.dataset@package, cloned from logs-udp.generic@package, but:
        • replace the default_pipeline with logs-my.dataset - then create an empty ingest pipeline
        • define the type of the fields you will expect in the mappings section
      • logs-my.dataset@custom
      • ecs@mappings
      • .fleet_globals-1
      • .fleet_agent_id_verification-1
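
A minimal sketch of what that cloned structure could look like, assuming the names above (the priority value and the message field in the mappings are just placeholders; adjust to your data):

```
# Clone of logs-udp.generic@package: your own default_pipeline and mappings
PUT _component_template/logs-my.dataset@package
{
  "template": {
    "settings": {
      "index.default_pipeline": "logs-my.dataset"
    },
    "mappings": {
      "properties": {
        "message": { "type": "text" }
      }
    }
  }
}

# Index Template decoupled from the udp.generic assets
PUT _index_template/logs-my.dataset
{
  "index_patterns": ["logs-my.dataset-*"],
  "data_stream": {},
  "priority": 200,
  "composed_of": [
    "logs@mappings",
    "logs@settings",
    "logs-my.dataset@package",
    "logs-my.dataset@custom",
    "ecs@mappings",
    ".fleet_globals-1",
    ".fleet_agent_id_verification-1"
  ],
  "ignore_missing_component_templates": ["logs-my.dataset@custom"]
}
```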

Once you have this, you can clone the Index Template logs-my.dataset into the following (a sketch of one clone comes after the list):

  • Index Template logs-my.dataset-namespace1 with index pattern logs-my.dataset-namespace1 and custom ILM Policy n1
  • Index Template logs-my.dataset-namespace2 with index pattern logs-my.dataset-namespace2 and custom ILM Policy n2
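
A sketch of one of the two clones; the only changes versus logs-my.dataset are the exact-name index pattern, a higher priority so it wins over the logs-my.dataset-* pattern for that name, and the ILM policy (n1 is a placeholder):

```
PUT _index_template/logs-my.dataset-namespace1
{
  "index_patterns": ["logs-my.dataset-namespace1"],
  "data_stream": {},
  "priority": 250,
  "composed_of": [
    "logs@mappings",
    "logs@settings",
    "logs-my.dataset@package",
    "logs-my.dataset@custom",
    "ecs@mappings",
    ".fleet_globals-1",
    ".fleet_agent_id_verification-1"
  ],
  "template": {
    "settings": {
      "index.lifecycle.name": "n1"
    }
  },
  "ignore_missing_component_templates": ["logs-my.dataset@custom"]
}
```

Repeat the same for logs-my.dataset-namespace2 with ILM Policy n2.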

How do you route the events to different data streams?
Using the logs-my.dataset ingest pipeline with the reroute processor. You can reroute events to a different namespace using a conditional.
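
A minimal sketch of that pipeline; the condition on a hypothetical env field is purely illustrative, use whatever actually distinguishes your two types of data:

```
PUT _ingest/pipeline/logs-my.dataset
{
  "processors": [
    {
      "reroute": {
        "if": "ctx.env == 'prod'",
        "namespace": "namespace1"
      }
    },
    {
      "reroute": {
        "namespace": "namespace2"
      }
    }
  ]
}
```

Note that once a reroute processor fires, the rest of the pipeline is skipped, so the second reroute only acts as the fallback for events that didn't match the condition.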

This should also be reasonably resilient to future changes, as here we're fully decoupling the behavior from the Custom UDP Logs assets.

Approach B

If instead you are OK with keeping udp.generic as the Dataset name, then you can:

  • Clone the Index Template logs-udp.generic into 2 Index Templates
    • Index Template logs-udp.generic-namespace1 with index pattern logs-udp.generic-namespace1 and custom ILM Policy n1
    • Index Template logs-udp.generic-namespace2 with index pattern logs-udp.generic-namespace2 and custom ILM Policy n2
  • Define an ingest pipeline logs-udp.generic@custom with the reroute processor to route to namespace1 or namespace2 (you can test the routing with the simulate call below)
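
The pipeline body is the same idea as the reroute sketch in Approach A, just created under the logs-udp.generic@custom name. You can dry-run the routing with the simulate API before sending real traffic (env is the same illustrative placeholder field as above):

```
POST _ingest/pipeline/logs-udp.generic@custom/_simulate
{
  "docs": [
    { "_source": { "env": "prod", "message": "sample event" } },
    { "_source": { "env": "dev",  "message": "sample event" } }
  ]
}
```

The response should show where each document would end up being routed.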

Approach A allows you to have your own "assets".

  • If you have to modify the Index Template, you have to do it 3 times.

Approach B makes you dependent on all the assets of udp.generic, which can be an advantage or a disadvantage.

  • If you have to modify the Index Template, you have to do it 3 times.
  • If the Index Template structure gets changed by Fleet (enhancements, etc.), you might need to re-clone to align with the new Index Template of the Custom UDP Logs integration.

I hadn't looked at the reroute processor - it does exactly what I need! I tried to update the dataset or namespace manually in the pipeline, but it failed because the value wasn't allowed by the constant_keyword field type.

Thank you!

Thank you - I'm essentially using Approach B, but hadn't seen the reroute processor, which was the missing piece.

Thank you!