Ability to create multiple datastreams with Elastic integrations?

Hello everyone,

I'm currently exploring Elastic integrations and have a question regarding the creation of multiple datastreams. I noticed that in the integration example with Cisco ISE, there are about ten Ingest Pipelines, but it appears that all the data is sent to a single datastream.

I was wondering if it's possible to generate a separate datastream for each pipeline. This configuration could provide more flexibility in managing and analyzing data at a more granular level.

I'm also open to alternative solutions or ideas that could allow for the creation of multiple datastreams using Elastic integrations. I would love to hear about your experiences and suggestions on this matter.

Thank you in advance for your contribution!

Hello @yago82, there are a few options here. Let me first provide a quick explanation.

Elastic Integrations usually create a single datastream per product type. If we split this up into separate datastreams, you would have to manually configure a different listening port for each type of log, since (as with most software) they cannot share the same listening port: each datastream runs its own instance of Filebeat under the hood.

This is meant to provide a sensible set of defaults with the minimum amount of setup hassle. However, there are always options depending on what an advanced user might want to do. Here are a few of them:

Option 1:
If your goal is simply to analyze the data, but only a specific type of it, the Cisco ISE data already supports that via cisco_ise.log.category.name. This field is unique to each data type, so if you want a custom dashboard, a Discover session, or some sort of SIEM rule that matches a specific data type, filtering on this field should be sufficient for your case.
The possible values should be:
`CISE_Policy_Diagnostics`, `CISE_Guest`, `CISE_MyDevices`, `CISE_Internal_Operations_Diagnostics`, `CISE_Threat_Centric_NAC`, `CISE_Posture_and_Client_Provisioning_Audit`, `CISE_RADIUS_Accounting`, `CISE_Failed_Attempts`, `CISE_Passed_Authentications`, `CISE_RADIUS_Diagnostics`, `CISE_AD_Connector`, `CISE_Authentication_Flow_Diagnostics`, `CISE_Administrative_and_Operational_Audit`, `CISE_System_Statistics`, `CISE_TACACS_Accounting`, `CISE_Identity_Stores_Diagnostics`
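For example, to narrow Discover, a dashboard, or a detection rule down to one or two of those log types, a simple KQL filter on that field is enough (the category values shown are taken from the list above):

```
cisco_ise.log.category.name : "CISE_Failed_Attempts"
```

Or for several categories at once:

```
cisco_ise.log.category.name : ("CISE_Failed_Attempts" or "CISE_Passed_Authentications")
```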

Option 2: If you want to split them up because you want to store different logs for different lengths of time with some lifecycle management (ILM or DLM): when you configure an integration, you can also fill out a field called "namespace". This is a custom name that is appended to the end of your datastream name (the default is named "default").
You can run multiple instances of the same integration on the same Elastic Agent, and on newer versions of the stack it will still run only a single Filebeat process underneath, though each instance would need its own listening port.
Simply go to your integration policy and add, say, the Cisco ISE integration 3 times, each with a different listening port for the incoming syslog data.
You can then go to your Cisco ISE web interface, configure data forwarding, handpick the type of data you want in each filter, and send it to the appropriate port.
Since you gave each of the 3 instances of the Cisco ISE integration a different namespace, they write to 3 different datastreams. You can then attach ILM policies that match the namespaces you created.
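As a rough sketch of what the lifecycle side could look like in Dev Tools. The policy name, retention periods, and the namespace `radius` are all made up for illustration, and the exact way you attach the policy to the integration's datastream can differ by stack version:

```
PUT _ilm/policy/cisco-ise-radius-90d
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "30d", "max_primary_shard_size": "50gb" }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

You would then reference this policy (via the index.lifecycle.name index setting) from a template that matches only logs-cisco_ise.log-radius, so the other namespaces keep their own retention.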
However, if your use case is simply to differentiate between the data, I would highly recommend just going with Option 1. Datastreams are not really meant to be used as a layer to separate data just for visualizations and such.

Option 3: This one is much more advanced, and I will add it here simply for reference. If some of these concepts are unknown to you, I would not recommend it just yet.
Each integration, when installed, creates a few things: multiple component templates containing everything it needs (field mappings etc.), an index template, and a datastream. It also creates the built-in ingest pipelines, plus a pipeline that is meant for users to extend the integration, whose name ends with @custom.

You can create your own datastreams manually, each with its own index template, but reuse the component templates referenced by the index template the integration created automatically. This lets you reuse all the settings, field mappings, etc., and also ensures your customizations survive integration updates.
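To make this concrete, a minimal Dev Tools sketch might look like the following. The datastream name logs-cisco_ise_radius-* and the component template names are assumptions for illustration; run GET _index_template/logs-cisco_ise.log to see which component templates your integration version actually composes, and reuse those names:

```
PUT _index_template/logs-cisco_ise_radius
{
  "index_patterns": ["logs-cisco_ise_radius-*"],
  "data_stream": {},
  "priority": 200,
  "composed_of": [
    "logs-cisco_ise.log@package",
    "logs-cisco_ise.log@custom"
  ]
}
```

Writing (or rerouting) a document to logs-cisco_ise_radius-default would then auto-create the datastream with the integration's mappings and settings.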

Then, in the custom ingest pipeline we talked about above, if you are on a newer version of the stack (8.9+), you can use a reroute processor for each log type: Reroute processor | Elasticsearch Guide [8.10] | Elastic
The reroute processor routes the data to any of your manually created datastreams depending on the log type. You can use the same IF conditions the integration uses when it decides which ingest pipeline the data needs to go through: https://github.com/elastic/integrations/blob/main/packages/cisco_ise/data_stream/log/elasticsearch/ingest_pipeline/default.yml#L78
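A hedged sketch of what that @custom pipeline could contain. The dataset name cisco_ise_radius is hypothetical (it would need to match a datastream/template you created yourself), and the condition mirrors the category checks used in the pipeline linked above:

```
PUT _ingest/pipeline/logs-cisco_ise.log@custom
{
  "processors": [
    {
      "reroute": {
        "if": "ctx.cisco_ise?.log?.category?.name == 'CISE_RADIUS_Accounting'",
        "dataset": "cisco_ise_radius",
        "namespace": "default"
      }
    }
  ]
}
```

Note that once a reroute processor fires, no further processors run in that pipeline, so it should be the last thing in the @custom pipeline.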

Again, the latter is a much more advanced case, and really only recommended for users who have a more niche use case and are already comfortable with adding their own custom index settings.

Hi Marius,

Thank you so much for your valuable advice!

Your suggestion regarding the third option sounds quite intriguing. However, since the feature is currently in technical preview on Elastic's side, we've decided to wait for it to become stable within Elastic before implementing it.

Once again, thank you for your insights and guidance. Your input has been incredibly helpful!

In my experience with the Elastic Agent, the final result would be the inverse of what you want: having each pipeline feed a different data stream makes everything way, way harder to manage.

For example, if each pipeline is turned into a different dataset with its own data stream and you want a custom ingest pipeline, you would need to edit multiple custom pipelines; with 10 data streams, that means 10 custom ingest pipelines.

Another example: if you want to add a custom field and you have 10 data streams, you would need to edit 10 templates.

Another issue is that not all data streams would receive the same amount of data, so you will end up with some very small indices.

Personally, I find this kind of over-granularization of the data pretty bad for managing everything.

I would say that using the category name as a filter, as suggested by Marius, is a good approach. It is what we do here with ISE, though we are not using the Elastic Agent integration in this case.

Hi Leandro,

I understand and appreciate your considerations. Unfortunately, we have a requirement for three different retention periods based on the value of a specific field. Despite our recommendations against it, we need to move in this direction.

"Give the devil his due" :slight_smile:

Thank you.
