Elasticsearch Ingest Pipeline Index Routing

We currently have an Elastic Stack with seven nodes and about 50 different integrations running. One feature we currently miss is the possibility to route data to a specific index.

Let's say we have an integration where we collect syslog data from network devices. The network devices all send their syslog data to a socket on port 514. The data contains everything from verbose firewall rule hits to system events, and it is not possible to configure different outputs for the services generating the syslogs on the network device.

Right now everything ends up in the same index, where about 95% of the data consists of the very verbose but still needed firewall rule hits. Several problems appear:

  • Different retention requirements for the data (system vs firewall rules)
  • Displaying the data in Kibana requires filtering out the firewall rules or the system data with top-level filters on the dashboard
  • Reindexing the data requires touching everything (sometimes filters need to be improved, especially when working with Cisco syslogs...)

What would be great is the possibility to conditionally route the data to different indices based on processors in the ingest pipeline (or even before), without having to use Logstash ( :slight_smile: ). I could even imagine having the option to add multiple indices to a single integration and then conditionally keep/drop the logs inside the pipeline.

Doing this with Logstash is far more complex, and there is the issue that data streams and ILM cannot be used at the same time. Furthermore, it is far simpler to use the ingest pipelines directly.

Kind Regards
Matthias

I'm not sure I get the part where this would be more complex with Logstash; it is very easy to do this with Logstash, and you can use both data streams and ILM.

But you can also write to different indices using just ingest pipelines; you would need a set processor that changes the metadata field _index, as described in this part of the documentation.

It would be something like this:

    {
      "set": {
        "description": "change _index",
        "field": "_index",
        "value": "value for the new index",
        "if": "your conditional"
      }
    }

You would need to add a processor like that to the ingest pipeline of the integration you want to change.
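
If you are using a Fleet integration, one place to put it is the dataset's @custom ingest pipeline, which the managed pipeline calls if it exists. A minimal sketch, assuming the cisco_ios log dataset; the target index name and the Painless condition are just placeholders:

    PUT _ingest/pipeline/logs-cisco_ios.log@custom
    {
      "processors": [
        {
          "set": {
            "description": "route matching events to a different index",
            "field": "_index",
            "value": "my-routed-index",
            "if": "ctx.message != null && ctx.message.contains('%SEC-')"
          }
        }
      ]
    }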

As Leandro pointed out, you can route to a specific index in an ingest pipeline. One limitation, however, is that you cannot set a list of indices, so if you want to send the same document to more than one index you most likely need to use Logstash.

Many thanks for the explanation.

Does the set processor routing the events to different indices respect the concept of data streams?
When I look at the _index field in existing documents, I see the backing index of the data stream:
_index: .ds-logs-cisco_ios.log-default-2022.10.12-000039

So when I point to a new index with the set processor modifying the _index field, do I lose the advantages of a data stream for the events matching this specific condition? Is it possible to point to another data stream instead of an index? Or do I have to point to an index directly, or even an index alias?


Kind Regards
Matthias

I think you just need to set the name of the new data stream as the value for _index; the document will then be sent to that data stream and Elasticsearch will automatically write it to the current backing index.
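
A minimal sketch of that, assuming the target data stream is called logs-cisco_ios_fw.log-default (the condition is still a placeholder):

    {
      "set": {
        "description": "reroute matching events to another data stream",
        "field": "_index",
        "value": "logs-cisco_ios_fw.log-default",
        "if": "your conditional"
      }
    }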

Thx this makes sense. I'll try it out.

OK, the pipeline with the conditional set of the _index field to the new data stream seems almost to work, but I encounter the following issue with documents hitting the new data stream from the ingest pipeline:

Pointing _index to a data stream:

    {"type":"security_exception","reason":"action [indices:admin/auto_create] is unauthorized for API key id [<replacedkey>] of user [elastic/fleet-server] on indices [logs-cisco_ios_fw.log-default], this action is granted by the index privileges [auto_configure,create_index,manage,all]"}, dropping event!

Pointing _index directly to an index (different name; I created one for testing):

    {"type":"security_exception","reason":"action [indices:data/write/bulk[s]] is unauthorized for API key id [<replacedkey>] of user [elastic/fleet-server] on indices [testindex], this action is granted by the index privileges [create_doc,create,delete,index,write,all]"}, dropping event!

In general the error messages contain enough information to fix this issue, but even after reading the documentation I'm not sure how to update the permissions correctly; maybe I've missed something. I've created a custom index template and initialized the data stream with

    PUT /_data_stream/logs-cisco_ios_fw.log-default
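
The template is roughly shaped like this; the pattern, priority and ILM policy name here are an illustrative sketch rather than exactly what I used:

    PUT _index_template/logs-cisco_ios_fw.log
    {
      "index_patterns": ["logs-cisco_ios_fw.log-*"],
      "data_stream": {},
      "priority": 200,
      "template": {
        "settings": {
          "index.lifecycle.name": "logs"
        }
      }
    }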

Maybe the fleet-server can only write into managed data streams?
I've tried to set the managed flag in the index template, but the error still occurs.
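
By "managed flag" I mean adding something like this to the template's _meta, modelled on what the Fleet-managed index templates contain, so treat it as a guess rather than a documented switch:

    PUT _index_template/logs-cisco_ios_fw.log
    {
      "index_patterns": ["logs-cisco_ios_fw.log-*"],
      "data_stream": {},
      "_meta": {
        "managed": true,
        "managed_by": "fleet"
      }
    }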

I was able to solve the issue:

I created a new integration and flagged it as a "loopback integration" without an active input/listener. Now the integration with the UDP listener, logs-cisco_ios.log-default, can route certain events to another data stream belonging to the loopback integration, logs-cisco_ios_fw.log-default. The loopback integration (as I call it) carries the correct permissions that allow (elastic/fleet-server?) to write to the data stream.

Maybe there is a smoother way, like setting the correct permissions directly, but it seems this does the trick. Now documents can be routed to other data streams from a single integration point :+1:

This might solve use cases where one would otherwise need several syslog ports to split logs from different systems into different integrations. Being able to route the logs internally is a big + for us, as it simplifies certain integrations. Now we can skip Logstash too :slight_smile:

(maybe it can be simplified but overall it is something like this)
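
A rough sketch of the routing piece in the @custom pipeline of the UDP-listener integration; the pipeline name and the Painless condition on the message content are placeholders, and the real filter depends on which events should end up in the firewall data stream:

    PUT _ingest/pipeline/logs-cisco_ios.log@custom
    {
      "processors": [
        {
          "set": {
            "description": "route firewall rule hits to the loopback integration's data stream",
            "field": "_index",
            "value": "logs-cisco_ios_fw.log-default",
            "if": "ctx.message != null && ctx.message.contains('%SEC-')"
          }
        }
      ]
    }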
