How do I set up pipelines with no direct access from Filebeat to Elasticsearch?

After my company was acquired, we were asked to migrate from our on-prem Splunk to our parent company's AWS ELK stack. This is all new to me, so I need some help figuring out the right path forward.

They have us using Filebeat and Winlogbeat to send logs to a public Logstash server, which forwards them to a Kafka topic; a second Logstash server then pulls from that topic, and the data is finally indexed by Elasticsearch (pretty sure I got that flow right). We have data going in successfully, but it's really ugly. I'm trying to understand how to get logs in with proper field extraction/ingest parsing for various sources like Apache, Cisco network devices, and system logs.

As a specific example, I'm using the Filebeat Cisco module for our firewalls (which is working) and was trying to understand pipelines and dashboards. It appears that, since we can't talk directly to their Elasticsearch instance, filebeat setup --pipelines won't work. Is there another way to get the pipelines in place so that we can get the ingest parsing magic from these modules? Similarly for the dashboards (if they're worth using)?

Thanks for any illumination that you can shed on this!

To use the ingest pipelines and dashboards from the modules, they need to be installed. This is done by running filebeat setup, and it needs to be repeated every time Filebeat or any of the modules is updated.

The filebeat setup command can be executed by any Filebeat; it doesn't need to be the same Filebeat that collects the logs. However, you will need to run it from a Filebeat that can talk directly to Elasticsearch in order to install the ingest pipelines and dashboards.
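
For example, a setup-only run could look something like the sketch below, assuming you can temporarily run a Filebeat from a host that is allowed to reach their Elasticsearch (and Kibana, for the dashboards). The hostnames and credentials here are placeholders, not values from your environment:

    # filebeat.yml used only for running "filebeat setup", not for shipping logs.
    # Hostnames and credentials are placeholders; adjust for your environment.
    output.elasticsearch:
      hosts: ["https://their-elasticsearch:9200"]
      username: "setup_user"
      password: "changeme"

    setup.kibana:
      host: "https://their-kibana:5601"   # only needed if you also want the dashboards

    # With the modules you need enabled (or listed with --modules), run:
    #   filebeat setup --pipelines --modules cisco,system,apache
    #   filebeat setup --dashboards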

After that, you will need to tell the Logstash instance that sends data to Elasticsearch to use the ingest pipeline, via the pipeline option in its elasticsearch output.
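
In the straightforward case where the Beats connect to the Logstash that talks to Elasticsearch, the documented pattern looks roughly like this (the host is a placeholder; the @metadata fields come from the beats input):

    # Logstash: route events through the ingest pipeline chosen by the Beat.
    input {
      beats {
        port => 5044
      }
    }

    output {
      if [@metadata][pipeline] {
        elasticsearch {
          hosts => ["https://their-elasticsearch:9200"]   # placeholder
          manage_template => false
          index => "%{[@metadata][beat]}-%{[@metadata][version]}"
          pipeline => "%{[@metadata][pipeline]}"
        }
      } else {
        elasticsearch {
          hosts => ["https://their-elasticsearch:9200"]   # placeholder
          manage_template => false
          index => "%{[@metadata][beat]}-%{[@metadata][version]}"
        }
      }
    }

The conditional is there because only events coming from a module carry a pipeline; other events are indexed without one.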

All this is explained in this documentation.

But there is a catch: this works pretty easily if you use the default index names and have the following scenario:

[ source ] --> beats --> logstash --> elasticsearch

In your case you have this:

[ source ] --> beats --> logstash --> kafka --> logstash --> elasticsearch

And this makes things a little more complex.

First, the Logstash output shown in the documentation uses some @metadata fields that are created by the beats input. The @metadata fields are not part of the output message, so they will not be present in the Kafka message that the second Logstash consumes.

This can be solved in a couple of ways: you could use a mutate filter in the first Logstash to copy the contents of the @metadata fields to another field before the kafka output, or you could hardcode the pipeline and index names in the second Logstash. A sketch of the first approach is below.
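
A rough sketch of the first option, in the Logstash that receives from Beats and writes to Kafka. The broker, topic, and the [meta] field name are all made up for this example:

    # First Logstash: beats in, Kafka out.
    input {
      beats {
        port => 5044
      }
    }

    filter {
      # Copy the @metadata fields into a regular field so they survive
      # the trip through Kafka ("[meta]" is just a name picked for this sketch).
      mutate {
        copy => {
          "[@metadata][beat]"    => "[meta][beat]"
          "[@metadata][version]" => "[meta][version]"
        }
      }
      if [@metadata][pipeline] {
        mutate {
          copy => { "[@metadata][pipeline]" => "[meta][pipeline]" }
        }
      }
    }

    output {
      kafka {
        bootstrap_servers => "kafka01:9092"   # placeholder
        topic_id => "beats-logs"              # placeholder
        codec => json
      }
    }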

Another issue is that the ingest pipelines expect the source message to arrive unchanged, but since you have one Logstash sending to Kafka and another consuming from it, you will probably need a json codec or an extra json filter in the second Logstash so the event is deserialized back into its original fields (see the sketch below).
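
Continuing the same made-up [meta] field, the second Logstash could look something like this. Using codec => json on the kafka input deserializes the event that the first Logstash serialized, so the original message and fields come back as proper event fields (brokers, topic, and Elasticsearch host are again placeholders):

    # Second Logstash: Kafka in, Elasticsearch out.
    input {
      kafka {
        bootstrap_servers => "kafka01:9092"   # placeholder
        topics => ["beats-logs"]              # placeholder
        codec => json
      }
    }

    output {
      if [meta][pipeline] {
        elasticsearch {
          hosts => ["https://their-elasticsearch:9200"]   # placeholder
          manage_template => false
          index => "%{[meta][beat]}-%{[meta][version]}"
          pipeline => "%{[meta][pipeline]}"
        }
      } else {
        elasticsearch {
          hosts => ["https://their-elasticsearch:9200"]   # placeholder
          manage_template => false
          index => "%{[meta][beat]}-%{[meta][version]}"
        }
      }
    }

You may also want a mutate filter with remove_field for [meta] before indexing so it doesn't end up in the documents.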

What you want can be done, but it needs some small changes to your data ingestion flow.

Thank you so much @leandrojmp - this is really helpful and gives me a good push in the right direction.
