I am using Filebeat to collect CloudWatch logs, and I have modified the ingest node pipeline to extract and index some extra information from the logs. However, when Filebeat is restarted, the extra processors I added disappear and the whole pipeline appears to be overwritten. Is there a way to ensure the pipeline isn't altered when Filebeat starts?
I have also observed that when I specify a pipeline in filebeat.yml, Filebeat seems to ignore it and use the default. I define the pipeline as shown below.
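For reference (the original snippet was not preserved in this thread), an output-level pipeline setting in `filebeat.yml` typically looks like the sketch below; the pipeline name and host are placeholders, not values from the original config:

```yaml
# filebeat.yml (sketch; pipeline name and host are placeholders)
output.elasticsearch:
  hosts: ["localhost:9200"]
  # Pipeline applied to events, unless an input or module sets its own
  pipeline: my-custom-pipeline
```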
There is a little subtle magic to this, I think: the default pipeline is used and overrides what you are specifying in the elasticsearch output. I believe you will need to define it in the input section instead.
If you are using the AWS CloudWatch input, you would specify it there.
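As a hedged sketch (the ARN, frequency, and pipeline name are placeholders, not taken from this thread), the input-level setting would look something like:

```yaml
# filebeat.yml (sketch; all values are placeholders)
filebeat.inputs:
  - type: aws-cloudwatch
    log_group_arn: arn:aws:logs:us-east-1:123456789012:log-group:my-log-group
    scan_frequency: 1m
    # A pipeline set at the input level wins over output.elasticsearch
    pipeline: my-custom-pipeline
```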
See the `pipeline` option in the input docs. I think the magic there is that it actually has a default value, and if you read the little section below it, it says:
> The pipeline ID can also be configured in the Elasticsearch output, but this option usually results in simpler configuration files. If the pipeline is configured both in the input and output, the option from the input is used.
Which means a pipeline set in the input, including the module's default, will always override the output section.
This may not be it, but take a look and let us know...
Apologies, I don't have a direct answer, but... hmmm... are you up for a little debugging? Unfortunately I don't have an AWS test bed handy.
So let's take a look with a little more verbose logging.
You can run Filebeat in the foreground with these parameters. Warning: this will be quite verbose, so I would only have something like CloudTrail enabled so we can cut it down. Let it run until you see it processing messages, then kill it.
Run Filebeat in the foreground. When you run this it will dump a lot of config:
`filebeat -e -d "*"` or `./filebeat -e -d "*"`, depending on how you installed it.
It will tell you it has loaded the default pipelines in the log lines; don't let that distract you, it will always say that.
So this is showing the correct pipeline. I had a look at the events, and they now seem to be properly processed. I think the previous step may have fixed it.
Thank you for resolving this. It seems the only outstanding issue is that if the default pipeline is used, it is overwritten every time Filebeat is restarted.
Yes, if you edit the default pipeline, that will happen. I do think there is a way to stop that as well, perhaps by setting template management to false (or some other setting I would need to check, and that might cause other unintended consequences). But editing the default pipeline is probably not best practice, as there is a lot of logic to get modules back to a working / default state. After all, the default pipeline is, by definition, just that; what you created is a custom pipeline.
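One pattern that may avoid editing the default: create your own ingest pipeline that first delegates to the module's pipeline via a `pipeline` processor and then runs your extra processors, and point the input at it. This is a sketch, not a tested recipe: the module pipeline name varies by Filebeat version (check `GET _ingest/pipeline` for the exact name), and the `set` processor is just an example of a custom step.

```json
PUT _ingest/pipeline/my-custom-pipeline
{
  "processors": [
    { "pipeline": { "name": "filebeat-<version>-aws-cloudtrail-pipeline" } },
    { "set": { "field": "event.custom_tag", "value": "example" } }
  ]
}
```

With this in place, Filebeat can re-create the default pipeline on restart without touching your customizations. There is also a `filebeat.overwrite_pipelines` setting that controls whether Filebeat overwrites existing pipelines on startup, though check the docs for your version before relying on it.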