I have a design question. Let's say I have one Filebeat process monitoring X log files. The log data is sent to ES through an ingest pipeline, which is needed anyway for date/timestamp processing.
When the data is stored in ES it has to be enriched with some extra metadata in the form of extra document fields and tags.
I see two approaches to accomplish this enrichment:
(1) Filebeat with one prospector per file, which makes it possible to add the extra fields and tags directly in the prospector configuration (hardcoding), so much less work has to be done in the pipeline.
(2) Filebeat with one prospector for all files, combined with a pipeline that does more work (i.e. grok pattern matching) to construct the extra fields and tags on the fly. (A sketch of both variants follows below.)
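To make the two options concrete, here is a minimal filebeat.yml sketch. The paths, field values, tags, and the pipeline IDs `timestamp-only` and `enrich-and-timestamp` are all invented for illustration, and the exact keys can vary a little between Filebeat versions:

```yaml
filebeat.prospectors:

  # Option (1): one prospector per file/file type, metadata hardcoded here,
  # so the ingest pipeline only has to parse the date/timestamp.
  - type: log                 # "input_type" on older 5.x versions
    paths:
      - /var/log/app-a/*.log
    fields:
      service: app-a          # hypothetical extra fields
      env: production
    fields_under_root: true
    tags: ["app-a", "prod"]
    pipeline: timestamp-only

  # Option (2): one prospector for all files; fields and tags are
  # constructed on the fly by a heavier grok pipeline (see further down).
  - type: log
    paths:
      - /var/log/app-a/*.log
      - /var/log/app-b/*.log
    pipeline: enrich-and-timestamp
```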
I think the right choice for your design depends on your traffic and on what grok you want to do, but there are always trade-offs between performance and flexibility.
Let's say you have a thousand Beats connected to your cluster generating a lot of traffic, and you have to apply a grok expression to every event. Grok expressions are basically syntactic sugar on top of regular expressions; depending on what you need to parse, they can be slow and put extra load on your cluster. Depending on your capacity, that can slow down ingestion, so you might want to test your maximum ingestion rate with that pipeline in place.
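As a rough illustration of what that pipeline-heavy variant involves, here is a hypothetical `enrich-and-timestamp` pipeline with a grok processor (the pattern and field names are made up), plus a `_simulate` call, which is a cheap way to sanity-check the pipeline against sample events:

```
PUT _ingest/pipeline/enrich-and-timestamp
{
  "description": "Hypothetical pipeline: grok out extra fields/tags, then parse the timestamp",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{TIMESTAMP_ISO8601:log_time} %{LOGLEVEL:level} \\[%{DATA:service}\\] %{GREEDYDATA:msg}"]
      }
    },
    {
      "append": {
        "field": "tags",
        "value": ["enriched-by-pipeline"]
      }
    },
    {
      "date": {
        "field": "log_time",
        "formats": ["ISO8601"]
      }
    }
  ]
}

POST _ingest/pipeline/enrich-and-timestamp/_simulate
{
  "docs": [
    { "_source": { "message": "2017-06-01T12:00:00Z INFO [app-a] something happened" } }
  ]
}
```

The simulate call only checks that the pipeline does what you expect on sample documents; the actual ingestion rate under that pipeline still needs to be measured under realistic load.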
Usually, Filebeat on the edge has low memory and CPU usage. I presume you want to add a prospector per file type (syslog, nginx); depending on how many file types we are talking about, it might just be better to hardcode these values on the prospector. Because these values are static and should never change, it is easy to add them to the event with very little extra processing.
If you look at our module implementations, we create more than one prospector depending on the module.
Thanks for that insight; it largely confirms what I was thinking. My feeling was that I would prefer to limit the load on the ES nodes and keep the static data in Filebeat.