Multiple prospectors vs. More work in pipeline

domisan · January 11, 2018, 11:51am

Hi,

I have a design question. Let's say I have one Filebeat process monitoring X log files. The log data is sent to ES through a pipeline, which is required anyway for date/timestamp processing.
The data stored in ES has to be enriched with some extra metadata in the form of additional document fields and tags.

I see two approaches to accomplish this enrichment:
(1) Filebeat with one prospector per file, which makes it possible to add the extra fields and tags directly in the prospector configuration (hardcoding). This leaves much less work to be done in the pipeline.

(2) Filebeat with one prospector for all files, with the pipeline doing more work (i.e. grok pattern matching) to construct the extra fields and tags on the fly.
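
For illustration, a minimal sketch of what approach (2) could look like on the Filebeat side, assuming Filebeat 6.x syntax. The paths and the pipeline name `enrich-and-timestamp` are hypothetical. The pipeline itself would then have to derive the extra fields and tags, for example by running grok against the file path, which Filebeat 6.x is assumed here to report in the `source` field.

```yaml
# Sketch only: paths and pipeline name are made up for illustration.
filebeat.prospectors:
  # One prospector covering every monitored file; no per-file metadata here.
  - type: log
    paths:
      - /var/log/app-a/*.log
      - /var/log/app-b/*.log

output.elasticsearch:
  hosts: ["localhost:9200"]
  # Hypothetical pipeline that parses timestamps and builds fields/tags via grok.
  pipeline: enrich-and-timestamp
```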

Which of the two designs would be the most optimal?

Thx,

pierhugues · January 11, 2018, 4:07pm

Hello @Dominik,

I think the answer depends on your traffic and on the grok patterns you want to apply; either way there is a trade-off between performance and flexibility.

Let's say you have a thousand Beats connected to your cluster generating a lot of traffic, and you have to apply a grok expression to every event. Grok expressions are basically sugar on top of regular expressions, so depending on what you need to parse they can be slow and put more load on your cluster. Depending on capacity, this might slow down ingestion. You might want to test your maximum ingestion rate with that pipeline.

Usually, FB on the edge has low memory and CPU usage. I presume you want to add one prospector per file type (syslog, nginx). Depending on how many file types we are talking about, it might just be better to hardcode these values on the prospector. Because these values are static and should never change, it is easy to add the data to the event with less processing.
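
As a rough sketch of the hardcoded variant, assuming Filebeat 6.x prospector syntax: one prospector per file type, each carrying its own static fields and tags. The paths, the `log_type` field, the tag values, and the pipeline name `timestamp-only` are assumptions for illustration, not taken from a real setup.

```yaml
# Sketch only: paths, field names, tags and pipeline name are hypothetical.
filebeat.prospectors:
  # Prospector for syslog files, with static metadata attached on the edge.
  - type: log
    paths:
      - /var/log/syslog
    fields:
      log_type: syslog
    fields_under_root: true   # put log_type at the top level of the event
    tags: ["system"]

  # Prospector for nginx access logs, with its own static metadata.
  - type: log
    paths:
      - /var/log/nginx/access.log
    fields:
      log_type: nginx
    fields_under_root: true
    tags: ["web"]

output.elasticsearch:
  hosts: ["localhost:9200"]
  # Hypothetical pipeline that only handles date/timestamp processing.
  pipeline: timestamp-only
```

With a layout like this the pipeline only has to deal with the timestamp, since the static values are already on the event when it reaches ES.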

If you look at our module implementations, depending on the module we create more than one prospector.

Thanks

domisan · January 12, 2018, 8:35am

Thanks for that insight, it confirms a bit what I was thinking. My feeling was that I would prefer to limit the load on the ES processes and keep the (static) data in FB.

system · February 9, 2018, 8:35am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
