Question regarding "where" log data is processed

Hi Team,

[ Deployment: Filebeat running as a Kubernetes DaemonSet on dozens of nodes, sending data directly to ES (AWS ES) ] (i.e. no intermediate Logstash involved). I have not configured or specified any pipelines in my Filebeat config either.

In terms of processing load, where does the data processing take place?

  • Does the initial parsing happen on the Filebeat node?
  • Are any processors executed on ES? (e.g. rename/drop field, etc.)
  • Specifically, where does the "grokking" specified in the Filebeat modules take place?

As I understand it, some processors, like geoip/user-agent for nginx, are executed on ES
as ingest pipelines.


Hey @john_eapen,

There are two main places where processing is done when using Filebeat:

  • Processors defined in the Filebeat configuration, which run on the node where Filebeat is running.
  • Ingest pipelines, which run in Elasticsearch.

The same event can be processed by both mechanisms: first by the processors defined in Filebeat, and then by an ingest pipeline in Elasticsearch.

Some processing features are available in both places, so you can choose where it is better to do the processing. Other features are only available in one of them. For example, processors that need information about the node where Filebeat is running are only available as Filebeat processors. As you mention, processors that need access to centralized databases, such as geoip or useragent, are only available in Elasticsearch.
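As a minimal sketch of the Filebeat side, processors like these run on the edge node itself before the event is sent to Elasticsearch (the dropped field here is only an example; adjust it to your events):

```yaml
# filebeat.yml (sketch) -- these processors execute on the Filebeat node
processors:
  - add_host_metadata: ~        # needs local node info, so it can only run in Filebeat
  - drop_fields:
      fields: ["agent.ephemeral_id"]   # hypothetical example field
      ignore_missing: true
```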

The grok processor is only available in Elasticsearch, but the somewhat similar dissect processor is available both in Filebeat and in Elasticsearch.
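So grokking always happens on the Elasticsearch side, inside an ingest pipeline. A sketch of such a pipeline body (created with `PUT _ingest/pipeline/<name>`; the pattern and target field names are hypothetical):

```json
{
  "description": "Sketch: grok parsing runs in Elasticsearch, not in Filebeat",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IPORHOST:client.ip} %{WORD:http.request.method} %{URIPATHPARAM:url.path}"]
      }
    }
  ]
}
```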

Filebeat modules usually combine both mechanisms: they include both Filebeat processors and Elasticsearch ingest pipelines to handle the logs of specific services.
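For example, the ingest pipelines shipped with the modules are installed into Elasticsearch by Filebeat's setup command (a sketch; `nginx` is just an example module):

```shell
# Load the ingest pipelines for the given modules into Elasticsearch
filebeat setup --pipelines --modules nginx
```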

Hi @jsoriano

Thank you for the reply. This really helps.

