Are ingest pipelines created by Elastic Agent expected to differ from the equivalent pipelines created by Beats?

So, I've been assuming that the ingest pipelines created by Elastic Agent and those created by Filebeat would be pretty much identical. I expected some differences, but nothing that would make them output different fields.

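Today I found that there are at least some major differences in the Apache ingest pipelines. Specifically, the grok processor from Elastic Agent matches more than the one from Filebeat: its first pattern also accepts an optional trailing X-Forwarded-For list, backed by a custom ADDRESS_LIST pattern definition that Filebeat's processor lacks.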

The end result is that Filebeat doesn't support the same log formats that the Agent integration supports.

Filebeat's grok processor:

{
    "grok": {
      "field": "event.original",
      "patterns": [
        "%{IPORHOST:destination.domain} %{IPORHOST:source.ip} - %{DATA:user.name} \\[%{HTTPDATE:apache.access.time}\\] \"(?:%{WORD:http.request.method} %{DATA:_tmp.url_orig} HTTP/%{NUMBER:http.version}|-)?\" %{NUMBER:http.response.status_code:long} (?:%{NUMBER:http.response.body.bytes:long}|-)( \"%{DATA:http.request.referrer}\")?( \"%{DATA:user_agent.original}\")?",
        "%{IPORHOST:source.address} - %{DATA:user.name} \\[%{HTTPDATE:apache.access.time}\\] \"(?:%{WORD:http.request.method} %{DATA:_tmp.url_orig} HTTP/%{NUMBER:http.version}|-)?\" %{NUMBER:http.response.status_code:long} (?:%{NUMBER:http.response.body.bytes:long}|-)( \"%{DATA:http.request.referrer}\")?( \"%{DATA:user_agent.original}\")?",
        "%{IPORHOST:source.address} - %{DATA:user.name} \\[%{HTTPDATE:apache.access.time}\\] \"-\" %{NUMBER:http.response.status_code:long} -",
        "\\[%{HTTPDATE:apache.access.time}\\] %{IPORHOST:source.address} %{DATA:apache.access.ssl.protocol} %{DATA:apache.access.ssl.cipher} \"%{WORD:http.request.method} %{DATA:_tmp.url_orig} HTTP/%{NUMBER:http.version}\" (-|%{NUMBER:http.response.body.bytes:long})"
      ],
      "ignore_missing": true
    }
}

Agent's grok processor:

{
    "grok": {
      "field": "event.original",
      "patterns": [
        "(%{IPORHOST:destination.domain} )?%{IPORHOST:source.address} - %{DATA:user.name} \\[%{HTTPDATE:apache.access.time}\\] \"(?:%{WORD:http.request.method} %{DATA:_tmp.url_orig} HTTP/%{NUMBER:http.version}|-)?\" %{NUMBER:http.response.status_code:long} (?:%{NUMBER:http.response.body.bytes:long}|-)( \"%{DATA:http.request.referrer}\")?( \"%{DATA:user_agent.original}\")?( X-Forwarded-For=\"%{ADDRESS_LIST:apache.access.remote_addresses}\")?",
        "%{IPORHOST:source.address} - %{DATA:user.name} \\[%{HTTPDATE:apache.access.time}\\] \"-\" %{NUMBER:http.response.status_code:long} -",
        "\\[%{HTTPDATE:apache.access.time}\\] %{IPORHOST:source.address} %{DATA:apache.access.ssl.protocol} %{DATA:apache.access.ssl.cipher} \"%{WORD:http.request.method} %{DATA:_tmp.url_orig} HTTP/%{NUMBER:http.version}\" (-|%{NUMBER:http.response.body.bytes:long})"
      ],
      "ignore_missing": true,
      "pattern_definitions": {
        "ADDRESS_LIST": "(%{IP})(\"?,?\\s*(%{IP}))*"
      }
    }
}
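To make the difference concrete, here is a rough sketch of what the extra X-Forwarded-For suffix in the Agent pattern would accept. It is an approximation in plain Python regex, not grok: the `IP` expression is a simplified IPv4-only stand-in for grok's `%{IP}` (which also covers IPv6), and `forwarded_for` and the sample log line are made up for illustration.

```python
import re

# Assumption: simplified IPv4-only stand-in for grok's %{IP}.
IP = r"(?:\d{1,3}\.){3}\d{1,3}"

# Mirrors the ADDRESS_LIST pattern definition from the Agent pipeline.
ADDRESS_LIST = rf'(?:{IP})(?:"?,?\s*(?:{IP}))*'

# The optional suffix that only the Agent pattern accepts.
XFF_SUFFIX = re.compile(
    rf' X-Forwarded-For="(?P<remote_addresses>{ADDRESS_LIST})"$'
)


def forwarded_for(line):
    """Return the X-Forwarded-For address list at the end of a line, or None."""
    match = XFF_SUFFIX.search(line)
    return match.group("remote_addresses") if match else None


# Hypothetical access log line ending with the X-Forwarded-For suffix.
sample = (
    '192.0.2.10 - - [01/Jan/2023:10:00:00 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "curl/7.0" X-Forwarded-For="10.0.0.1, 10.0.0.2"'
)
print(forwarded_for(sample))
```

A line with this suffix has no counterpart in any of the Filebeat patterns shown above.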

So far I've only investigated the Apache access pipelines, but I'm guessing there could be differences in other pipelines as well.
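One way to check other pipelines for the same kind of drift would be to fetch both definitions (for example with `GET _ingest/pipeline/<id>`) and compare their grok patterns. The following is a minimal sketch, assuming the two pipeline bodies are already loaded as Python dicts. The helper names are hypothetical, the sample pipelines are shortened placeholders rather than the real Apache definitions, and only top-level processors are inspected (nested or `on_failure` processors are ignored).

```python
def grok_patterns(pipeline):
    """Return all grok patterns from a pipeline body's top-level processors."""
    patterns = []
    for processor in pipeline.get("processors", []):
        grok = processor.get("grok")
        if grok is not None:
            patterns.extend(grok.get("patterns", []))
    return patterns


def diff_patterns(left, right):
    """Return (patterns only in left, patterns only in right), order preserved."""
    left_patterns = grok_patterns(left)
    right_patterns = grok_patterns(right)
    only_left = [p for p in left_patterns if p not in right_patterns]
    only_right = [p for p in right_patterns if p not in left_patterns]
    return only_left, only_right


# Placeholder pipelines: "A", "A2", "B", "C" stand in for real grok patterns.
filebeat_pipeline = {
    "processors": [
        {"grok": {"field": "event.original", "patterns": ["A", "B", "C"]}}
    ]
}
agent_pipeline = {
    "processors": [
        {"grok": {"field": "event.original", "patterns": ["A2", "B", "C"]}}
    ]
}

only_filebeat, only_agent = diff_patterns(filebeat_pipeline, agent_pipeline)
print("Only in Filebeat:", only_filebeat)
print("Only in Agent:", only_agent)
```

An exact string comparison like this only flags that patterns differ; working out which log formats each side actually accepts still means reading the patterns or testing sample lines against them.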

Is this expected? If so, why?

Assuming Elastic's plan is to replace the Beats family with Elastic Agent, and the Filebeat modules with Elastic Agent integrations, I don't think the Filebeat modules are getting the same attention as the integrations. As a result, some changes to the integrations' ingest pipelines are not reflected in the Filebeat modules.

I'd believe that, but the fact that Agent doesn't support hints-based autodiscover while Filebeat does kinda makes me wonder.
