Making Docker container logs go through the right beats module for processing?

I've successfully configured Elastic Agent to grab my Docker container logs and send them to an ingest pipeline for processing. I'm using the JSON parser to get my fields and timestamp set up.

But now I'm stuck on how to take logs from, say, an httpd container, and get them parsed out into fields like http.request.method and user_agent.name, the way I get them from my non-dockerized httpd instances via Elastic Agent's Apache integration. (I assume that's just code for Filebeat's apache module.)

Is there an easy way? Or do I need to configure a grok parser and do it all myself?

Anyone?

Confused... Did you add the Apache Integration to your policy, configure it, and add it to the agent?

Sorry, I should have been more detailed in my original post.

I've configured a custom log integration (per this issue) to grab the logs from /var/lib/docker/containers/*/*-json.log. I added this to the custom config section:

type: container
json.expand_keys: true

So that elastic-agent will tell Filebeat to treat the logs as container logs.

Unless I'm mistaken, adding the Apache integration would only work if it could read the Apache logs from the local filesystem at /var/log/apache2|httpd. Right? It can't just intercept the output of the custom logs integration and figure out which containers are sending apache logs and which are not.

And that brings me to my question. Is there an easy way to take the httpd container logs and get them parsed out into fields the same way the Apache Filebeat module would?

I put a label on my httpd service config to give me something to check and find out what kind of container I'm running. So if I can key off that and send the logs through some process, I'd be good.

I'm just not sure what process to send them through. An ingest pipeline and a grok parser? Or is there a way to use Filebeat's apache module directly?

Thanks!

Hi @jerrac, apologies, but it is not clear to me what your deployment architecture is.

Your title says Beats, but your OP says Agent... which one are you running?

Is Elastic Agent Running in Docker?

Are the Containers running in Kubernetes?

Is Apache running on the Host or in a Docker Container?

Are the Apache logs in Docker Containers or Host Logs?

Agent is not in Docker. It's running as a normal service. Installed per the usual method given from the Fleet screen in Kibana.

My containers are either running in standalone Docker, or Docker Swarm.

Apache is in Docker. Specifically containers based on the php:< version >-apache image. So logs go into /var/lib/docker/containers/*/*-json.log.

Agent is configured with a Custom Log integration to read those logs via Fleet. In the "Advanced config" section (I can't remember the exact field label right now) I placed the yaml config from my last post. I'm assuming that Custom Logs uses Filebeat for its actual work, since that's where I found the info for the advanced config.

My goal is to find a way to parse the container log messages like they were from non-containerized apps.

So, for my instances of Apache that are not in Docker, I put the logs through the Apache log integration (which I assume is the Filebeat Apache module). The output breaks the message field up into useful fields that the pre-built dashboards can display and use.

I'd like to break the message field from my container logs up the same way.

If I put MySQL or Tomcat into containers, I'd like to be able to do the same thing to their logs.

(When I started this topic I was thinking of something like redirecting the logs through the proper Filebeat module. Hence my title.)

FYI, I'm running the 8.2.3 or 8.3.2 version of Elastic Stack. I'm not at work right now, so I don't recall the exact version... Most of my Docker stuff is on Ubuntu 18.04 or 20.04.

Hopefully that clarifies things.

Thanks, that helps.

Unfortunately, I am not a Swarm expert, but I think I get the gist. I am not quite sure yet how Agent handles this (asking internally).

The way this works in K8s is that there are annotations that help direct the logs to the correct module.
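
For illustration, in Kubernetes the hint is a pod annotation like the sketch below. This is Filebeat's hints-based autodiscover, so take it as a sketch of the general idea rather than something Agent does out of the box in your setup:

# Pod metadata sketch: with Filebeat hints-based autodiscover enabled
# (hints.enabled: true), this annotation routes the pod's logs through
# the Apache module instead of the generic container input.
metadata:
  annotations:
    co.elastic.logs/module: apache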

When you look at the logs in Kibana, are there any container annotations / tags / fields, or anything else you can use to "sort" the logs on? If so, then yes, we can do some "work" to get this to work.

The high-level approach could be something like:

  • Load the correct Filebeat modules (which loads the pipelines etc.)
  • We would create a top-level pipeline
  • Agent would send to that top-level pipeline
  • That pipeline would use one of the fields to direct the log to the correct detailed pipeline
  • For example, the Apache pipeline
  • Then you would get nicely parsed logs

So, back to the question: do you see any fields we could filter on, container tags, etc.?

You can add labels to the service definition.

You get these by default under container.labels:
(I copied the JSON, hence the _ instead of .)

"com_docker_swarm_service_id": "< some kind of random string >",
"com_docker_swarm_service_name": "servicename",
"com_docker_swarm_task": "",
"com_docker_swarm_task_id": "< some kind of random string >",
"com_docker_swarm_task_name": "< namespace >_< service name >.< task slot >.< some kind of random string >",
"com_docker_stack_namespace": "< namespace >",
"com_docker_swarm_node_id": "< some kind of random string >"

I was thinking of adding something like image_type: "httpd" as a way to sort out which logs go where, like the way you mentioned using annotations.
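
For example, in the Swarm stack file it would look something like this (the image_type key is just a name I made up, and the image tag is illustrative):

version: "3.8"
services:
  web:
    image: php:8.1-apache
    # Service-level labels are applied to the containers themselves, so they
    # show up under container.labels in the ingested events. (deploy.labels
    # would attach to the Swarm service object instead, not the containers.)
    labels:
      image_type: "httpd"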

There's also container.image.name, but that is not the name of the base image, it's the name of the app's image.

Is there a way for me to manually do what you described?

I just found logs-apache.access-1.3.5 in my ingest pipelines. How often would that version number change? Or can I refer to the pipeline without the version number?

I could set up a docker pipeline, and use the pipeline processor to run the logs-apache.access-1.3.5 pipeline. But if the version number changes, I'd have to know about it somehow... Hmm...

It will change with the Agent version number, I believe... when you update the Agent.

Apache does not change that often; it is pretty static...

:) That is what I described :)

You could just create a pipeline that calls other pipelines as shown here...

Your Conditions would be slightly different.

The example is nearly exactly your case.

PUT _ingest/pipeline/one-pipeline-to-rule-them-all
{
  "processors": [
    {
      "pipeline": {
        "description": "If 'service.name' is 'apache_httpd', use 'httpd_pipeline'",
        "if": "ctx.service?.name == 'apache_httpd'",
        "name": "httpd_pipeline"
      }
    },
    {
      "pipeline": {
        "description": "If 'service.name' is 'syslog', use 'syslog_pipeline'",
        "if": "ctx.service?.name == 'syslog'",
        "name": "syslog_pipeline"
      }
    },
    {
      "fail": {
        "description": "If 'service.name' is not 'apache_httpd' or 'syslog', return a failure message",
        "if": "ctx.service?.name != 'apache_httpd' && ctx.service?.name != 'syslog'",
        "message": "This pipeline requires service.name to be either `syslog` or `apache_httpd`"
      }
    }
  ]
}
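
Adapted to your container label idea, it might look something like the sketch below. The pipeline name docker-log-router and the image_type label are placeholder names; logs-apache.access-1.3.5 is the versioned pipeline you found, which may change when the integration is updated:

PUT _ingest/pipeline/docker-log-router
{
  "processors": [
    {
      "pipeline": {
        "description": "Route httpd container logs to the Apache access pipeline",
        "if": "ctx.container?.labels?.image_type == 'httpd'",
        "name": "logs-apache.access-1.3.5"
      }
    }
  ]
}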

Well, I can report a partial success. After setting up the pipelines like we discussed, my logs are being parsed out into fields properly. That is good. Unfortunately, the built-in Apache dashboards are not working.

It looks like they're specifically only looking at the Apache datasets. In the case of the [Logs Apache] Access and error logs dashboard, the request for one of the visualizations was set to match on "data_stream.dataset": "apache.error".

Any idea where that match is getting set? I can't find it when I edit the visualization.

Edit: Found it, there's a saved search the visualization is linked to.

Pretty sure that actually gets set at the agent level outbound (i.e. it gets set before it reaches the ingest pipeline), so you may need to add something to your sorting pipeline to set that value.
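
For instance, a set processor like the sketch below could go in your sorting pipeline. The condition reuses the hypothetical image_type label from earlier; whether rewriting data_stream.dataset after ingest satisfies every saved search is something you'd want to verify:

{
  "set": {
    "description": "Label the event as the Apache access dataset so the dashboard's saved searches match",
    "if": "ctx.container?.labels?.image_type == 'httpd'",
    "field": "data_stream.dataset",
    "value": "apache.access"
  }
}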
