I've successfully configured Elastic Agent to grab my Docker container logs and send them to an ingest pipeline for processing. I'm using the JSON parser to get my fields and timestamp set up.
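Roughly like this, simplified (docker-json-logs is just a placeholder name here; the time field comes from Docker's json-file log format):

PUT _ingest/pipeline/docker-json-logs
{
  "processors": [
    { "json": { "field": "message", "add_to_root": true } },
    { "date": { "field": "time", "formats": [ "ISO8601" ], "target_field": "@timestamp" } }
  ]
}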
But now I'm stuck on how to take logs from, say, an httpd container, and get them parsed out into fields like http.request.method and user_agent.name, the way I get them from my non-dockerized httpd instances via Elastic Agent's Apache integration. (I assume that's just code for Filebeat's apache module.)
Is there an easy way? Or do I need to configure a grok parser and do it all myself?
Sorry, I should have been more detailed in my original post.
I've configured a custom log integration (per this issue) to grab the logs from /var/lib/docker/containers/*/*-json.log. I added this to the custom config section:
# use Filebeat's container input for Docker's JSON log files
type: container
# decode the JSON log lines and expand dotted keys into nested objects
json.expand_keys: true
This way, Elastic Agent tells Filebeat to treat the logs as container logs.
Unless I'm mistaken, adding the Apache integration would only work if it could read the Apache logs from the local filesystem at /var/log/apache2|httpd. Right? It can't just intercept the output of the custom logs integration and figure out which containers are sending apache logs and which are not.
And that brings me to my question. Is there an easy way to take the httpd container logs and get them parsed out into fields the same way the Apache Filebeat module would?
I put a label on my httpd service config to give me something to check and find out what kind of container I'm running. So if I can key off that and send the logs through some process, I'd be good.
I'm just not sure what process to send them through. An ingest pipeline and a grok parser? Or is there a way to use Filebeat's Apache module directly?
Agent is not in Docker. It's running as a normal service. Installed per the usual method given from the Fleet screen in Kibana.
My containers are either running in standalone Docker, or Docker Swarm.
Apache is in Docker. Specifically containers based on the php:< version >-apache image. So logs go into /var/lib/docker/containers/*/*-json.log.
Agent is configured via Fleet with a Custom Logs integration to read those logs. In the "Advanced config" section (I can't remember the exact field label right now) I placed the yaml config from my last post. I'm assuming that Custom Logs uses Filebeat for its actual work, since that's where I found the info for the advanced config.
My goal is to find a way to parse the container log messages like they were from non-containerized apps.
So, for my instances of Apache that are not in Docker, I put the logs through the Apache log integration (which I assume is the Filebeat Apache module). The output breaks the message field up into useful fields that the pre-built dashboards can display and use.
I'd like to break the message field from my container logs up the same way.
If I put MySQL or Tomcat into containers, I'd like to be able to do the same thing to their logs.
(When I started this topic I was thinking of something like redirecting the logs through the proper Filebeat module. Hence my title.)
FYI, I'm running version 8.2.3 or 8.3.2 of the Elastic Stack; I'm not at work right now, so I don't recall the exact version. Most of my Docker hosts are on Ubuntu 18.04 or 20.04.
Unfortunately I am not a Swarm expert, but I think I get the gist. I am not quite sure yet how Agent handles this (asking internally).
The way this works in K8s, there are annotations that help direct the logs to the correct module.
When you look at the logs in Kibana, are there any container annotations / tags / fields you could use to sort the logs on? If so, then yes, we can do some work to get this working.
At a high level, it could be something like this (rough sketch after the list):
Load the correct Filebeat modules (which loads the pipelines etc.)
We would create a top-level pipeline
Agent would send to that top-level pipeline
That pipeline would use one of the fields to direct each log to the correct detailed pipeline, for example the Apache pipeline
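As a rough, untested sketch (the container-logs-router name is made up, and I'm assuming a container label like container.labels.image_type plus an installed Apache access pipeline named logs-apache.access-1.3.5; both would need to match your actual setup):

PUT _ingest/pipeline/container-logs-router
{
  "description": "Route container logs to the matching integration pipeline (sketch)",
  "processors": [
    {
      "pipeline": {
        "if": "ctx.container?.labels?.image_type == 'httpd'",
        "name": "logs-apache.access-1.3.5"
      }
    }
  ]
}

Logs that don't match any condition would just pass through untouched.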
Then your logs would get nicely parsed.
So, back to my question: do you see any fields we could filter on, container labels etc.?
You get these by default under container.labels:
(I copied the json, hence the _ instead of .)
"com_docker_swarm_service_id": "< some kind of random string >",
"com_docker_swarm_service_name": "servicename",
"com_docker_swarm_task": "",
"com_docker_swarm_task_id": "< some kind of random string >",
"com_docker_swarm_task_name": "< namespace >_< service name >.< task slot >.< some kind of random string >",
"com_docker_stack_namespace": "< namespace >",
"com_docker_swarm_node_id": "< some kind of random string >"
I was thinking of adding something like "image_type": "httpd" as a way to sort out which logs go where, like the way you mentioned using annotations.
There's also container.image.name, but that is not the name of the base image; it's the name of the app's image.
Is there a way for me to manually do what you described?
I just found logs-apache.access-1.3.5 in my ingest pipelines. How often would that version number change? Or can I refer to the pipeline without the version number?
I could set up a docker pipeline, and use the pipeline processor to run the logs-apache.access-1.3.5 pipeline. But if the version number changes, I'd have to know about it somehow... Hmm...
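At least the get-pipeline API accepts wildcards, so I can check which version is currently installed from Dev Tools:

GET _ingest/pipeline/logs-apache.access*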
Well, I can report partial success. After setting up the pipelines like we discussed, my logs are being parsed out into fields properly. That is good. Unfortunately, the built-in Apache dashboards are not working.
It looks like they're specifically looking only at Apache datasets. In the case of the [Logs Apache] Access and error logs dashboard, the request for one of the visualizations was set to match on "data_stream.dataset": "apache.error".
Any idea where that match is getting set? I can't find it when I edit the visualization.
Edit: Found it, there's a saved search the visualization is linked to.
Pretty sure that actually gets set at the Agent level on the way out (i.e. it gets set before it reaches the ingest pipeline), so you may need to add a processor to your sorting pipeline to set that value.
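Something along these lines in your sorting pipeline, before the handoff to the Apache pipeline (a sketch, reusing the same hypothetical image_type label condition; you'd want apache.error for the error logs):

{
  "set": {
    "if": "ctx.container?.labels?.image_type == 'httpd'",
    "field": "data_stream.dataset",
    "value": "apache.access"
  }
}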