Single module for multiple similar logs

We have multiple instances of the same service. Consider, for example:

sudo service my_service_1 start
sudo service my_service_2 start
sudo service my_service_3 start

with log files respectively at:

/var/log/my_service/instance_1/*.log
/var/log/my_service/instance_2/*.log
/var/log/my_service/instance_3/*.log

(to complicate things, we also have logs at /var/log/my_service/*.log)

In each location, the structure of the logs is identical. So, at the very least, I'd like to write a single module (ingestion pipeline, fields, etc.) that ingests all 3 instances but makes them separately searchable in Elasticsearch. For example, if my ingestion pipeline parses GC pauses, I'd like to be able to search and analyze GC pauses for a single instance only (e.g., just /var/log/my_service/instance_2/*.log, without GC pauses from /var/log/my_service/instance_1/*.log or /var/log/my_service/instance_3/*.log included in the results).

Keep in mind that each instance has multiple types of log files, so there will be multiple filesets for each instance. I can't use filesets as a proxy for instances. The gist of this is that I'd like to have another hierarchical level between module and fileset.

First, if there's a preferred, canonical way to do this, please let me know.

If there isn't, I have a few thoughts about ways to approach this.

  1. derive an "instance" variable from the path of the log file being ingested and label parsed values {module}.{instance}.{fileset}.field instead of {module}.{fileset}.field (see the sketch just below this list).
  2. set an "instance" variable in the configuration and have multiple modules (e.g., my_service_instance1, my_service_instance2, etc.) use a single set of ingestion pipelines. To do this, I would configure the pipeline conventionally in my_service_instance1 and refer to it in the manifest for my_service_instance2 -- e.g., ingest_pipeline: ../../my_service_instance1/{{fileset}}/ingest/pipeline.json. Something like that.
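To make option 1 concrete, here's a rough sketch of the kind of processor I'm imagining in the fileset's ingest pipeline (written in YAML for readability; the same thing would be expressed as JSON in ingest/pipeline.json). It assumes the file path is shipped with the event as log.file.path, and the target field name "instance" is just a placeholder:

description: sketch - derive an "instance" field from the log file path
processors:
  - dissect:
      # log.file.path is added by Filebeat and holds the path of the source file
      field: log.file.path
      # capture the part after "instance_" into a field; the empty %{} key skips the file name
      pattern: "/var/log/my_service/instance_%{instance}/%{}"
      ignore_missing: true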

Any thoughts are greatly appreciated.

P.S. I'm currently on a deep dive into filebeat module development and testing and would be happy to help in your efforts to document this in any way you might find useful. Here's one thought: a huge help was discovering that I can ask the tests to generate the expected output files from the test log files.

Hey @Jim_Ivey !

This makes sense to me, and looks similar to what we have in Metricbeat when we define different hosts. Pinging @kvch and @steffens, who might have better insight into this.

Could you please tell me a bit more about your logs? Maybe share a few examples?

The second approach sounds reasonable. What I suggest (without seeing the logs and the hierarchy) is to create one module with a fileset for each type of log file you have. To distinguish between the instances you can tag the event.
Example:

- module: your-module-name
  your-fileset-name:
    enabled: true
    var.paths:
      - /var/log/my_service/instance_1/gc.log
    input:
      fields:
        name: instance_1

- module: your-module-name
  your-fileset-name:
    enabled: true
    var.paths:
      - /var/log/my_service/instance_2/gc.log
    input:
      fields:
        name: instance_2

This adds a "name" field containing the instance name to each event. Then you can filter on that field in Elasticsearch or in the Kibana UI.
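For example, since custom fields set via the input configuration are grouped under the fields key by default, a Kibana query such as fields.name : "instance_1" (or an equivalent filter) should narrow the results down to a single instance.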

Just keep in mind not to configure the same paths for different inputs/modules.

Regarding the documentation around Filebeat modules, we are happy to take contributions. Unfortunately, some of the information there is outdated.

The company I work for is extremely cautious about security, but I think I can share that my_module is jetty. I couldn't find a jetty module for filebeat, so I'm writing one.

We have multiple instances of jetty running, each of which has at least 4 different types of logs, including request, stderrout, and gc. Ultimately I want to set up time-series-based alerting in Elasticsearch as described in these books, and jetty logs are of particular interest to us.

What I'm really trying to avoid is maintaining 3 or more identical pipelines for multiple instances of jetty. For example, our log files will include .../jetty/instance1/request.log and .../jetty/instance2/request.log. I want to write and maintain just one ingestion pipeline with jetty as the module and request as the fileset and somehow inject the instance (e.g., instance1 and instance2) as a level between those two.
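To make that concrete, here's roughly what I'm picturing for the request fileset, sketched with hypothetical paths and file names: a single manifest whose default path glob covers every instance, with the instance itself derived from the path inside the one shared pipeline (as in the dissect sketch above).

# module/jetty/request/manifest.yml (hypothetical sketch)
module_version: "1.0"

var:
  - name: paths
    default:
      # one glob that matches every instance's request log
      - /var/log/jetty/instance*/request.log

# single shared pipeline and input template for all instances
ingest_pipeline: ingest/pipeline.json
input: config/request.yml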

For now, I'm just writing separate modules for each instance, e.g., module name = jetty_instance1. So, separate request logs could be searchable as jetty_instance1.request.field. But my supervisor wants them searchable as jetty.instance1.request.field, which I agree with.

So, my question is two-fold:

  1. How can I maintain a single ingestion pipeline for jetty.instance1.request, jetty.instance2.request, and jetty.instance3.request?
  2. How can I make the ingested logs searchable as jetty.instance1.request, jetty.instance2.request, and jetty.instance3.request?

Many thanks!
