Journalbeat input filtering (journalbeat processing its own logs and then logging that it processed them)?

Howdy. This is in reference to journalbeat 6.6.1

Currently there appear to be two ways to filter input.

  1. I can filter by explicitly listing specific journal files and/or directories in paths
  2. I can use include_matches in order to filter to only the matching inputs.
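
For reference, here is a minimal sketch of how those two options might look in journalbeat.yml. The unit names are hypothetical examples, and I am assuming include_matches accepts raw journald field names in this version, so check the field names against the docs for your release.

```yaml
journalbeat.inputs:
  # Option 1: an empty paths list reads the default local journal;
  # list journal files or directories here to narrow the input.
  - paths: []
    # Option 2: only entries that exactly match these field=value
    # expressions are read (pattern matching is not supported).
    # The unit names below are hypothetical examples.
    include_matches:
      - "_SYSTEMD_UNIT=sshd.service"
      - "_SYSTEMD_UNIT=nginx.service"
```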

By default I want to include everything and then pare back the stuff I don't care about, so I am less likely to miss something. So I am using an empty paths setting to read the default local journal. That gives me everything, including journalbeat's own logs.

I installed via tarball and created a systemd unit to run journalbeat with the following:

ExecStart=/opt/journalbeat/current/journalbeat run -e \
    --path.config /opt/journalbeat/ \
    --c journalbeat.yml

Everything is working fine. However, because I am running within systemd and using the -e flag, journalbeat itself also logs to the journal. So I get a lot of messages in the journal like:

Mar 08 14:15:56 somehostname journalbeat[10995]: 2019-03-08T14:15:56.833Z INFO [input] input/input.go:133 journalbeat successfully published 1 events {"id": "2d121d9b-6458-40ef-a9b5-f642c0218916"}

Of course this was the first and obvious thing to filter out which I did using a processor:

processors:
  - add_host_metadata: ~
  #- add_cloud_metadata: ~
  - drop_event:
      when:
        equals:
          process.name: "journalbeat"

This works as expected and journalbeat messages no longer get sent to logstash.

However, because this processing happens after the input is accepted, journalbeat still logs "journalbeat successfully published 1 events" for each dropped event. That message goes to the journal, which journalbeat reads, and it then logs "journalbeat successfully published 1 events" again. The loop never ends because there is no way to exclude entries at the input.

I can work around this with a couple approaches:

  1. Don't let journalbeat's systemd unit log to journal.
  2. Use include_matches to filter the input messages to the things I want messages from.

I am guessing #1 is the best option and hopefully just dropping the -e argument is enough. If I have problems with journalbeat I just have to go find the traditional log file to dig through.

Would be quite handy if there was an exclude_matches filter on the inputs so I could just use journald since "that is the future" (or so they say) and exclude journalbeat at input but take everything else (to start).

Did I miss something or is this an accurate assessment? Any other thoughts or advice? Thanks!

Journalbeat sending its own logs over and over is something we would like to avoid. Do you mind opening an issue on GitHub?
In the meantime you could try overriding the unit file with the help of this document: https://www.elastic.co/guide/en/beats/journalbeat/master/running-with-systemd.html#_customize_systemd_unit_for_journalbeat

Would be quite handy if there was an exclude_matches filter on the inputs so I could just use journald since "that is the future" (or so they say) and exclude journalbeat at input but take everything else (to start).

What do you mean here? Would you like to have a new filter, "since", which reads entries from a given date?

Thanks for the reply. I am using my own systemd unit file. I have already removed the -e flag and commented out StandardOutput=journal and StandardError=journal, which does indeed work as a workaround. journalbeat no longer logs to the journal and instead logs to its own logging directory (in my case /opt/journalbeat/current/logs).
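
In case it helps anyone else, this is roughly what the logging section of journalbeat.yml can look like for that workaround. It is a sketch rather than my exact config: the path is the one from my install, while the name and keepfiles values are illustrative assumptions.

```yaml
# Write journalbeat's own logs to files instead of stderr.
# This only takes effect when -e is NOT passed on the command line,
# because -e logs to stderr and disables file output.
logging.level: info
logging.to_files: true
logging.files:
  path: /opt/journalbeat/current/logs  # directory from my install
  name: journalbeat                    # assumed log file name
  keepfiles: 7                         # assumed rotation count
```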

What I meant by:

Would be quite handy if there was an exclude_matches filter on the inputs so I could just use journald since "that is the future" (or so they say) and exclude journalbeat at input but take everything else (to start).

I was suggesting that having an exclude "input filter" for journalbeat would be useful. Journalbeat already has an include_matches directive on the input. If you only want journalbeat to pick up a couple of things, that is fine. However, if you go at it the other way (include all journal entries but exclude a few noisy or unimportant services), having an exclude_matches option would be nice.

I was assuming that if journalbeat was logging to journal and I could tell journalbeat to exclude logs from the journalbeat unit...that would keep these circular things from happening.

My comment about it being the future was just that it seems we are moving away from syslog and application-specific logs (at least on the major Linux distros), so using the journal for everything keeps things simpler. That is, it would be nice to have journalbeat also log to the journal.

I am not sure if there is an elegant solution to avoiding this circular reference issue. On one hand, you may want to know if journalbeat is having errors and have those shipped if possible. On the other hand, having journalbeat log every time it ships a log entry, and then shipping that log entry about shipping a log entry, is not ideal. So having an easy way (a default way?) of filtering out informational journalbeat logs from the journal input would be great.
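
One partial option I have not tested: the "successfully published" line is logged at INFO, so raising journalbeat's own log level should stop it from being written at all, while warnings and errors would still reach the journal when running with -e. A sketch under that assumption:

```yaml
# Untested assumption: with the level at warning, the INFO-level
# "journalbeat successfully published N events" line is never emitted,
# so there is nothing for journalbeat to re-read from the journal.
# Warnings and errors are still logged and can still be shipped.
logging.level: warning
```

The trade-off is losing all other INFO output from journalbeat, which makes this a blunt instrument compared to a real exclude filter on the input.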

Configuring journalbeat to not log to the journal is the easiest workaround at the moment.

I'll open a GitHub ticket. I started here because I wasn't sure if this was a bug or just a problem between the chair and the keyboard.

Issue created: https://github.com/elastic/beats/issues/11179
