Can't get host.name from filebeat input in journald mode

huanghaiqing1 · November 11, 2024, 1:27am

We have a remote journald server that holds the journald logs uploaded from different hosts, and these logs are fed to Filebeat as input. In Kibana, every log shows host.name as the log server's hostname. In fact, the field should hold the hostname of the client that produced each log. The last part of the KB below defines the translated field names for journald as: _HOSTNAME -- host.name. So my question is how to resolve this issue in the Filebeat configuration. In the uploaded example event, the message is from a client with hostname autoyast, but host.name always shows the log server's hostname: elk. This is not good. Any solution? From the posted picture, you can see that Filebeat doesn't include the _HOSTNAME part from journald.

KB: Journald input | Filebeat Reference [8.15] | Elastic

TiagoQueiroz · November 11, 2024, 1:41pm

Hi @huanghaiqing1,

I briefly checked the code, and it should do the correct translation _HOSTNAME -> host.hostname. I wonder if something else is overwriting it.

Filebeat's default configuration file includes the add_host_metadata processor, which also sets host.name.

Could you post your whole Filebeat configuration? (Please redact any secrets or sensitive information, like credentials, IPs, etc.)

If you're using any of the add_*_metadata processors, could you try without them and see whether it solves the problem?

huanghaiqing1 · November 12, 2024, 12:58am

Hello, according to the official Filebeat documentation, _HOSTNAME maps to host.name. I also checked the field host.hostname, and it is also set to the Elastic server's hostname. You can see our actual status in the screenshots below. It looks like the Filebeat input section just drops the _HOSTNAME part from our journal logs. Our journald log file, /var/log/journal/remote/remote-127.0.0.1.journal, includes all logs received from the different client operating systems.

huanghaiqing1 · November 12, 2024, 1:07am

[root@elk remote]# cat /etc/filebeat/filebeat.yml
###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================
filebeat.inputs:
- type: journald
  id: everything
  enabled: true
  paths:
    - /var/log/journal/remote/remote-127.0.0.1.journal
    #- /var/log/journal/remote/
  seek: cursor
  cursor_seek_fallback: since
  since: -24h

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input-specific configurations.

# filestream is an input for collecting log messages from files.
- type: filestream

  # Unique ID among all inputs, an ID is required.
  id: syslog

  # Change to true to enable this input configuration.
  enabled: false

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  # Line filtering happens after the parsers pipeline. If you would like to filter lines
  # before parsers, use include_message parser.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  # Line filtering happens after the parsers pipeline. If you would like to filter lines
  # before parsers, use include_message parser.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  index.number_of_replicas: 1
  #index.codec: best_compression
  #_source.enabled: false

#setup.template.fields: "fields.yml"
setup.template.overwrite: false

#setup.template.name: "huang"
#setup.template.pattern: "huang-*"


# ================================== General ===================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
fields:
  env: HuangHaiqing

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
setup.dashboards.enabled: false

# The URL from where to download the dashboard archive. By default, this URL
# has a value that is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"
  host: "https://elk:5601"
  protocol: "https"
  username: "elastic"
  password: "xxxxx"
  ssl.verification_mode: none

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

# =============================== Elastic Cloud ================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  pipeline: journald-pipeline
  pipeline: hhq-pipeline
  # Performance preset - one of "balanced", "throughput", "scale",
  # "latency", or "custom".
  preset: balanced

  # Protocol - either `http` (default) or `https`.
  protocol: "https"
  pipeline: "hhq_pipeline"
  #index: "huang-%{[fields.log_type]}-%{[agent.version]}-%{+yyyy.MM.dd}"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"
  password: "xxxxx"
  ssl:
    enabled: true
    ca_trusted_fingerprint: "203C304B8E75B82CE2A9EF454306CAB6E86E038390DC75AFF3EB94961CB39A18"

# ------------------------------ Logstash Output -------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  #- add_cloud_metadata: ~
  #- add_docker_metadata: ~
  #- add_kubernetes_metadata: ~

# ================================== Logging ===================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors, use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
#logging.selectors: ["*"]

# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch outputs are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:

# ============================== Instrumentation ===============================

# Instrumentation support for the filebeat.
#instrumentation:
    # Set to true to enable instrumentation of filebeat.
    #enabled: false

    # Environment in which filebeat is running on (eg: staging, production, etc.)
    #environment: ""

    # APM Server hosts to report instrumentation results to.
    #hosts:
    #  - http://localhost:8200

    # API Key for the APM Server(s).
    # If api_key is set then secret_token will be ignored.
    #api_key:

    # Secret token for the APM Server(s).
    #secret_token:


# ================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true
setup.ilm.overwrite: true

huanghaiqing1 · November 12, 2024, 1:39am

From the journald log on the Elastic server side, you can see that the filebeat process handles the raw journald input without the "_HOSTNAME" field.

Nov 12 09:34:05 elk filebeat[9866]: {"log.level":"debug","@timestamp":"2024-11-12T09:34:05.071+0800","log.logger":"processors","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/publisher/processing.debugPrintProcessor.func1","file.name":"processing/processors.go","file.line":213},"message":"Publish event: {\n  \"@timestamp\": \"2024-11-12T01:11:43.406Z\",\n  \"@metadata\": {\n    \"beat\": \"filebeat\",\n    \"type\": \"_doc\",\n    \"version\": \"8.13.4\"\n  },\n  \"message\": \"autoyast -- 432\",\n  \"input\": {\n    \"type\": \"journald\"\n  },\n  \"user\": {\n    \"id\": \"0\",\n    \"group\": {\n      \"id\": \"0\"\n    }\n  },\n  \"syslog\": {\n    \"priority\": 5,\n    \"facility\": 1,\n    \"identifier\": \"test\"\n  },\n  \"event\": {\n    \"kind\": \"event\",\n    \"created\": \"2024-11-12T01:34:05.071Z\"\n  },\n  \"systemd\": {\n    \"transport\": \"syslog\"\n  },\n  \"agent\": {\n    \"type\": \"filebeat\",\n    \"version\": \"8.13.4\",\n    \"ephemeral_id\": \"07c4d050-3271-45f6-99c8-f6c63a450919\",\n    \"id\": \"df87fd63-3f5a-40d7-b76a-f6cae44d7d49\",\n    \"name\": \"elk\"\n  },\n  \"ecs\": {\n    \"version\": \"8.0.0\"\n  },\n  \"journald\": {\n    \"host\": {\n      \"boot_id\": \"9d993af9d8344305b7b1ea365d8aef7c\"\n    },\n    \"pid\": 1413,\n    \"custom\": {\n      \"runtime_scope\": \"system\",\n      \"syslog_timestamp\": \"Nov 12 09:11:43 \"\n    },\n    \"uid\": 0,\n    \"gid\": 0\n  },\n  \"log\": {\n    \"syslog\": {\n      \"facility\": {\n        \"code\": 1\n      },\n      \"priority\": 5\n    }\n  },\n  \"fields\": {\n    \"env\": \"HuangHaiqing\"\n  },\n  \"host\": {\n    \"id\": \"02102896377942289eff1e65ceb1fdff\",\n    \"name\": \"elk\",\n    \"architecture\": \"x86_64\",\n    \"os\": {\n      \"type\": \"linux\",\n      \"platform\": \"rocky\",\n      \"version\": \"8.9 (Green Obsidian)\",\n      \"family\": \"redhat\",\n      \"name\": \"Rocky Linux\",\n      \"kernel\": \"4.18.0-513.24.1.el8_9.x86_64\",\n      
\"codename\": \"Green Obsidian\"\n    },\n    \"containerized\": false,\n    \"ip\": [\n      \"192.168.8.104\",\n      \"192.168.31.104\"\n    ],\n    \"mac\": [\n      \"00-50-56-27-27-48\",\n      \"00-50-56-2E-6D-8C\"\n    ],\n    \"hostname\": \"elk\"\n  },\n  \"process\": {\n    \"pid\": 1413\n  }\n}","service.name":"filebeat","ecs.version":"1.6.0"}

TiagoQueiroz · November 12, 2024, 4:00pm

Hi @huanghaiqing1,

Thanks for sharing your config and the reply. As I suspected, the add_host_metadata processor is enabled and is overwriting the host.hostname set by the journald input.

To preserve the host.hostname set by the journald input there are two options:

1. Remove the add_host_metadata processor completely.

2. Add the forwarded tag to the journald input:

- type: journald
  id: everything
  enabled: true
  paths:
    - /var/log/journal/remote/remote-127.0.0.1.journal
  seek: cursor
  cursor_seek_fallback: since
  since: -24h
  tags:
    - forwarded

If you look closely at the configuration you have for the global processors (comments removed):

processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded

The add_host_metadata processor will not run when the event contains the tag forwarded. You can customise the condition to use other tags or even other event fields. See the conditions documentation for more details.
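As a rough illustration of conditioning on an event field instead of a tag, the sketch below skips add_host_metadata for every event whose input.type is journald (the debug event posted earlier in the thread carries that field). Choosing input.type as the field is an assumption made for illustration, not something confirmed in this thread, so test it before relying on it.

```yaml
processors:
  - add_host_metadata:
      # Assumption: events from the journald input carry input.type: journald,
      # as seen in the debug output above. Host enrichment is skipped for them.
      when.not.equals:
        input.type: journald
```

With this variant no extra tag is needed on the input, but every journald input on this Filebeat instance is affected, including ones that read the local journal.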

huanghaiqing1 · November 13, 2024, 2:09am

Yes, it works. Thank you very much for your professional support. Just one additional question: what is the difference between host.hostname and host.name? According to the official Filebeat documentation on translated field names [Journald input | Filebeat Reference [8.16] | Elastic], _HOSTNAME should map to host.name. But in practice, host.name shows the Filebeat server's hostname, and host.hostname holds the _HOSTNAME value from the remote journald logs. So I'm not sure whether it's a typo or whether something else needs to be configured.

TiagoQueiroz · November 13, 2024, 7:36pm

Thanks for spotting the issue in the documentation. Indeed, the documentation and our implementation are different.

I created a GitHub issue to track it: Journald documentation and implementation diverge about the translation of `_HOSTNAME` to ECS (Issue #41635, elastic/beats).

Regarding the difference between host.hostname and host.name, I'll get back to you once I have a clear differentiation.

TiagoQueiroz · November 13, 2024, 8:33pm

ECS defines them as:

host.hostname: Hostname of the host. It normally contains what the hostname command returns on the host machine.
host.name: Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host.

You can think of host.name as the FQDN of the machine.

Regardless, when Filebeat is forwarding events from other hosts it should not overwrite those fields from the event with the values from the host it's running on.
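One way this might be expressed in configuration is the replace_fields option of add_host_metadata, which is documented as keeping host fields already present on the event when set to false. The sketch below is untested against the journald input in this thread, so treat it as an assumption to verify rather than a confirmed fix.

```yaml
processors:
  - add_host_metadata:
      # Assumption: with replace_fields set to false, host fields that the
      # input already set (for example host.hostname from _HOSTNAME) are
      # left untouched instead of being overwritten.
      replace_fields: false
```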

In Filebeat, well Beats in general, host.name is set by the publishing pipeline, but it can be disabled by setting publisher_pipeline.disable_host: true.
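A minimal sketch of how this could look for the setup in this thread, assuming publisher_pipeline.disable_host is accepted at the input level for journald as it is for other Filebeat inputs. The copy_fields step is an illustrative addition, not something suggested in the thread: it fills host.name from host.hostname for forwarded events, which only works because host.name is no longer added by the pipeline (copy_fields does not replace an existing target field).

```yaml
filebeat.inputs:
- type: journald
  id: everything
  paths:
    - /var/log/journal/remote/remote-127.0.0.1.journal
  tags:
    - forwarded
  # Stop the publishing pipeline from adding this machine's host.name.
  publisher_pipeline.disable_host: true

processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  # Hypothetical extra step: reuse the client's hostname as host.name.
  - copy_fields:
      when.contains.tags: forwarded
      fields:
        - from: host.hostname
          to: host.name
      fail_on_error: false
      ignore_missing: true
```

If host.name should simply be absent for forwarded events, the copy_fields processor can be left out.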

huanghaiqing1 · November 15, 2024, 7:41am

That's very clear. Thanks and I think we can close the case.
