Logstash to Logstash (Lumberjack) host details

I am setting up a two-tier Logstash setup for our customers.

On-premises servers running syslog and Beats will connect to one or more on-premises Logstash servers. These run syslog and Beats pipelines that use the lumberjack output plugin to forward the events to cloud-based Logstash servers hosted in AWS.

I have this working: both syslog and Beats events are arriving at the AWS-hosted Logstash servers and being written out to S3 as desired.

However, I would like to add fields to the events as they pass through the on-premises Logstash servers, recording the details of the Logstash server they passed through. This would be the equivalent of the X-Forwarded-For header that an HTTP proxy server adds.

I am unable to find any variables/fields that contain details of the Logstash server itself.

The environment variable $HOSTNAME contains the hostname of the Logstash server, but I would also like to include its FQDN and IP addresses.

Anyone done anything like this before?
Anyone have any ideas?

X-Forwarded-For? The beats will add their hostname to the messages they forward, and the syslog input should be adding a [host] field that contains the IP address the message was received from.

If you want the name of the server running Logstash, you could try something like:

ruby {
    init => "require 'socket'"
    code => "event.set('some field', Socket.gethostname)"
}
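The same idea can be extended to the FQDN and IP addresses asked about above. The following is a minimal plain-Ruby sketch, not a tested Logstash configuration: the method name forwarder_details is hypothetical, the FQDN lookup depends on how the machine resolves its own name (and may behave differently under the JRuby runtime that Logstash uses), and inside a Docker container these calls would report the container's details rather than the host's.

```ruby
require 'socket'

# Hypothetical helper: collect identifying details for the machine
# this code runs on, using only Ruby's standard socket library.
def forwarder_details
  hostname = Socket.gethostname

  # Ask the resolver for the canonical name. This relies on local
  # DNS or /etc/hosts, so fall back to the short hostname on failure.
  fqdn = begin
    info = Addrinfo.getaddrinfo(hostname, nil, nil, :STREAM, nil,
                                Socket::AI_CANONNAME).first
    (info && info.canonname) || hostname
  rescue SocketError
    hostname
  end

  # List interface addresses, skipping loopback and IPv6 link-local ones.
  ips = Socket.ip_address_list
              .reject { |a| a.ipv4_loopback? || a.ipv6_loopback? || a.ipv6_linklocal? }
              .map(&:ip_address)

  { 'hostname' => hostname, 'fqdn' => fqdn, 'ip' => ips }
end
```

In a ruby filter, the body of this method could go in the code option, with the resulting values written to the event using event.set as in the snippet above.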

My current solution is to use environment variables, but it's rather inelegant.

Logstash is actually running inside a Docker container, so through Docker Compose I am able to pass environment variables from a .env file.

This .env file is generated by a Pre-script run by systemd before Docker Compose is started.

In the Pre-script I have:

echo "HOST_HOSTNAME=$(hostname)" >> .env
echo "HOST_FQDN=$(hostname -f)" >> .env
echo "HOST_IP_ADDRESSES=$(hostname -I)" >> .env

NOTE: The Pre-script already existed as it brings in some secrets from AWS Systems Manager Parameter Store.
NOTE: There is also a Post-script to remove .env.

Then I can access those environment variables in a series of mutates, including a split() on $HOST_IP_ADDRESSES.

So the contents of .env are:

HOST_HOSTNAME=logstash-relay
HOST_FQDN=logstash-relay.shared
HOST_IP_ADDRESSES=10.211.55.42 172.17.0.1 fdb2:2c26:f4e4:0:21c:42ff:feed:db4a

These are passed through Docker Compose:

    environment:
      - "HOST_HOSTNAME=${HOST_HOSTNAME}"
      - "HOST_FQDN=${HOST_FQDN}"
      - "HOST_IP_ADDRESSES=${HOST_IP_ADDRESSES}"

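Then they are added to events by mutate filters in my syslog pipeline:

filter {
  # Copy the environment variables into fields on the event.
  mutate {
    add_field => {
      "[forwarder][hostname]" => "${HOST_HOSTNAME}"
      "[forwarder][fqdn]" => "${HOST_FQDN}"
      "[forwarder][ip]" => "${HOST_IP_ADDRESSES}"
    }
  }
  # A second mutate is needed: within a single mutate, add_field is
  # applied after split, so the field would not exist yet.
  mutate {
    split => { "[forwarder][ip]" => " " }
  }
}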
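As an alternative to the pair of mutates, the same mapping could be done in a single step in Ruby. This is only a sketch under stated assumptions: the method name forwarder_from_env is hypothetical, it takes the environment as a parameter so it can be tried outside Logstash, and it assumes the three HOST_* variables described above are present in the container.

```ruby
# Hypothetical helper: build the forwarder details from the HOST_*
# environment variables. Pass a plain Hash instead of ENV to try it out.
def forwarder_from_env(env = ENV)
  {
    'hostname' => env['HOST_HOSTNAME'],
    'fqdn'     => env['HOST_FQDN'],
    # String#split with no argument splits on runs of whitespace and
    # ignores leading and trailing whitespace in the value.
    'ip'       => env.fetch('HOST_IP_ADDRESSES', '').split
  }
end

details = forwarder_from_env(
  'HOST_HOSTNAME'     => 'logstash-relay',
  'HOST_FQDN'         => 'logstash-relay.shared',
  'HOST_IP_ADDRESSES' => '10.211.55.42 172.17.0.1 fdb2:2c26:f4e4:0:21c:42ff:feed:db4a '
)
```

Inside a ruby filter, the resulting values could be written to the event with event.set, as in the earlier reply. Whether that is cleaner than two mutates is a matter of taste.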

This results in events from the syslog pipeline on the forwarder now looking like this:

{
          "severity" => 6,
             "agent" => {
        "type" => "syslog"
    },
               "pid" => "661",
           "program" => "snapd",
         "logsource" => "zenlinux.shared",
           "message" => "storehelpers.go:551: cannot refresh: snap has no updates available: \"core18\", \"lxd\", \"snapd\"",
          "priority" => 30,
        "@timestamp" => 2020-10-27T20:33:37.000Z,
         "forwarder" => {
        "hostname" => "logstash-relay",
            "fqdn" => "logstash-relay.shared",
              "ip" => [
            [0] "10.211.55.42",
            [1] "172.17.0.1",
            [2] "fdb2:2c26:f4e4:0:21c:42ff:feed:db4a"
        ]
    },
              "host" => "10.211.55.39",
          "@version" => "1",
          "facility" => 3,
    "severity_label" => "Informational",
    "facility_label" => "system",
         "timestamp" => "Oct 27 20:33:37"
}

The host field is left untouched as it contains the IP address of the syslog client machine.

I will also be adding this to the Beats pipeline, but I need to read up on where to put it in the ECS schema.
