Need to include a specific field from my log in the message part of the output without pulling in the entire log

Hello,

I am currently using Security Onion to send Zeek logs to our SIEM, which is very particular about the JSON formatting of the data and only parses it properly when the format is right.

In my custom Logstash pipeline output config, we would normally have the output set like so:

output {
  if [event][module] == "zeek" {
    tcp {
      id => "cloned_events_out"
      host => "192.168.x.x"
      port => 1514
      codec => "json_lines"
    }
  }
}

However, due to the SIEM not liking all of the additional info that gets tacked onto the logs other than just the zeek log message itself, I have had to modify it to send the logs like so:

output {
  if [event][module] == "zeek" {
    tcp {
      id => "cloned_events_out"
      host => "192.168.x.x"
      port => 1514
      codec => line { format => "%{pipeline} - %{message}"}
    }
  }
}

This has worked great for getting logs parsed in our SIEM, but now we are missing some information that I would like to have in the log, specifically the information inside %{host}, which includes the name of the Security Onion sensor that observed the Zeek traffic.

This data is included in the full event, but not in the %{message} component. I am wondering how to pull that %{host} field from the event and include it at the end of the %{message} part of the log.

My original try was to just modify the codec line like so:

codec => line { format => "%{pipeline} - %{message} %{host}"}

but that broke the JSON formatting and pulled in a bunch of info I don't need. (The green box is the expected message part of the log; the red box is all the extra data that comes along with the host part of the log, most of which I don't need.)

I really am only interested in having a line at the end of the zeek message itself that says:

observer: <name>

Unfortunately I have not had to do very much with logstash since all of the elastic stuff comes prebuilt into security onion out of the box, so apologies for being a newbie to elastic. I appreciate any guidance or tips on how to achieve what I am setting out to do! Thank you in advance!

What exactly is the output you expect, and what does your message field look like? Is it JSON?

Can you add a file or stdout output into the logstash pipeline and capture the raw output?
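A minimal sketch of what such a temporary debug output could look like, assuming it sits next to the existing tcp output. The file path is a hypothetical placeholder; if Logstash runs in a container, the file would land inside that container.

```
output {
  if [event][module] == "zeek" {
    # Print the full event structure to stdout for inspection.
    stdout { codec => rubydebug }

    # Or write each event as one JSON line to a file.
    # The path is a placeholder; pick a location Logstash can write to.
    file {
      path => "/tmp/zeek_debug.json"
      codec => json_lines
    }
  }
}
```

Either output shows every field on the event, which makes it easier to see which sub-field (for example under host) holds the value you want.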

host is an object that includes several other fields. You could try

codec => line { format => "%{pipeline} - %{message} %{[host][name]}"}

or similar to pull out the sub-field that you want.

Yes, this is JSON.

Here is an example raw output that includes ALL fields (changing some of the values for privacy)

{
  "event": {
    "module": "zeek",
    "dataset": "zeek",
    "category": "network"
  },
  "data_stream": {
    "namespace": "so",
    "type": "logs",
    "dataset": "zeek"
  },
  "container": {
    "id": "kerberos.log"
  },
  "ecs": {
    "version": "8.0.0"
  },
  "elastic_agent": {
    "id": "abc123",
    "version": "8.14.3",
    "snapshot": false
  },
  "tags": [
    "elastic-agent",
    "input-testing",
    "beats_input_codec_plain_applied"
  ],
  "agent": {
    "id": "123",
    "ephemeral_id": "123",
    "type": "filebeat",
    "name": "Sensor123",
    "version": "8.0.0"
  },
  "log": {
    "offset": 22731362,
    "file": {
      "path": "/nsm/zeek/logs/current/kerberos.log"
    }
  },
  "message": {
    "ts": 1737168396.6641,
    "uid": "abc123",
    "id.orig_h": "192.168.1.2",
    "id.orig_p": 12345,
    "id.resp_h": "192.168.1.3",
    "id.resp_p": 456
  },
  "pipeline": "kerberos",
  "host": {
    "name": "Sensor123",
    "containerized": false,
    "id": "123",
    "ip": [
      "192.168.1.5"
    ],
    "architecture": "x86_64",
    "os": {
      "family": "linux",
      "name": "Linux server",
      "kernel": "linux",
      "type": "linux",
      "platform": "linux",
      "version": "9"
    },
    "mac": [
      "00-00-00-00-00"
    ],
    "hostname": "Sensor123"
  },
  "metadata": {
    "type": "_doc",
    "version": "8.14.3",
    "input": {
      "beats": {
        "host": {
          "ip": "192.168.1.12"
        }
      }
    },
    "input_id": "logfile-logs-zeek-logs",
    "raw_index": "logs-zeek-so",
    "pipeline": "zeek.kerberos",
    "beat": "filebeat",
    "stream_id": "logfile-log.logs-zeek-logs"
  },
  "@timestamp": "2025-01-17T19:44:53.568Z",
  "input": {
    "type": "log"
  },
  "@version": "1"
}

The output format is set up to print the pipeline type, which in this example would be "kerberos", followed by the message, so it looks like this:

kerberos - 
{
    "ts": 1737168396.6641,
    "uid": "abc123",
    "id.orig_h": "192.168.1.2",
    "id.orig_p": 12345,
    "id.resp_h": "192.168.1.3",
    "id.resp_p": 456
 }

So now, I want to be able to add the "hostname" or the agent name fields into the log as "observer.name", preferably within the message field itself, so it would look something like:

kerberos - 
{
    "ts": 1737168396.6641,
    "uid": "abc123",
    "id.orig_h": "192.168.1.2",
    "id.orig_p": 12345,
    "id.resp_h": "192.168.1.3",
    "id.resp_p": 456,
    "observer.name": "Sensor123"
 }

If this is too much effort, I think it may also be OK to just tack it on as its own section after the message, though I am not sure how the SIEM would parse that:

kerberos - 
{
    "ts": 1737168396.6641,
    "uid": "abc123",
    "id.orig_h": "192.168.1.2",
    "id.orig_p": 12345,
    "id.resp_h": "192.168.1.3",
    "id.resp_p": 456
}
{
     "observer.name": "Sensor123"
}

I tried Badger's solution and was able to get the hostname in there, but it shows up only as a bare value at the end, without the key and without any JSON formatting:

kerberos - 
{
    "ts": 1737168396.6641,
    "uid": "abc123",
    "id.orig_h": "192.168.1.2",
    "id.orig_p": 12345,
    "id.resp_h": "192.168.1.3",
    "id.resp_p": 456
}
  Sensor123

So I feel like we are getting close; we just need to get the formatting right and figure out how to get it into the message part of the log instead, if possible.

Thank you all for the help on this! Hope this makes sense.

OK, how about something like

codec => line { format => '%{pipeline} - %{message} { "hostname": "%{[host][name]}" }' }

You will need to experiment, since it is unclear exactly what format you want.


Since your message field is a JSON object, you could try something like this in your pipeline:

filter {
    mutate {
        add_field => {
            "[message][observer.name]" => "%{[host][name]}"
        }
    }
}

This would add a field named observer.name inside your message field. Your output would then be just this: codec => line { format => "%{pipeline} - %{message}"}
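Put together, the filter and output might look like the sketch below. It assumes that message is already a parsed object by the time the filter runs (as in the sample event above) and that the sensor name lives in [host][name]. The conditional on [host][name] is an addition of mine, not something from the thread: without it, an event lacking that field would get the literal text %{[host][name]} as the value.

```
filter {
  # Only touch Zeek events that actually carry a sensor name.
  if [event][module] == "zeek" and [host][name] {
    mutate {
      # Adds the key "observer.name" inside the message object.
      add_field => { "[message][observer.name]" => "%{[host][name]}" }
    }
  }
}

output {
  if [event][module] == "zeek" {
    tcp {
      id => "cloned_events_out"
      host => "192.168.x.x"
      port => 1514
      # message is rendered as JSON, now including observer.name.
      codec => line { format => "%{pipeline} - %{message}" }
    }
  }
}
```

If message turns out to be a plain string rather than an object at that point in the pipeline, the nested add_field will not work as shown and the string would need to be parsed first.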