What does the "message" field in Grok do?

Hi, first post here and I'm new to the ELK stack. I am working on parsing log4j logs and I cannot figure out the purpose of the "message" field. For example:

filter {
  grok {
    match => {"message" =>

I am confused about whether it is just a naming convention or a standard. This is probably a really basic question, but I need to find out before moving on. Is message just something I can reference back to if I need to?

Thanks.
Mike

The message field is like a default field. It's where most input plugins place the payload that they receive from the network, read from a file, or whatever. So no, it's not just a convention.
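
To see this for yourself, a minimal pipeline like the sketch below should be enough. It assumes you run Logstash interactively and type a line on stdin; the stdin input and the rubydebug codec are only there for illustration, and your real input will differ.

```
# Minimal sketch: read lines from stdin and print each event in full.
# Whatever you type should show up in the "message" field of the event.
input {
  stdin { }
}

output {
  stdout { codec => rubydebug }
}
```

With no filters configured, the whole line you typed ends up in message, next to the metadata fields Logstash adds on its own.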

In many log formats the message field starts with a timestamp, maybe a severity level, possibly a hostname, and so on, and ends with the actual message. In such cases one typically extracts the timestamp etc. into fields of their own and removes them from the message field. In other cases, like HTTP logs, there is no free-text message.
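
For a log4j-style line, that extraction might look roughly like the sketch below. It assumes lines of the form "2015-06-01 07:50:01,123 INFO Something happened", with single spaces between the parts; the field names timestamp and level are just illustrative choices, so adjust the pattern to your actual layout.

```
filter {
  grok {
    # Assumed layout: <ISO8601 timestamp> <level> <free text>
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}" }
    # Replace the original message with the captured free-text part
    overwrite => ["message"]
  }
}
```

Without the overwrite option, the message field would keep the full original line.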

Oh, all right, so the message will hold the bulk of the default fields. Does this mean that after I write my message match to store, say, the timestamp, level, groupId, etc., I would then filter out more of the log message underneath? I.e.:

filter {
  grok {
    match => {
      "message" => "%{SYSLOGTIMESTAMP:installTime}"
      "someotherfield" => "more grok or regex"
    }
  }
}

Thanks for help!

I'm not following, but let's take syslog messages as an example. Logstash ships with some grok patterns for syslog messages, like SYSLOGBASE which is defined like this:

$ grep SYSLOGBASE /opt/logstash/patterns/grok-patterns
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:

When used like

filter {
  grok {
    match => ["message", "%{SYSLOGBASE}"]
  }
}

Logstash will attempt the match and extract the fields timestamp, program, pid, logsource, facility, priority and a few others from the message field. The message field itself is left untouched, so we could end up with an event looking like this:

{
  "logsource": "somehostname",
  "timestamp": "Jun  1 07:50:01",
  "program": "CRON",
  "pid": "22912",
  "message": "Jun  1 07:50:01 somehostname CRON[22912]: (root) CMD ( /path/to/some/program > /dev/null 2>&1)"
}

By changing the filter to

filter {
  grok {
    match => ["message", "%{SYSLOGBASE} %{GREEDYDATA:message}"]
    overwrite => ["message"]
  }
}

we capture the actual message part of the original message and save it back into the message field, yielding:

{
  "logsource": "somehostname",
  "timestamp": "Jun  1 07:50:01",
  "program": "CRON",
  "pid": "22912",
  "message": "(root) CMD ( /path/to/some/program > /dev/null 2>&1)"
}

Now things are starting to look useful.

I think I am starting to understand more now. So the SYSLOGBASE from your example picks up most of the overhead that comes with the log. Message takes everything afterwards, and in your example you want message to be "(root) CMD...". Does this mean I can call message anything else, e.g. data or msg?

Sure. There might be specific exceptions for some output plugins, but what you call the output fields is generally up to you.
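
As a rough sketch of that, the earlier filter could store the free-text part in a field called msg instead. The name msg is just an example, and dropping the original message field afterwards is optional:

```
filter {
  grok {
    # Capture the trailing text into "msg" instead of overwriting "message"
    match => ["message", "%{SYSLOGBASE} %{GREEDYDATA:msg}"]
    # Optional: drop the original line once the match has succeeded
    remove_field => ["message"]
  }
}
```

Since remove_field is only applied when the grok match succeeds, events that fail to match keep their original message field.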