Creating fields on special case log messages

I have multiple service logs being shipped from every host to Logstash using rsyslog.

# mssg log
auth,authpriv.*			 @logsashserver:5548
kern.*				 @logsashserver:5548
mail.*				 @logsashserver:5548

After being ingested by Logstash and shipped to Elasticsearch, I end up with logs from different programs, each with different messages. Below is an example of an sshd message:

{
  "_index": "mssg-2016.09.12",
  "_type": "mssg",
  "_id": "example",
  "_score": null,
  "_source": {
    "message": "pam_unix(sshd:session): session opened for user bob by (uid=0)",
    "@version": "1",
    "@timestamp": "2016-09-12T14:42:29.000Z",
    "type": "mssg",
    "host": "123.123.123.123",
    "priority": 86,
    "timestamp": "Sep 12 10:42:29",
    "logsource": "eample-ls",
    "program": "sshd",
    "pid": "113222",
    "severity": 6,
    "facility": 10,
    "facility_label": "security/authorization",
    "severity_label": "Informational"
  }
}

When a message matches:
^(pam_unix.*)(session opened for user )(\w+)(.*)

I want to add a field called sshd_user with the value of the third capture group (the user name), so I can build a unique list of users logging in to various systems. I have recently been trying to wrap my head around analyzers and tokenizers, but I am not sure what to do, or even whether that is the correct approach. Should I be dealing with this at the Logstash level? If this can be done using analyzers, how?

You might be able to do this with an analyzer, but doing it at ingestion time (either with an ingest node or with Logstash) is the better option.
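
For the Logstash route, something along these lines might work. It is only a sketch: it assumes the events arrive with the program and message fields shown in your example document, and that the block is added to the filter section of your existing pipeline. The regex is your own pattern, with the third group turned into a named capture so that grok stores it as sshd_user.

```
filter {
  # Only try the pattern on sshd events ("program" comes from the syslog parsing
  # already visible in the example document).
  if [program] == "sshd" {
    grok {
      # The named capture (?<sshd_user>...) creates the new field on a match.
      match => { "message" => "^pam_unix.*session opened for user (?<sshd_user>\w+)" }
      # Other sshd messages will not match; do not tag them with _grokparsefailure.
      tag_on_failure => []
    }
  }
}
```

Events that do not match are passed through unchanged, so only "session opened" messages get the extra field. Once sshd_user is indexed, a terms aggregation on that field can list the distinct users.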
