I have logs from multiple services being shipped from every host to Logstash using rsyslog, with a configuration like this:
# mssg log
auth,authpriv.* @logsashserver:5548
kern.* @logsashserver:5548
mail.* @logsashserver:5548
After the logs are ingested by Logstash and shipped to Elasticsearch, I have entries from many different programs, each with different messages. Below is an example of an sshd message:
{
"_index": "mssg-2016.09.12",
"_type": "mssg",
"_id": "example",
"_score": null,
"_source": {
"message": "pam_unix(sshd:session): session opened for user bob by (uid=0)",
"@version": "1",
"@timestamp": "2016-09-12T14:42:29.000Z",
"type": "mssg",
"host": "123.123.123.123",
"priority": 86,
"timestamp": "Sep 12 10:42:29",
"logsource": "eample-ls",
"program": "sshd",
"pid": "113222",
"severity": 6,
"facility": 10,
"facility_label": "security/authorization",
"severity_label": "Informational"
}
}
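The numeric fields in this event are related: in syslog, the PRI value encodes both facility and severity as `facility * 8 + severity` (RFC 5424). The short Python sketch below decodes the example's `"priority": 86` and recovers `"facility": 10` and `"severity": 6`. The function name `decode_priority` is hypothetical and used only for illustration.

```python
from typing import Tuple


def decode_priority(pri: int) -> Tuple[int, int]:
    """Split a syslog PRI value into (facility, severity).

    Hypothetical helper: syslog defines PRI = facility * 8 + severity,
    so integer division and remainder by 8 recover both parts.
    """
    return divmod(pri, 8)


facility, severity = decode_priority(86)
print(facility, severity)  # 10 6
```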
What I want to do is, when a message matches:
^(pam_unix.*)(session opened for user )(\w+)(.*)
add a field called sshd_user with the value of the 3rd capture group (the user name), so that I can build a unique list of users logging in to various systems.
I have recently been trying to wrap my head around Elasticsearch analyzers and tokenizers, but I am not sure what to do, or even whether that is the correct approach. Should I be handling this at the Logstash level? If this can be done using analyzers, how?
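To make the desired result concrete, here is a minimal Python sketch of the extraction logic alone, independent of whether it ends up running in Logstash or Elasticsearch. It applies the regex from above to each event's `message`, adds `sshd_user` from capture group 3 when it matches, and collects the distinct user names. The helper names `add_sshd_user` and `unique_users` are hypothetical. The first sample event is the message from the example above; the other events (including host `other-host` and user `alice`) are made up for illustration.

```python
import re

# The pattern from the question; capture group 3 is the user name.
SSHD_SESSION_RE = re.compile(r"^(pam_unix.*)(session opened for user )(\w+)(.*)")


def add_sshd_user(event: dict) -> dict:
    """Return a copy of `event` with an `sshd_user` field if its message matches."""
    enriched = dict(event)
    match = SSHD_SESSION_RE.match(event.get("message", ""))
    if match:
        enriched["sshd_user"] = match.group(3)
    return enriched


def unique_users(events) -> list:
    """Collect the distinct sshd_user values across enriched events, sorted."""
    return sorted({e["sshd_user"] for e in events if "sshd_user" in e})


# Hypothetical sample events; only the first message comes from the example above.
events = [
    {"program": "sshd", "logsource": "eample-ls",
     "message": "pam_unix(sshd:session): session opened for user bob by (uid=0)"},
    {"program": "sshd", "logsource": "other-host",
     "message": "pam_unix(sshd:session): session opened for user alice by (uid=0)"},
    {"program": "sshd", "logsource": "eample-ls",
     "message": "pam_unix(sshd:session): session closed for user bob"},
    {"program": "kernel", "logsource": "eample-ls", "message": "some kernel message"},
]

enriched = [add_sshd_user(e) for e in events]
print(unique_users(enriched))  # ['alice', 'bob']
```

Non-matching events (the "session closed" line and the kernel message) pass through unchanged, without an `sshd_user` field.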