I've been facing a problem with multiline events lately, and I could use a little help. My syslog server is sending multi-line events: a single event spans several lines, and the indicator that ties a given line to a particular multi-line event is a random number identifying a user connection session. Here is a custom generated log file:
Feb 16 17:29:04 slot1/APM-LTM notice apd[5515]: 01490010:5: 1ec2b273: Username 'cjones'
Feb 16 17:29:04 slot1/APM-LTM warning apd[5515]: 01490106:4: 1ec2b273: AD module: authentication with 'cjones' failed: Preauthentication failed, principal name: cjones@GEEKO.COM. Invalid user credentials. (-1765328360)
Feb 16 17:10:04 slot1/APM-LTM notice apd[5515]: 01490010:5: d8b5a591: Username 'gbridget'
Feb 16 17:10:04 slot1/APM-LTM err apd[5515]: 01490107:3: d8b5a591: AD module: authentication with 'gbridget' failed: Clients credentials have been revoked, principal name: gbridget@GEEKO.COM. User account is locked (-1765328366)
Feb 16 17:29:04 slot1/APM-LTM notice apd[5515]: 01490005:5: 1ec2b273: Following rule 'fallback' from item 'AD Auth' to ending 'Deny'
Feb 16 17:29:04 slot1/APM-LTM notice apd[5515]: 01490102:5: 1ec2b273: Access policy result: Logon_Deny
Above are the lines related to two different connections, identified by the following user sessions: d8b5a591 (user gbridget) and 1ec2b273 (user cjones). The session ids are the only indicators that link those lines to two different events. Note also that the event lines are interlaced.
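For reference, I extract the session id from each line with a grok filter roughly like the one below (the field names are my own choice and the pattern is an untested sketch against the sample lines above):

```conf
filter {
  grok {
    match => {
      # e.g. "Feb 16 17:29:04 slot1/APM-LTM notice apd[5515]: 01490010:5: 1ec2b273: Username 'cjones'"
      "message" => "%{SYSLOGTIMESTAMP:log_time} %{NOTSPACE:log_host} %{LOGLEVEL:log_level} %{WORD:program}\[%{POSINT:pid}\]: %{NOTSPACE:msg_code} %{BASE16NUM:session_id}: %{GREEDYDATA:msg_text}"
    }
  }
}
```

With that, every line carries a session_id field (1ec2b273, d8b5a591, ...) that could serve as a correlation key.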
The problem is that I am at a loss as to how to express this with the multiline plugin, which offers too few options for this case. In fact, the notion of "previous" and "next" line doesn't apply here, so the multiline options "pattern" and "what" cannot be used, since the lines of an event are not necessarily consecutive.
I would really appreciate it if someone could shed some light on this and tell me whether it is at least feasible.
Do you really need to join these lines into a single event? To me they're separate but related events and I wouldn't attempt to merge them. If you insist, have a look at the collate and aggregate filters.
Are there other alternatives? I actually have no idea what else I could do to achieve what I have in mind. I'd like to gather the lines related to a particular connection into one single event, and hence one single document, so that I could query it to fetch everything I need about that connection and visualize it via Kibana.
It would be something like logstash.jira.com/browse/LOGSTASH-1785, but rather than dealing with consecutive multiline events, one would deal with interlaced events.
I have set the number of filter threads to more than one, and I can't bound how long an event can last, so neither of the filters you mentioned can work in my environment. And I've been told that ES documents cannot be joined in order to be queried, so I am really stuck. Any clue?
I've got exactly the same question. Can't believe this isn't a common issue.
In my case I've got exception messages within my log file, including multiline stack traces, so it's not suitable to have one event for each line of a stack trace.
The solution would be to use a field, e.g. @timestamp, as a delimiter, since that is the real separator for the log entries, but this doesn't seem to work at all?
My delimiter is the dynamic id. @timestamp can't be a delimiter since each log line has its own timestamp. The last line of a particular event may come two hours after the first one, and meanwhile other events may occur.
But I have the number of threads set to 3. The multiline filter cannot be applied since the number of threads is greater than 1. On the other hand, the multiline codec doesn't provide the stream_identity field. What should I do?
In my opinion, the aggregate filter answers your need precisely.
It allows you to aggregate multiple lines into one event using a correlation id. In your case, the correlation id is your "dynamic_id" field.
And aggregate handles interlaced lines, unlike the multiline plugin.
That said:
to work fine, you need an "end session" marker in your logs. Without one, you can't know when the session is finished and the aggregated event should be sent to Elasticsearch.
technically, the aggregate filter can be used with multiple threads. It works fine if your end line arrives several seconds after the intermediate lines. If not, with multiple threads, you risk the end line being processed before an intermediate line, and then losing that intermediate line.
If there is a gap of about one second, to me, it's OK.
And the aggregate plugin is designed to process multiple tasks (or sessions, in your case) that come in at the same time.
So I think it is the plugin that answers your need.
Finally, if you give me an example line for the "end event" and the event JSON structure you expect at the end, I can give you the Logstash configuration to do it (if you are interested).
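For illustration, a sketch (untested, assuming the session id has already been grokked into a field named dynamic_id, that the "Access policy result" line marks the end of a session, and using the newer event.get/event.set API) could look like:

```conf
filter {
  if [message] =~ "Access policy result" {
    # end-of-session line: append it, emit the joined message, close the map
    aggregate {
      task_id     => "%{dynamic_id}"
      code        => 'map["lines"] ||= []; map["lines"] << event.get("message"); event.set("full_message", map["lines"].join("\n"))'
      end_of_task => true
      timeout     => 120
    }
  } else {
    # intermediate line: just accumulate it in the shared map
    aggregate {
      task_id => "%{dynamic_id}"
      code    => 'map["lines"] ||= []; map["lines"] << event.get("message")'
    }
  }
}
```

The timeout is a safety net for sessions whose end line never arrives; without end_of_task => true on the final line, the map would never be flushed into the event.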
Yes, let's say my end pattern is "PPP tunnel.*started.$".
In the final event, I'd want the lines having the same dynamic id "seqid" to be tied together and combined into one string message with "\n" between each line. I'd rather not have an array.
All my messages are tagged with "start" as intended by the configuration, but the end never seems to be reached, although it exists; the "PPP.." log line isn't tagged with "end".
How am I supposed to see my tied-up events in the end?
Another question: how can I make sure that not all log lines end up in the "map"? Some events are very different and don't necessarily end with the "PPP tunnel" line. How do I create an aggregate rule for each type of event?
Also, the log line "PPP tunnel .* started.$" is grokked in the configuration that comes before the aggregate filter. Why not just represent it with its grok pattern instead of a regex expression?
I think your pattern in the "if" is wrong.
To be sure it works (especially when your message ends with \n), I advise: if [message] =~ "PPP tunnel .* started\." {
Also, in "if" conditions you can only use regex patterns, not grok patterns.
To reply to this question: aggregate only processes log lines that have the "apm_seqid" field. It is the correlation id used to identify the aggregate map.
So only lines with the "apm_seqid" field are processed, and one aggregate map is associated with one and only one "apm_seqid" value.
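In configuration terms, that guard could also be made explicit (a sketch, with apm_seqid as the grokked correlation field), so lines without the field never touch the aggregate filter at all:

```conf
filter {
  if [apm_seqid] {
    aggregate {
      task_id => "%{apm_seqid}"
      code    => 'map["lines"] ||= []; map["lines"] << event.get("message")'
    }
  }
}
```

Events of other types, which don't carry apm_seqid, then pass through untouched.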
Thank you, it worked eventually with the configuration:
if [message] =~ ".* PPP tunnel .* started\."
The expression "PPP..." is preceded by syslog details.
Anyway, it works fine for some lines, which are tied together, but I've noticed something: events that occur at the same time (same second) are not gathered; only the first or last event of a group occurring in the same second is included in the multiline event.