I've been facing a problem with multiline events lately, and I could use a little help. My syslog server is sending multi-line events: a single event spans several lines, and the indicator that ties a given line to a particular multi-line event is a random number identifying a user connection session. Here is a custom generated log file:
Feb 16 17:29:04 slot1/APM-LTM notice apd[5515]: 01490010:5: 1ec2b273: Username 'cjones'
Feb 16 17:29:04 slot1/APM-LTM warning apd[5515]: 01490106:4: 1ec2b273: AD module: authentication with 'cjones' failed: Preauthentication failed, principal name: cjones@GEEKO.COM. Invalid user credentials. (-1765328360)
Feb 16 17:10:04 slot1/APM-LTM notice apd[5515]: 01490010:5: d8b5a591: Username 'gbridget'
Feb 16 17:10:04 slot1/APM-LTM err apd[5515]: 01490107:3: d8b5a591: AD module: authentication with 'gbridget' failed: Clients credentials have been revoked, principal name: gbridget@GEEKO.COM. User account is locked (-1765328366)
Feb 16 17:29:04 slot1/APM-LTM notice apd[5515]: 01490005:5: 1ec2b273: Following rule 'fallback' from item 'AD Auth' to ending 'Deny'
Feb 16 17:29:04 slot1/APM-LTM notice apd[5515]: 01490102:5: 1ec2b273: Access policy result: Logon_Deny
Above are the lines related to two different connections, identified by the following user sessions: d8b5a591 (user gbridget) and 1ec2b273 (user cjones). The session ids are the only indicators that link those lines to two different events. Note also that the event lines are interlaced.
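For reference, I extract the session id from each line with a grok filter roughly like the one below (the field names are my own choice and the pattern is an untested sketch against the sample lines above):

```conf
filter {
  grok {
    match => {
      # e.g. "Feb 16 17:29:04 slot1/APM-LTM notice apd[5515]: 01490010:5: 1ec2b273: Username 'cjones'"
      "message" => "%{SYSLOGTIMESTAMP:log_time} %{NOTSPACE:log_host} %{LOGLEVEL:log_level} %{WORD:program}\[%{POSINT:pid}\]: %{NOTSPACE:msg_code} %{BASE16NUM:session_id}: %{GREEDYDATA:msg_text}"
    }
  }
}
```

With that, every line carries a session_id field (1ec2b273, d8b5a591, ...) that could serve as a correlation key.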
The problem is that I am at a loss as to how to express this with the multiline plugin, which offers too few options for this case. In fact, the notion of "previous" and "next" line doesn't apply here, so the multiline options "pattern" and "what" cannot be used, since the lines of an event are not necessarily consecutive.
I would really appreciate it if someone could shed some light on this and tell me whether it is at least feasible.
Do you really need to join these lines into a single event? To me they're separate but related events and I wouldn't attempt to merge them. If you insist, have a look at the collate and aggregate filters.
Are there other alternatives? I actually have no idea what else I could do to achieve what I have in mind. I'd like to gather the lines related to a particular connection into one single event, and hence one single document, so that I could query it to fetch everything I need about that connection and visualize it via Kibana.
It would be something like logstash.jira.com/browse/LOGSTASH-1785, but rather than dealing with consecutive multiline events, one would deal with interlaced events.
I have set the number of filter threads to more than one, and I can't bound how long an event can last, so neither of the filters you mentioned can work in my environment. And I've been told that ES documents cannot be joined in order to be queried, so I am really stuck. Any clue?
I've got exactly the same question. Can't believe this isn't a common issue.
In my case I've got exception messages within my log file, including multiline stack traces, so it's not suitable to have one event for each line of a stack trace.
The solution would be to use a field, e.g. @timestamp, as a delimiter, since that is the real separator for the log entries, but this doesn't seem to work at all?
My delimiter is the dynamic id. @timestamp can't be a delimiter since each log line has its own timestamp. The last line of a particular event may come two hours after the first one, and meanwhile other events may occur.
But I have the number of threads set to 3. The multiline filter cannot be applied since the number of threads is greater than 1. On the other hand, the multiline codec doesn't provide the stream_identity field. What should I do?
In my opinion, the aggregate filter answers your need precisely.
It allows you to aggregate multiple lines into one event using a correlation id. In your case, the correlation id is your "dynamic_id" field.
And aggregate handles interlaced lines, unlike the multiline plugin.
That said:
to work fine, you need an "end session" marker in your logs. Without one, you can't know when the session is finished and the aggregated event should be sent to Elasticsearch.
technically, the aggregate filter can be used with multiple threads. It works fine if your end line arrives several seconds after the intermediate lines. If not, with multiple threads, you risk the end line being processed before an intermediate line, and then losing that intermediate line.
If there is a gap of about one second, to me, it's OK.
And the aggregate plugin is designed to process multiple tasks (or sessions, in your case) that come in at the same time.
So I think it is the plugin that answers your need.
Finally, if you give me an example line for the "end event" and the event JSON structure you expect at the end, I can give you the Logstash configuration to do it (if you are interested).
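For illustration, a sketch (untested, assuming the session id has already been grokked into a field named dynamic_id, that the "Access policy result" line marks the end of a session, and using the newer event.get/event.set API) could look like:

```conf
filter {
  if [message] =~ "Access policy result" {
    # end-of-session line: append it, emit the joined message, close the map
    aggregate {
      task_id     => "%{dynamic_id}"
      code        => 'map["lines"] ||= []; map["lines"] << event.get("message"); event.set("full_message", map["lines"].join("\n"))'
      end_of_task => true
      timeout     => 120
    }
  } else {
    # intermediate line: just accumulate it in the shared map
    aggregate {
      task_id => "%{dynamic_id}"
      code    => 'map["lines"] ||= []; map["lines"] << event.get("message")'
    }
  }
}
```

The timeout is a safety net for sessions whose end line never arrives; without end_of_task => true on the final line, the map would never be flushed into the event.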
Yes, let's say my end pattern is "PPP tunnel.*started.$".
In the final event, I'd want the lines having the same dynamic id "seqid" to be tied together and combined into one string message with "\n" between each line. I'd rather not have an array.
All my messages are tagged with "start" as intended by the configuration, but the end never seems to be reached, although it exists; the "PPP.." log line isn't tagged with "end".
How am I supposed to see my tied-up events in the end?
Another question: how can I make sure that not all log lines end up in the "map"? Some events are very different and don't necessarily end with the "PPP tunnel" line. How do I create an aggregate rule for each type of event?
Also, the log line "PPP tunnel .* started.$" is grokked in the configuration that comes before the aggregate filter. Why not just represent it with its grok pattern instead of a regex expression?
I think your pattern in the "if" is wrong.
To be sure it works (especially when your message ends with \n), I advise: if [message] =~ "PPP tunnel .* started\." {
Also, in "if" conditions you can only use regex patterns, not grok patterns.
To reply to this question: aggregate only processes log lines that have the "apm_seqid" field. It is the correlation id used to identify the aggregate map.
So only lines with the "apm_seqid" field are processed, and one aggregate map is associated with one and only one "apm_seqid" value.
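In configuration terms, that guard could also be made explicit (a sketch, with apm_seqid as the grokked correlation field), so lines without the field never touch the aggregate filter at all:

```conf
filter {
  if [apm_seqid] {
    aggregate {
      task_id => "%{apm_seqid}"
      code    => 'map["lines"] ||= []; map["lines"] << event.get("message")'
    }
  }
}
```

Events of other types, which don't carry apm_seqid, then pass through untouched.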
Thank you, it worked eventually with the configuration:
if [message] =~ ".* PPP tunnel .* started\."
The expression "PPP..." is preceded by syslog details.
Anyway, it works fine for some lines, which are tied together, but I've noticed something: events that occur at the same time (same second) are not gathered; only the first or last event of a group occurring in the same second is included in the multiline event.