Logstash aggregate syslog entries

Pablo_Marques · February 9, 2021, 5:09pm

Hello

I am getting syslogs from Clearpass servers and using logstash to ingest them into elasticsearch.
Some of these messages are received on multiple syslog packets, but they are really the same (big) message.
I am trying to combine them (without much success) into one message before sending them to elastic.
My grok entry is as follows:

grok {
        match => [
                 "message", "<%{POSINT:syslog_pri}>%{TIMESTAMP_ISO8601:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program} %{POSINT:syslog_pid} %{NONNEGINT:num_of_msgs} %{NONNEGINT:msg_num} %{GREEDYDATA:message}",
                 "message", "<%{POSINT:syslog_id}>\.\.\.%{GREEDYDATA:message}"
        ]
        overwrite => [ "message" ]
     }

When a message has multiple entries "num_of_msgs" reflects the number of them (1,2,3,4 etc) and "msg_num" is the sequence number for each message (0,1,2, etc).
The common element between all entries is "syslog_id".

The idea is to concatenate all "message" entries with the same "syslog_id" in the proper order, create an entry with that information and submit it to elasticsearch. Also do not want the partial messages in elasticsearch, only the combined one.

I tried this, but it is not doing what I want at all.

if [num_of_msgs] > "1" {
           aggregate {
              task_id => "%{syslog_pid}"
              code => "map['message'] ||=' '; map['message'] +=%{\n}+ event.get('message')"
              map_action => "create_or_update"
              push_map_as_event_on_timeout => true
              timeout => 10
              timeout_tags => ['aggregated']
           }
           if "aggregated" not in [tags] {
              drop{}
           }
     }

I did a lot of google search but I do not really found anything I can follow.
Can you please help me with pointers or suggestions.

Thank you

wwalker · February 9, 2021, 5:40pm

Do these messages have a unique ID? So, for example, if an event that comes in over 4 packets, does each contain a field with the same value that is globally unique? You could potentially use that as the index ID and the event will be updated in Elasticsearch.

Since you don't list it, I would assume you are using the TCP input to receive the events. As long as the separate packets are contiguous, you might be able to use the multiline codec to stitch the events together.

Pablo_Marques · February 9, 2021, 5:54pm

Thank you for your answer

The common element between all entries is "syslog_pid".
I am using UDP

So if I receive these two syslog entries like this:
.... CPPM_blah_blah 1656456 2 0 "message 123 123 123"
.... CPPM_blah_blah 1656456 2 1 "message 456 456 456"

I want to submit to elasticsearch the following:
.... CPPM_blah_blah 1656456 1 0 "message 123 123 123 message 456 456 456"

wwalker · February 9, 2021, 7:12pm

Ahh....that makes it more complex. I would guess something like

Setup elasticsearch output to use the syslog_pid as the document id
Perform an elasticsearch lookup in your pipeline filter to pull the previously ingested event data.
Use another filter, maybe mutate, to create the array of values.

Pablo_Marques · February 9, 2021, 7:26pm

There is a way using the logstash aggregate filter plugin, but I cannot figure out the right syntax for my case.

wwalker · February 9, 2021, 7:59pm

Interesting...

So...and my interpretation may be wrong, but going off your existing 'code' configuration, map['message'] ||=' '; map['message'] +=%{\n}+ event.get('message'), I would read that to say "(1)Map previous event field message as blank if it doesn't exist. (2)Map previous event field message and append with a new line and the value of the current event's message field."

What happens if you simplify it to "event.set('message', map['message'])"? I would expect it to concatenate the values into a single string.

Badger · February 9, 2021, 8:50pm

When using push_map_as_event_on_timeout the only fields that will be present on the event are whatever you added to the map.

You could try something like this:

     aggregate {
         task_id => "%{syslog_pid}"
         code => '
             map["syslog_pri"] ||= event.get("syslog_pri")
             map["syslog_timestamp"] ||= event.get("syslog_timestamp")
             map["syslog_hostname"] ||= event.get("syslog_hostname")
             map["syslog_program"] ||= event.get("syslog_program")
             map["syslog_pid"] ||= event.get("syslog_pid")
             map["message"] ||= []
             map["message"] << event.get("message")
             event.cancel
         '
         push_map_as_event_on_timeout => true
         timeout => 10
     }

If that syslog_pid really is a process id and not a message id then if you get two messages within 10 seconds this will combine them. To avoid that you would have to stop using push_map_as_event_on_timeout and instead use a second aggregate filter with map_action => "update" and end_of_task => true when [msg_num] is one less than [num_of_msgs] (which would require a ruby filter to do the maths). More like example 2 than example 3.

Pablo_Marques · February 9, 2021, 9:02pm

I believe

syslog_pri

is a message id

I should rename it to: syslog_id or message_id

Pablo_Marques · February 12, 2021, 4:59am

aggregate is now working as expected.

But instead of concatenate the messages, it adds values to an array:

map["message"] << get.event("message")

message {
[0] = " this is line 1"
[1] = "this is line 2"
etc
}

Is there a way to concatenate to the string instead of adding a new array entry?

Badger · February 12, 2021, 7:55pm

Sure, you could

         map["message"] ||= ""
         map["message"] += event.get("message")

possibly adding a delimiter as well.

Pablo_Marques · February 12, 2021, 8:02pm

I get this error when I use +=
occurred {:error=>#<TypeError: no implicit conversion of String into Array>,

Badger · February 12, 2021, 8:06pm

Did you change map["message"] ||= [] to map["message"] ||= ""?

Pablo_Marques · February 12, 2021, 8:08pm

I just did
Seems to be working now.
so by putting the square brackets

the variable is defined as an array!!

Thank you

system · March 12, 2021, 8:09pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Aggregate syslog with Logstash Logstash	1	435	January 14, 2019
Syslog with logstash mapping Elasticsearch	5	1784	September 4, 2017
Grok filter multiple match Logstash	4	18316	October 31, 2019
Syslog Multiple log entries in a single log Logstash	1	529	July 6, 2017
[Logstash 7.2] Syslog Output message Logstash	2	1018	August 30, 2019

Logstash aggregate syslog entries

Related topics