Syslog - logs contain dynamic fields (parameters) based on event type, how to index?

Hi,
I am trying to write a syslog collector to store logs from a Cisco CUCM in Elasticsearch.
Cisco CUCM sends audit logs and alarms for different system and application events as syslog to my Logstash server.
The problem is that Cisco CUCM uses a different log format for different alarm types.
More specifically, the overall format is the same but the parameters are different.
General Format:

SEQUENCE_NUM: TIMESTAMP: %EVENT_TYPE: %PARAMS : DESCRIPTION
where EVENT_TYPE is formatted CATALOGNAME-SEVERITY-ALARM_TYPE
and PARAMS is formatted [param1 =value1][param2 =value2][param3 =value3]...

NOTE: Each alarm type has different parameters, and a different number of them.

Example log:

<190>161: Jun 08 2020 07:09:37.449 UTC : %UC_AUDITLOG-6-AdministrativeEvent: %[ UserID =admin][ ClientAddress =xx.xx.xx.xx][ Severity =6][ EventType =UserAccess][ ResourceAccessed=CUCMServiceability][ EventStatus =Success][ CompulsoryEvent =No][ AuditCategory =AdministrativeEvent][ ComponentID =Cisco CCM Servicability][ CorrelationID =][ AuditDetails =Attempt to access data was successful.User is authorized to access auditconfig][App ID=Cisco Tomcat][Cluster ID=][Node ID=cucm-pub]: Audit Event is generated by this application

Different Log:
<190>656: Jun 08 2020 09:34:14.578 UTC : %UC_CALLMANAGER-6-EndPointUnregistered: %[Device name=MY_DEVICE][Device IP address=xx.xx.xx.xx][Protocol=SIP][Device type=36248][Device description=phone 1][Reason Code=9][IPAddressAttributes=0][App ID=Cisco CallManager][Cluster ID=StandAloneCluster][Node ID=cucm-pub]: An endpoint has unregistered

I would like to parse the parameters into index fields so that I can search and filter by them later, but I'm having trouble deciding how to index this data in an optimal way.
At first glance, it seems that I have to create a different index pattern and mapping for each alarm type, since each alarm parameter has to be mapped to an index field. But this approach seems unworkable: there are A LOT of alarm types, and having an index pattern for each one feels really irrational.
On the other hand, I could have just one index pattern (e.g. "syslog-*"), store the entire log, and create a field called "params" that holds the string "[param1 =value1][param2 =value2][param3 =value3]...". But this way I lose all the benefit of aggregating and filtering by parameters.

I would love to get some tips on how to treat such a log format and what's the best way to index this kind of data into Elasticsearch.

I hope I explained myself well enough.

Thanks in advance for any help!

I would use dissect to parse off the start of the message, then a kv filter. Something like this.
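For example (untested; pri, seq, ts, and rest are just placeholder field names you can rename):

    # split the fixed prefix off the message
    dissect { mapping => { "message" => "<%{pri}>%{seq}: %{ts} : %{messageTag}: %{rest}" } }
    # turn each [key =value] pair into its own field
    kv { source => "rest" field_split_pattern => "[\[\]]+" trim_key => " " }

You could then point a date filter at ts to set @timestamp.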

I have tried doing that.
This is basically the second option that I proposed: using one index pattern for all event types.
The problem with it is that, as I said, each log comes with different keys, so what you get in the end is each document containing hundreds of fields, with only about 10 of them populated with values. Am I wrong about this?
There must be a more efficient way to do that.

Not sure what you mean by this. Using

    # split off the priority, sequence number, timestamp and message tag; the timestamp and the rest of the line are kept in @metadata
    dissect { mapping => { "message" => "<%{pri}>%{number}: %{[@metadata][timestamp]} %{+[@metadata][timestamp]} %{+[@metadata][timestamp]} %{+[@metadata][timestamp]} %{+[@metadata][timestamp]} : %{messageTag}: %{[@metadata][restOfLine]}" } }
    # parse the timestamp (e.g. "Jun 08 2020 07:09:37.449 UTC") into @timestamp
    date { match => [ "[@metadata][timestamp]", "MMM dd YYYY HH:mm:ss.SSS ZZZ" ] }
    # turn each [key =value] pair into its own field, trimming the stray spaces around key names
    kv { field_split_pattern => "[\[\]]+" trim_key => " " }

I get

{
      "messageTag" => "%UC_AUDITLOG-6-AdministrativeEvent",
   "ClientAddress" => "xx.xx.xx.xx",
 "CompulsoryEvent" => "No",
     "ComponentID" => "Cisco CCM Servicability",
    "AuditDetails" => "Attempt to access data was successful.User is authorized to access auditconfig",
         "Node ID" => "cucm-pub",
"ResourceAccessed" => "CUCMServiceability",
             "pri" => "190",
     "EventStatus" => "Success",
          "App ID" => "Cisco Tomcat",
        "Severity" => "6",
      "@timestamp" => 2020-06-08T07:09:37.449Z,
   "AuditCategory" => "AdministrativeEvent",
          "number" => "161",
          "UserID" => "admin",
       "EventType" => "UserAccess"
}
{
         "messageTag" => "%UC_CALLMANAGER-6-EndPointUnregistered",
         "Cluster ID" => "StandAloneCluster",
        "Reason Code" => "9",
"IPAddressAttributes" => "0",
            "Node ID" => "cucm-pub",
                "pri" => "190",
             "App ID" => "Cisco CallManager",
        "Device type" => "36248",
         "@timestamp" => 2020-06-08T09:34:14.578Z,
  "Device IP address" => "xx.xx.xx.xx",
             "number" => "656",
           "Protocol" => "SIP",
 "Device description" => "phone 1",
        "Device name" => "MY_DEVICE"
}

What do you not like about that?

So after checking it a bit, I noticed I also get results similar to yours, where each event has its own parameters as fields.
However, since I'm storing them all in the same index, the index mapping contains all the different fields from all event types. So when I go to Discover in Kibana, the field list on the left shows ALL of the fields collectively, even though each one only applies to the documents that actually have it defined.

For example:
When I store both of the above logs in the same index, the index mapping will contain both the "Device IP address" and "AuditDetails" fields, even though only the first document has AuditDetails and only the second one has Device IP address.
I forgot that this is not like an SQL schema where each document must have all the table fields defined in order to be stored.

I just wonder if this is the correct way to do this, where the mapping contains fields that are not defined in some of the documents in the index.

I hope I did not confuse you further.

I see no reason why that would be a problem.

Well, there are actually cases where having sparsely populated fields can cause problems, but version 6.0 of the Elastic Stack took steps to reduce this impact (see the Sparse Field Improvements section of this post). Unless you see performance problems I would not worry about it. It might be worth asking for guidance in the elasticsearch forum if you have an idea of how many fields you may end up with in the mapping and how many a typical document will have.

That's great to hear.
I think this approach will do for now, and I'll keep an eye on performance impacts.

Thank you!
