Preparing log analytics with Logstash (for Kibana)

When creating analytics for Kibana I need to create keyword fields,
because unfortunately plain text cannot be analyzed in Kibana dashboards.

So I am trying to break up the logs.
Grok constructor works fine when the logs have a specific format. But what if there are many different formats?
For example using something like if/else for many different patterns?

Is this the recommended way to do it?
It does not seem to work:

grok {
       match => ["logformat1", "\[%{MONTHDAY} %{MONTH} %{TIME}]",
                ["logformat2", "\[%{POSINT:pid}\] %{REDISTIMESTAMP:timestamp}"], 
                ["logformat3", "\[%{NUMBER:timestamp} \[%{INT:database} %{IP:client}:%{NUMBER:port}\] "%{WORD:command}"\s?%{GREEDYDATA:params}"],
                ["else", "\[%{GREEDYDATA:message}"]
      ]
    }

That is not a useful problem description. What does the data look like? What output do you expect?

To have grok match against multiple patterns you would use

grok {
    match => {
        "message" => [ "\[%{MONTHDAY} %{MONTH} %{TIME}]",
            "\[%{POSINT:pid}\] %{REDISTIMESTAMP:timestamp}", 
            '\[%{NUMBER:timestamp} \[%{INT:database} %{IP:client}:%{NUMBER:port}\] "%{WORD:command}"\s?%{GREEDYDATA:params}',
            "\[%{GREEDYDATA:message}"
        ]
    }
}

Always order the patterns so that a more specific pattern is tried before a less specific one. And always anchor your patterns if possible, so that a failed match fails very cheaply. All four of those patterns should probably start with ^ to anchor them to the start of [message].
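
As a sketch of both points, using only two of the patterns above: the specific pattern comes first, the catch-all comes last, and both are anchored with ^. Whether the anchors fit depends on the real data, which has not been shown yet.

```
grok {
    match => {
        "message" => [
            # specific pattern first, anchored to the start of [message]
            "^\[%{POSINT:pid}\] %{REDISTIMESTAMP:timestamp}",
            # catch-all last, also anchored
            "^\[%{GREEDYDATA:message}"
        ]
    }
}
```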

You can use conditionals to direct each format to its own grok; that is more efficient than a single grok with multiple patterns.
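
A rough sketch of that idea, assuming each event already carries a field that identifies its format. The field name [log_type] and its values are hypothetical; use whatever your shipper or input actually sets.

```
filter {
    # [log_type] is a hypothetical field, assumed to be set before this filter runs
    if [log_type] == "redis" {
        grok {
            match => { "message" => "^\[%{POSINT:pid}\] %{REDISTIMESTAMP:timestamp}" }
        }
    } else if [log_type] == "redis_monitor" {
        grok {
            match => { "message" => '^\[%{NUMBER:timestamp} \[%{INT:database} %{IP:client}:%{NUMBER:port}\] "%{WORD:command}"\s?%{GREEDYDATA:params}' }
        }
    } else {
        # unknown format: keep [message] as it is and tag the event
        mutate { add_tag => ["unparsed"] }
    }
}
```

Each event is then tested against one pattern only, instead of falling through a list of patterns until one matches.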

What do your logs look like? As Badger asked, please share a sample of your messages.

Depending on what your messages look like, you may not even need grok and can parse them with other filters, like kv, json, dissect or csv.

Badger, is this missing from the grok?
break_on_match => false

I don't think you need break_on_match in this case. If you are matching against several different log formats it is best to stop as soon as you get a match. If you are matching against several different patterns each of which matches part of a log message then yes, setting that is essential.
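
For the second case, a minimal sketch might look like this. The two patterns and the field names log_level and client_ip are only illustrative; each pattern picks a different piece out of the same message.

```
grok {
    # keep trying the remaining patterns after one has matched
    break_on_match => false
    match => {
        "message" => [
            "%{LOGLEVEL:log_level}",
            "%{IP:client_ip}"
        ]
    }
}
```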

Thanks for all the answers.
Here are four log formats that will be used as input.
As output I want some keywords if available (e.g. "timestamp" and "log_level", which are present in all logs, and "calling_server", which is available in only one log). Note that there will be even more diverse logs (e.g. Java exceptions).

The last log will have the most keywords (so its pattern will be put first). The other ones will then partially match (timestamp, loglevel, ID_number). So what I need is a complete match of the last one, and from there on I would like to break up the messages as much as possible. If nothing matches (e.g. Java exceptions) I am going to keep it as a whole message (text).

2022-05-27 16:57:40.057  INFO [exmp-docgen-srv,613863e75eb43d9c,613863e75eb43d9c] 2680242 --- [http-nio-8386-exec-3] o.keycloak.adapters.KeycloakDeployment   : [applicationRequestID=] Loaded URLs from http://exmp-auth.exmp.local:5000/auth/realms/exmp-dev/.well-known/openid-configuration
2022-05-27 16:57:49.121  WARN [exmp-docgen-srv,613863e75eb43d9c,613863e75eb43d9c] 2680242 --- [http-nio-8386-exec-3] o.a.c.util.SessionIdGeneratorBase        : [applicationRequestID=74967d30-58d1-4aa9-860c-04963ac24917] Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [359] milliseconds.
2022-05-27 16:57:51.710  INFO [exmp-docgen-srv,613863e75eb43d9c,613863e75eb43d9c] 2680242 --- [http-nio-8386-exec-3] g.u.m.d.c.DocumentGenerationController   : [applicationRequestID=74967d30-58d1-4aa9-860c-04963ac24917] Generate MSWord document called with template:exmp-document-generation-sample-template.docx

2022-09-20 16:08:20.874 ERROR 3608769 --- [scheduling-1] g.u.m.i.s.q.QueueItemHandlingService     : An exception occured: No results for path: $['unit3Descr']
2022-09-20 16:08:35.889  INFO 3608769 --- [scheduling-1] g.u.m.i.clients.RestClientFilters        : Request: Method: GET, URL: http://exmp-index.exmp.local:8381/camel/exmp/unit2/34554
2022-09-20 16:08:35.988  INFO 3608769 --- [reactor-http-epoll-3] g.u.m.i.clients.RestClientFilters        : Response: 200 OK
2022-09-20 16:08:36.005 ERROR 3608769 --- [scheduling-1] g.u.m.i.s.q.QueueItemHandlingService     : An exception occured: No results for path: $['unit3Descr']

2022-09-22 13:46:19.479  INFO 688857 --- [http-nio-8384-exec-5] o.s.c.c.s.e.NativeEnvironmentRepository  : Adding property source: Config resource 'file [/opt/applications/exmp-configuration-service-configs/exmp-jbpm-configuration-mapping-dev.properties]' via location 'file:///opt/applications/exmp-configuration-service-configs/'
2022-09-22 13:46:19.485  INFO 688857 --- [http-nio-8384-exec-6] o.s.c.c.s.e.NativeEnvironmentRepository  : Adding property source: Config resource 'file [/opt/applications/exmp-configuration-service-configs/exmp-jbpm-configuration-mapping-dev.properties]' via location 'file:///opt/applications/exmp-configuration-service-configs/'

2022-09-22 13:48:28.336  INFO Request-Info:[protocol=HTTP/1.1, exmp_user_name=exmp-supervisor, method=GET, entrypoint=/api/prj-menus/unit1/1, app_server=exmp-index:8081, x_request_identifier=b2064184-7a14-4439-93d4-7b7e7126dd1a, x_active_unit1_id=1, x_menu_id=33, cause=incoming_request, calling_server=172.30.2.224, request_time=N/A, reception_date=2022-09-22T13:48:28.247+0300, time_elapsed_ms=88] 2756977 --- [XNIO-1 task-3] g.u.m.commons.logging.MDCLoggingFilter   : processing_end

I can see that the first line has clean fields and that the other lines have optional fields. The grok pattern would be:

%{TIMESTAMP_ISO8601:date}\s+%{LOGLEVEL:level}\s+(%{DATA:fieldx})?\s*%{POSINT:threadid}\s+%{DATA:dashes}\s+(\[%{DATA:process}\]\s+)?(%{DATA:method}\s+\:\s+)?%{GREEDYDATA:message}

Disclaimer: I have tested this in the Grok Debugger, and all three lines produce the same fields. Afterwards you only need to remove the [ and ] from fieldx with gsub, if the field exists. You can also split fieldx further ([exmp-docgen-srv,613863e75eb43d9c,613863e75eb43d9c]) if that is important.
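
The gsub step could look roughly like this (a sketch, assuming fieldx was captured by the pattern above):

```
if [fieldx] {
    mutate {
        # remove one leading "[" and one trailing "]" from fieldx
        gsub => [ "fieldx", "^\[|\]$", "" ]
    }
}
```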

nice solution Rios! thank you

I will also try to use kv to further analyze fieldx

Yes, I would advise using kv for the fieldname=value cases, but test it.
For this: exmp-docgen-srv,613863e75eb43d9c,613863e75eb43d9c you can also use the csv filter.
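
A sketch of both, assuming fieldx holds either the Request-Info key=value list or the comma-separated triple. The column names service, id_1 and id_2 are made up, and the kv settings assume that no value contains a space or a comma. Test against real data.

```
if [fieldx] =~ /=/ {
    # key=value list, as in the Request-Info line
    mutate { gsub => [ "fieldx", "^Request-Info:\[|\]$", "" ] }
    kv {
        source => "fieldx"
        # split pairs on commas and spaces, keys from values on "="
        field_split => ", "
        value_split => "="
    }
} else if [fieldx] {
    # comma-separated triple, e.g. [exmp-docgen-srv,613863e75eb43d9c,613863e75eb43d9c]
    mutate { gsub => [ "fieldx", "^\[|\]$", "" ] }
    csv {
        source => "fieldx"
        # hypothetical column names
        columns => ["service", "id_1", "id_2"]
    }
}
```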

Since this sample is only a few lines, something tells me there will be anomalies. With fewer than 100 lines per sample you cannot be sure that everything will be OK.

Your first approach (multiple matches) is not wrong; test which one is faster.
