Preparing Logs analytics with Logstash (for kibana)

Mark_S · September 21, 2022, 7:22am

When creating analytics (for Kibana) I need to create keyword fields,
because unfortunately plain text cannot be analyzed in Kibana dashboards.

So I am trying to break up the logs.
Grok constructor works fine when the logs have one specific format. But what if there are many different formats?
For example, can I use something like if/else for many different patterns?

Is this the recommended way to do it?
It does not seem to work:

grok {
       match => ["logformat1", "\[%{MONTHDAY} %{MONTH} %{TIME}]",
                ["logformat2", "\[%{POSINT:pid}\] %{REDISTIMESTAMP:timestamp}"], 
                ["logformat3", "\[%{NUMBER:timestamp} \[%{INT:database} %{IP:client}:%{NUMBER:port}\] "%{WORD:command}"\s?%{GREEDYDATA:params}"],
                ["else", "\[%{GREEDYDATA:message}"]
      ]
    }

Badger · September 21, 2022, 4:08pm

That is not a useful problem description. What does the data look like? What output do you expect?

To have grok match against multiple patterns you would use

grok {
    match => {
        "message" => [ "\[%{MONTHDAY} %{MONTH} %{TIME}\]",
            "\[%{POSINT:pid}\] %{REDISTIMESTAMP:timestamp}",
            '\[%{NUMBER:timestamp} \[%{INT:database} %{IP:client}:%{NUMBER:port}\] "%{WORD:command}"\s?%{GREEDYDATA:params}',
            "\[%{GREEDYDATA:message}"
        ]
    }
}

Always order the patterns so that a more specific pattern is tried before a less specific one. And always anchor your patterns if possible, so that a non-matching pattern fails very cheaply. All four of those patterns should probably start with ^ to anchor them to the start of [message].
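
As a rough sketch of that advice, the same list with every pattern anchored to the start of [message] might look like the following. The patterns are the ones from the question and have not been tested against real data; the third one is single-quoted only because it contains literal double quotes.

```
grok {
    match => {
        "message" => [
            # Each pattern starts with ^ so a non-matching line is rejected at the first character.
            "^\[%{MONTHDAY} %{MONTH} %{TIME}\]",
            "^\[%{POSINT:pid}\] %{REDISTIMESTAMP:timestamp}",
            '^\[%{NUMBER:timestamp} \[%{INT:database} %{IP:client}:%{NUMBER:port}\] "%{WORD:command}"\s?%{GREEDYDATA:params}',
            # Catch-all goes last because it matches almost anything starting with "[".
            "^\[%{GREEDYDATA:message}"
        ]
    }
}
```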

leandrojmp · September 21, 2022, 4:12pm

You can use conditionals to direct the different formats to a specific grok; it is more efficient than having one grok with multiple patterns.

What does your log look like? As Badger asked, share a sample of your messages.

Depending on what your message looks like, you may not even need grok and can use another filter to parse it, like kv, json, dissect or csv.
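
A minimal sketch of the conditional approach, assuming two of the formats from the question. The regular expressions in the conditions and the "unparsed" tag are placeholders made up for illustration; they would have to be adapted to the real messages.

```
filter {
    # Hypothetical routing tests: a cheap regex decides which grok runs.
    if [message] =~ /^\[\d+\] / {
        grok {
            match => { "message" => "^\[%{POSINT:pid}\] %{REDISTIMESTAMP:timestamp}" }
        }
    } else if [message] =~ /^\[\d+ \w{3} / {
        grok {
            match => { "message" => "^\[%{MONTHDAY} %{MONTH} %{TIME}\]" }
        }
    } else {
        # Unknown format: leave [message] as it is and mark the event.
        mutate { add_tag => [ "unparsed" ] }
    }
}
```

Each event then runs at most one grok, instead of trying every pattern in turn.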

Rios · September 21, 2022, 4:20pm

Badger, isn't this missing from the grok?
break_on_match => false

Badger · September 21, 2022, 4:40pm

I don't think you need break_on_match in this case. If you are matching against several different log formats it is best to stop as soon as you get a match. If you are matching against several different patterns, each of which matches part of a log message, then yes, setting it to false is essential.
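
To illustrate the second case, here is a sketch in which each pattern picks out one part of the same message, so all of them have to be tried. The field names and the request_id pattern are hypothetical and only there to show the shape of the config.

```
grok {
    # Try every pattern instead of stopping at the first one that matches.
    break_on_match => false
    match => {
        "message" => [
            "%{TIMESTAMP_ISO8601:timestamp}",
            "%{LOGLEVEL:log_level}",
            "request_id=%{UUID:request_id}"
        ]
    }
}
```

With the default (break_on_match => true) only the first matching pattern would contribute fields.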

Mark_S · September 22, 2022, 11:06am

Thanks for all the answers.
Here are four log formats that will be used as input.
As output I want some keywords if available (e.g. "timestamp" and "log_level", which are present in all logs, or "calling_server", which is available in only one log). Note that there are going to be even more diverse logs (e.g. Java exceptions).

The last log has the most keywords (so its pattern will be put first). The other ones will then partially match (timestamp, log level, ID number). So what I need is a complete match of the last one, and from there on I would like to break up the messages as much as possible. If nothing matches (e.g. Java exceptions), I am going to keep it as a whole message (text).

2022-05-27 16:57:40.057  INFO [exmp-docgen-srv,613863e75eb43d9c,613863e75eb43d9c] 2680242 --- [http-nio-8386-exec-3] o.keycloak.adapters.KeycloakDeployment   : [applicationRequestID=] Loaded URLs from http://exmp-auth.exmp.local:5000/auth/realms/exmp-dev/.well-known/openid-configuration
2022-05-27 16:57:49.121  WARN [exmp-docgen-srv,613863e75eb43d9c,613863e75eb43d9c] 2680242 --- [http-nio-8386-exec-3] o.a.c.util.SessionIdGeneratorBase        : [applicationRequestID=74967d30-58d1-4aa9-860c-04963ac24917] Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [359] milliseconds.
2022-05-27 16:57:51.710  INFO [exmp-docgen-srv,613863e75eb43d9c,613863e75eb43d9c] 2680242 --- [http-nio-8386-exec-3] g.u.m.d.c.DocumentGenerationController   : [applicationRequestID=74967d30-58d1-4aa9-860c-04963ac24917] Generate MSWord document called with template:exmp-document-generation-sample-template.docx

2022-09-20 16:08:20.874 ERROR 3608769 --- [scheduling-1] g.u.m.i.s.q.QueueItemHandlingService     : An exception occured: No results for path: $['unit3Descr']
2022-09-20 16:08:35.889  INFO 3608769 --- [scheduling-1] g.u.m.i.clients.RestClientFilters        : Request: Method: GET, URL: http://exmp-index.exmp.local:8381/camel/exmp/unit2/34554
2022-09-20 16:08:35.988  INFO 3608769 --- [reactor-http-epoll-3] g.u.m.i.clients.RestClientFilters        : Response: 200 OK
2022-09-20 16:08:36.005 ERROR 3608769 --- [scheduling-1] g.u.m.i.s.q.QueueItemHandlingService     : An exception occured: No results for path: $['unit3Descr']

2022-09-22 13:46:19.479  INFO 688857 --- [http-nio-8384-exec-5] o.s.c.c.s.e.NativeEnvironmentRepository  : Adding property source: Config resource 'file [/opt/applications/exmp-configuration-service-configs/exmp-jbpm-configuration-mapping-dev.properties]' via location 'file:///opt/applications/exmp-configuration-service-configs/'
2022-09-22 13:46:19.485  INFO 688857 --- [http-nio-8384-exec-6] o.s.c.c.s.e.NativeEnvironmentRepository  : Adding property source: Config resource 'file [/opt/applications/exmp-configuration-service-configs/exmp-jbpm-configuration-mapping-dev.properties]' via location 'file:///opt/applications/exmp-configuration-service-configs/'

2022-09-22 13:48:28.336  INFO Request-Info:[protocol=HTTP/1.1, exmp_user_name=exmp-supervisor, method=GET, entrypoint=/api/prj-menus/unit1/1, app_server=exmp-index:8081, x_request_identifier=b2064184-7a14-4439-93d4-7b7e7126dd1a, x_active_unit1_id=1, x_menu_id=33, cause=incoming_request, calling_server=172.30.2.224, request_time=N/A, reception_date=2022-09-22T13:48:28.247+0300, time_elapsed_ms=88] 2756977 --- [XNIO-1 task-3] g.u.m.commons.logging.MDCLoggingFilter   : processing_end

Rios · September 23, 2022, 7:47am

I can see the first line has clean fields; the other lines have optional fields. The grok pattern would be:

%{TIMESTAMP_ISO8601:date}\s+%{LOGLEVEL:level}\s+(%{DATA:fieldx})?\s*%{POSINT:threadid}\s+%{DATA:dashes}\s+(\[%{DATA:process}\]\s+)?(%{DATA:method}\s+\:\s+)?%{GREEDYDATA:message}

Disclaimer: I have tested this in the Grok Debugger, and all three lines produce the same fields. Afterwards you only need to remove the [ and ] from fieldx with gsub, if the field exists. You can also split fieldx ([exmp-docgen-srv,613863e75eb43d9c,613863e75eb43d9c]) further, if that is important.
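
Put into a filter block, that could look roughly like the sketch below. The leading ^ anchor and the overwrite setting are additions that were not part of the Grok Debugger test; overwrite is there on the assumption that the parsed text should replace [message] rather than be appended to it.

```
filter {
    grok {
        match => {
            "message" => "^%{TIMESTAMP_ISO8601:date}\s+%{LOGLEVEL:level}\s+(%{DATA:fieldx})?\s*%{POSINT:threadid}\s+%{DATA:dashes}\s+(\[%{DATA:process}\]\s+)?(%{DATA:method}\s+\:\s+)?%{GREEDYDATA:message}"
        }
        # Assumption: replace the original [message] with the captured remainder.
        overwrite => [ "message" ]
    }
    if [fieldx] {
        # Strip the square brackets from fieldx.
        mutate { gsub => [ "fieldx", "[\[\]]", "" ] }
    }
}
```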

Mark_S · September 24, 2022, 8:25am

Nice solution, Rios! Thank you.

I will also try to use kv to further analyze fieldx

Rios · September 24, 2022, 8:35am

Yes, I would advise using kv for fieldname=value cases, but test it.
For this: exmp-docgen-srv,613863e75eb43d9c,613863e75eb43d9c you can also use the csv filter.
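
A possible sketch of both, assuming fieldx still contains the raw text captured by grok. The request_info target and the three column names are invented for illustration, and the conditions are only a guess at how to tell the two shapes apart.

```
filter {
    if [fieldx] =~ /^Request-Info:/ {
        # Drop the "Request-Info:[" prefix and the closing "]" before splitting.
        mutate { gsub => [ "fieldx", "^Request-Info:\[|\]$", "" ] }
        kv {
            source => "fieldx"
            field_split_pattern => ", "
            value_split => "="
            # Hypothetical target field for the parsed key/value pairs.
            target => "request_info"
        }
    } else if [fieldx] =~ /,/ {
        mutate { gsub => [ "fieldx", "[\[\]]", "" ] }
        csv {
            source => "fieldx"
            # Hypothetical column names for the three comma-separated values.
            columns => [ "service", "trace_id", "span_id" ]
        }
    }
}
```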

Since this sample is only a few lines, something tells me there will be anomalies. With fewer than 100 lines per sample you cannot be sure that everything will be OK.

Your first approach (multiple matches) is not wrong; test which one is faster.

system · October 22, 2022, 8:36am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
