Scripted fields regular expression not working

I am having issues getting my regular expression to work with my scripted fields. I have the code below (don't judge, I am not a developer), and for some reason the same kind of line ends up in different "else" clauses. The main goal is to split a logger message from a pod so that a new field called event_message contains only the "Message: [ .* ]" content.

Version: Kibana 7.5.0

The code sometimes works and sometimes doesn't. Any help is greatly appreciated. To me it looks like the initial if statement checking the size() of message.keyword is the issue. Sometimes size() of message.keyword returns 0 even though I can see the message field. Is there something about message and message.keyword that would cause the keyword to be empty (size 0) while the message field is still a valid string?

Example: the script reports that message.keyword is empty (if (doc['message.keyword'].size()<=0)) even though I can see that the message field has a value:
EVENT_MESSAGE: no message.keyword
MESSAGE FIELD: Tue, Sep 01 2020 13:35:20 GMT | INFO | Thread: [ CP Server Thread-6 ] | Logger: [ requestlogger ] | Function: [ __call__ ] | Module: [ __init__ ] | File: [ __init__.py:57] | Message: [ 100.100.100.100 - - [01/Sep/2020:13:35:20 +0000] "GET /healthz HTTP/1.1" 200 2 "" "kube-probe/1.16" 0/125 ]

{
  "_id": "W8oHRXQBe13aht8PKB8R",
  "message": "Mon, Aug 31 2020 14:58:45 GMT | INFO | Thread: [ CP Server Thread-6 ] | Logger: [ requestlogger ] | Function: [ __call__ ] | Module: [ __init__ ] | File: [ __init__.py:57]  | Message: [ 100.100.100.100 - - [31/Aug/2020:14:58:45 +0000] \"GET /healthz HTTP/1.1\" 200 2 \"\" \"kube-probe/1.16\" 0/123 ]\n",
  "event_message": [
   "no message.keyword"
  ]
 }

Scripted Field Code:
I am using the Painless language; I don't see an option for expression.


    // No doc value is available for message.keyword on this document.
    if (doc['message.keyword'].size() <= 0) {
        return "no message.keyword";
    } else {
        def obj = doc['message.keyword'].value.toString();
        def lower = obj.toLowerCase();
        // Health checks and metrics scrapes are all bucketed as "metrics".
        if (lower.contains('metrics') || lower.contains('prometheus') || lower.contains('healthz') || lower.contains('requestlogger') || lower.contains('curl/7.56.1')) {
            return "metrics";
        } else if (lower.contains('message:')) {
            if (obj.contains('|')) {
                // Replace the ' | ' separators with '+', then split on '+' with a regex.
                def messageObj = obj.replace(' | ', '+');
                def s = /\+/.split(messageObj);
                boolean eventMessageFound = false;
                String eventMessage = "";
                // Keep the last segment that contains "message:".
                for (item in s) {
                    if (item.toString().toLowerCase().contains('message:')) {
                        eventMessage = item;
                        eventMessageFound = true;
                    }
                }
                if (eventMessageFound) {
                    return eventMessage;
                } else {
                    return "no message";
                }
            } else {
                return "didnt find message character |";
            }
        } else {
            return "no message";
        }
    }

A couple of logger lines that are in this Logstash index:

event_message == "no message.keyword"
Tue, Sep 01 2020 13:11:53 GMT | INFO | Thread: [ CP Server Thread-3 ] | Logger: [ requestlogger ] | Function: [ __call__ ] | Module: [ __init__ ] | File: [ __init__.py:57] | Message: [ 100.100.100.100 - - [01/Sep/2020:13:11:53 +0000] "GET /metrics HTTP/1.1" 200 1697 "" "Prometheus/2.13.1" 0/4268 ]

event_message == "no message"
POST /elasticsearch/_msearch 200 698ms - 9.0B
W0901 13:27:11.624563 4173 setters.go:158] adding overridden hostname of fake-server-name-m2.xlarge-d-tz8h2 to cloudprovider-reported addresses
100.100.100.100 - - [01/Sep/2020:13:27:13 +0000] "GET /_nodes/stats HTTP/1.1" 200 5242 "-" "Go-http-client/1.1" 127 0.109 [elasticsearch-master-9200] [] 100.100.100.100:9200 5242 0.108 200 d892e61d7derbc0eec7701aa3ebcf24b4

event_message == "metrics"
100.100.100.100 - - [01/Sep/2020:13:30:19 +0000] "GET /metrics HTTP/1.1" 200 1702 "" "Prometheus/2.13.1" 0/8757
Tue, Sep 01 2020 13:29:10 GMT | INFO | Thread: [ CP Server Thread-3 ] | Logger: [ root ] | Function: [ send ] | Module: [ iowrapper_functions ] | File: [ iowrapper_functions.py:69] | Message: [ Current serverless runtime does not support metrics. ]

event_message is correct for lines like this one:
MESSAGE FIELD: Tue, Sep 01 2020 13:29:12 GMT | INFO | Thread: [ CP Server Thread-4 ] | Logger: [ root ] | Function: [ insert_es ] | Module: [ es_functions ] | File: [ es_functions.py:107] | Message: [ Trying to insert 270 items into checks. ]
EVENT_MESSAGE FIELD: Message: [ Trying to insert 270 items into checks. ]

Hi

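This might be an issue with your mapping. With the default dynamic mapping, the keyword sub-field is created with ignore_above: 256, so strings longer than 256 characters are not indexed in message.keyword. That would explain why it only works sometimes: short messages have a message.keyword value, long ones don't.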

https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html#keyword-params
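As an illustration of that default, a dynamically mapped string field usually appears in the index mapping like the fragment below. The field name message comes from this thread; the rest is what dynamic mapping typically generates, so compare it with the actual mapping of your index rather than treating it as a given.

    "message": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }

If your mapping looks like this, any message longer than 256 characters has no value in message.keyword, and doc['message.keyword'].size() returns 0 for that document. Raising ignore_above only affects newly indexed documents, so existing data would need to be reindexed.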

Anyway, it might make more sense to work with the field as text. If it is a multi-field, you could try doc['message'] instead of doc['message.keyword']. Note that text fields have no doc values unless fielddata is enabled, so this may fail. If that doesn't work, you need to change your mapping.
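If changing the mapping is not an option right away, one possible workaround is to read the original string from _source instead of doc values. The following is only a minimal Painless sketch. It assumes that params['_source'] is accessible in the context where Kibana evaluates the scripted field (this generally does not hold for aggregations), and it uses plain string functions instead of a regular expression. The fallback text "no message" is reused from the script in the question.

    // Sketch only: read the raw message from _source rather than message.keyword.
    def src = params['_source']['message'];
    if (src == null) {
        return "no message";
    }
    String msg = src.toString();
    // Find where the "Message: [" part of the log line begins.
    int start = msg.indexOf('Message: [');
    if (start < 0) {
        return "no message";
    }
    // Keep everything from "Message: [" to the end, without the trailing newline.
    return msg.substring(start).trim();

Reading _source is slower than reading doc values, so treat this as a stopgap until the mapping is fixed.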

Best,
Matthias
