Regex not working with painless

forst · September 14, 2021, 8:38pm

I'm trying to write a regex query for a Java log error like the following:

[Poolthread] com.xxxx.content.core-bundle com.xxxxx.content.model.impl.RegisterTypeInternal(3179)] The activate method has thrown an exception (com.xxxxx.content.model.exception.ModelException: ModelException: {Code}-LCC-REP-FCT-002, {Message}-Access denied)
com.xxxx.content.model.exception.ModelException: ModelException: {Code}-LCC-REP-FCT-002, {Message}-Access denied
	at com.xxxxx.content.repository.utils.ExceptionUtil.getException(ExceptionUtil.java:52)
	at com.xxxxx.content.repository.utils.ExceptionUtil.getException(ExceptionUtil.java:171)

I'm using this regex, which captures the word before the : character (for example, ModelException), but the result is null. I tested the regex on regex101.com and it works fine there.

([a-zA-Z0-9_]+)(?=:)

I use it in a simple script field / runtime field that returns the first match:

def m = /([a-zA-Z0-9_]+)(?=:)/.matcher(doc['field.keyword'].value);
if ( m.find() ) {
   return m.group(0)
} else {
   return "no find"
}

I have no idea whether I'm missing something in the syntax.
I'm also looking for a regex that can extract information from the Java log error, such as class name, URL, file, exception, level, and so on.

Thank you for your help
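
For the second question (pulling class name, file, and similar parts out of the log), here is a minimal Java sketch for a single stack-frame line such as the `at com.xxxxx...` lines above. It is one possible approach, not a complete log parser: the class name `StackFrameParser`, the method `parseFrame`, and the group names are all hypothetical, and frames without a `file:line` part (for example `Native Method`) will not match this pattern. Painless regexes are backed by `java.util.regex`, so the same pattern should carry over, but that has not been verified in a runtime field here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits one "at pkg.Class.method(File.java:NN)" line into parts.
final class StackFrameParser {
    // className is greedy, so the last dot before "(" separates class from method.
    static final Pattern FRAME = Pattern.compile(
        "\\s*at\\s+(?<className>[\\w.$]+)\\.(?<method>[\\w$<>]+)"
        + "\\((?<file>[^:()]+):(?<line>\\d+)\\)\\s*");

    // Returns the extracted parts, or an empty map if the line is not a frame.
    static Map<String, String> parseFrame(String text) {
        Map<String, String> parts = new LinkedHashMap<>();
        Matcher m = FRAME.matcher(text);
        if (m.matches()) {
            parts.put("className", m.group("className"));
            parts.put("method", m.group("method"));
            parts.put("file", m.group("file"));
            parts.put("line", m.group("line"));
        }
        return parts;
    }

    public static void main(String[] args) {
        String frame = "\tat com.xxxxx.content.repository.utils."
            + "ExceptionUtil.getException(ExceptionUtil.java:52)";
        System.out.println(parseFrame(frame));
    }
}
```

Using `matches()` rather than `find()` means the whole line has to look like a frame, which avoids accidental partial matches on the message text.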

stu · September 14, 2021, 9:42pm

Hi @forst,
Can you post your mappings? You may have something like:

    "mappings" : {
      "properties" : {
        "field" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }

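If so, the log entry above would not be available through doc['field.keyword'], because it is 561 characters long and ignore_above: 256 means longer strings are not indexed in the keyword sub-field. Raising that limit far above 256 characters is usually not a good idea, because long keyword values bloat the index.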

However, you can use params['_source']['field'] to access the value from source. This works with both runtime fields and script fields.

def m = /([a-zA-Z0-9_]+)(?=:)/.matcher(params['_source']['field']);
if ( m.find() ) {
   return m.group(0) // emit(m.group(0)) for runtime fields
} else {
   return "no find"
}
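
When debugging this, it helps to tell "the field value is missing" apart from "the regex did not match". Below is a minimal Java sketch of that distinction, assuming the source is available as a plain map. The class `SourceFieldExtractor`, the method `extract`, and the "no value" marker are hypothetical names for illustration; Painless regexes are backed by `java.util.regex`, so the matching logic is the same, but this is not the Painless script itself.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mirroring the script above, with an explicit missing-value branch.
final class SourceFieldExtractor {
    // Same pattern as in the thread: word characters directly followed by ':'.
    static final Pattern NAME_BEFORE_COLON = Pattern.compile("([a-zA-Z0-9_]+)(?=:)");

    static String extract(Map<String, Object> source, String fieldName) {
        Object raw = (source == null) ? null : source.get(fieldName);
        if (!(raw instanceof String)) {
            // Missing source, missing field, or a non-string value.
            return "no value";
        }
        Matcher m = NAME_BEFORE_COLON.matcher((String) raw);
        // "no find" means the value was present but the pattern did not match.
        return m.find() ? m.group(1) : "no find";
    }
}
```

If the script returns the missing-value marker for every document, the regex is not the problem and the next thing to check is the field name or whether the source is available at all.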

warkolm · September 14, 2021, 10:30pm

Welcome to our community!

As an alternative approach, see if you can set up an ingest pipeline to handle the extraction of the values during the indexing process. It'll make querying a tonne easier.

forst · September 22, 2021, 1:22pm

Thanks for your response, and apologies for the late reply. I don't have the administrator rights needed to manage the mapping.
I used params['_source']['field'] but it doesn't seem to work.
It does seem to me that there is a 256-character limit: I used another regex to extract the first word of a message, and that result shows up fine.

The data mapping is going to be changed, so I will come back and let you know whether it works.

forst · September 23, 2021, 2:05pm

@stu
Here is the mapping:

   ....
     "message" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 65536
            }
          }
        }
   .....

What do you think?

stu · September 23, 2021, 3:33pm

Hi @forst,
At this point I'm assuming there's something wrong elsewhere in your setup and there isn't enough info in this thread to diagnose where the problem lies.

Your script is fine and worked when I tested it with the given data and correct mappings. It also worked when I tested it with params['_source']['field'].
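
For anyone who wants to check the pattern outside Elasticsearch, here is a minimal Java sketch that runs the same regex against the log line from the first post. Painless regexes are backed by `java.util.regex`, so this exercises the same matching rules, though not the script field itself. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone check of the regex from the thread.
final class FirstExceptionName {
    // A run of word characters directly followed by ':' (the ':' is not consumed).
    static final Pattern NAME_BEFORE_COLON = Pattern.compile("([a-zA-Z0-9_]+)(?=:)");

    // Returns the first match, or "no find" to mirror the script's fallback.
    static String firstNameBeforeColon(String message) {
        Matcher m = NAME_BEFORE_COLON.matcher(message);
        return m.find() ? m.group(1) : "no find";
    }

    public static void main(String[] args) {
        String line = "[Poolthread] com.xxxx.content.core-bundle "
            + "com.xxxxx.content.model.impl.RegisterTypeInternal(3179)] "
            + "The activate method has thrown an exception "
            + "(com.xxxxx.content.model.exception.ModelException: ModelException: "
            + "{Code}-LCC-REP-FCT-002, {Message}-Access denied)";
        System.out.println(firstNameBeforeColon(line)); // prints "ModelException"
    }
}
```

Because the lookahead is zero-width, `group(0)` and `group(1)` return the same text here, so the `m.group(0)` in the original script is not the cause of the null result.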

I used params['_source']['field'] but it doesn't seem to work.

What happens when you return that value rather than trying to run a regex against it? Please post the precise script, document and result.

What do you think?

Was that the original or the updated mapping? If it's the updated one, did you reindex the documents? An updated mapping only applies to documents indexed after the change.

forst · September 23, 2021, 3:43pm

Hi @stu

Here is the result when I run the script.

The mapping is the original one.

stu · September 23, 2021, 4:06pm

What happens when you return params['_source']['field']? What happens when you Debug.explain(params['_source'])?

forst · September 27, 2021, 8:09am

@stu
Debug.explain(params['_source']) shows it as empty.

stu · September 28, 2021, 2:19pm

Is the document empty, or is _source disabled in the mapping ({"_source": {"enabled": false} ...)?

system · October 26, 2021, 2:19pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
