Scripted fields - matcher vs =~ different results

Hi everyone,

We are running Elasticsearch 6.3.0, Kibana and fluentd, and we are trying to extract some information from a field called log, which is mapped as follows:

"log": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}

The log entries have the following format:

2019-02-19 23:20:36.633 [ipAddress:10.0.0.1 | method:GET | requestURI:/uri | userId:anonymousUser | requestId:1235abcd] INFO 1 --- [testing] LoggingFilter : Request duration (milliseconds): 470

Using Painless, we have been trying to create a scripted field that extracts the request duration figure, but we have not been able to come up with a working solution so far.

After reading the docs, we tried accessing doc['log.keyword'].value, but surprisingly this variable always seems to be null. We then switched to params._source.log, which kind of works in the sense that the log field is available in this variable, but when we try to match against it we get inconsistent results. Here's a test Painless script I wrote, because I was always getting 'No match' when using matcher:

def time = /^.*\(milliseconds\): ([0-9]+)$/.matcher(params._source.log);
def time2 = params._source.log =~ /^.*\(milliseconds\): ([0-9]+)$/;

return time.matches() + " vs " + time2;
// This returns false vs true for the same regexp

For the sake of testing, I tried passing the whole string as an argument and, interestingly, this time it returns "true vs true":

def time = /^.*\(milliseconds\): ([0-9]+).*$/.matcher('2019-02-19 23:20:36.633 [ipAddress:10.0.0.1 | method:GET | requestURI:/uri | userId:anonymousUser | requestId:1235abcd] INFO 1 --- [testing] LoggingFilter : Request duration (milliseconds): 470');
def time2 = '2019-02-19 23:20:36.633 [ipAddress:10.0.0.1 | method:GET | requestURI:/uri | userId:anonymousUser | requestId:1235abcd] INFO 1 --- [testing] LoggingFilter : Request duration (milliseconds): 470' =~ /^.*\(milliseconds\): ([0-9]+).*$/;

return time.matches() + " vs " + time2;
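
One possible explanation for the two results above, sketched in plain Java because Painless regexes are backed by java.util.regex: matcher.matches() requires the whole input to match, while =~ is the Painless find operator and only needs a matching subsequence (==~ is the full-match operator). If the stored log value ends with a line terminator, which is an assumption here and not something the post confirms, then `$` can match just before that terminator, so find succeeds and matches fails. The second test uses a literal without a trailing newline, and its pattern ends in `([0-9]+).*$` rather than `([0-9]+)$`, so it is not quite the same comparison. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

class FindVsMatches {
    // Same pattern as the first script in the post.
    static final Pattern DURATION =
        Pattern.compile("^.*\\(milliseconds\\): ([0-9]+)$");

    // Like Matcher.matches() / Painless ==~ : the entire input must match.
    static boolean fullMatch(String log) {
        return DURATION.matcher(log).matches();
    }

    // Like Matcher.find() / Painless =~ : a matching subsequence is enough.
    static boolean partialMatch(String log) {
        return DURATION.matcher(log).find();
    }

    public static void main(String[] args) {
        String clean = "INFO 1 --- [testing] LoggingFilter : "
            + "Request duration (milliseconds): 470";
        // ASSUMPTION: the indexed value carries a trailing newline,
        // as some log shippers leave one in place.
        String withNewline = clean + "\n";

        // prints "true vs true"
        System.out.println(fullMatch(clean) + " vs " + partialMatch(clean));
        // prints "false vs true"
        System.out.println(fullMatch(withNewline) + " vs " + partialMatch(withNewline));
    }
}
```

If this is the cause, returning the last few characters of params._source.log from a test scripted field should reveal the trailing terminator.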

Any advice on how to do this? Is the field type a problem for what we are trying to do? Why is doc['log.keyword'] always empty? Is params._source the right way of accessing the log field?

Thanks!


I think you're going to want doc['log'].value (without "keyword"). Could you give that a try?

Actually ignore that last comment, I'm looking more into this.

I'm not sure why you'd be getting null. I set up a simple mapping with a single doc and it seemed to work fine. You might have better luck here in the Elasticsearch Painless forums.

Hi Lukas,

After reading your post, I changed the Painless script to:

def match = "log is Null";

if ( doc['log.keyword'].value != null )  {
  match = doc['log.keyword'].value =~ /^.*\(milliseconds\): ([0-9]+)$/;
}

return match + " & " + doc['log.keyword'].value

It always returns "log is Null & null", as shown in the attached picture.
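
For the extraction itself, here is a minimal sketch in plain Java of the null check plus regex approach used in the script above. It is an illustration under stated assumptions, not a confirmed fix for this thread: the class name, the method name and the -1 sentinel are hypothetical, and the pattern is deliberately left unanchored so that find() does not depend on what precedes or follows the duration figure. Matcher.find() and Matcher.group(int) are the same methods the Painless matcher exposes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DurationExtractor {
    // Unanchored on purpose: only the marker and the digits matter,
    // so a trailing newline or extra text does not break the match.
    static final Pattern DURATION =
        Pattern.compile("\\(milliseconds\\): ([0-9]+)");

    // Returns the request duration in milliseconds, or -1 (hypothetical
    // sentinel) when the log is null or contains no duration figure.
    // Assumes the captured figure fits in a long.
    static long extractDurationMillis(String log) {
        if (log == null) {
            return -1L;
        }
        Matcher m = DURATION.matcher(log);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        String log = "2019-02-19 23:20:36.633 [ipAddress:10.0.0.1 | method:GET | "
            + "requestURI:/uri | userId:anonymousUser | requestId:1235abcd] "
            + "INFO 1 --- [testing] LoggingFilter : "
            + "Request duration (milliseconds): 470\n";
        System.out.println(extractDurationMillis(log));  // prints 470
        System.out.println(extractDurationMillis(null)); // prints -1
    }
}
```

Returning a number rather than a string lets the scripted field be declared with a numeric type, so it can be sorted and aggregated in Kibana.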
