How to split a field value into separated fields in elasticsearch

Hi

I have the following issue and hope to get some help resolving it.

background:

. I ingest a log file using Filebeat
. I defined grok and kv processors in Elasticsearch to split the incoming data into separate fields

Question:
. If I have a field that I want to split further into different fields, how can I do it?
. Is there a way to apply a regular expression to a field to determine a match and split the field into different values?
. Can I assign the new split values to different fields?

example:

I have a field --
navlog.context.filename : https://xxx.yyy.com/NA/GEN4/LANDMARK/version.properties

I want to split the above field into:

navlog.context.region : NA
navlog.context.product : GEN4
navlog.context.layer : LANDMARK
navlog.context.filename : version.properties

Thank you in advance for your help.

Best Regards

Hung Le

Hi Hung_M_Le,

Have you considered using grok again on your newly generated fields? You could also split on "/", rename the fields you want to keep and drop the others, but I don't see why you would do that if grok is usable.

Regards,
S0ul

@Hung_M_Le
I would use a script processor to avoid running a regex multiple times. For example:

PUT _ingest/pipeline/filename_splitter
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          String fn = ctx['navlog.context.filename'];
          // First '/' after the scheme, i.e. the end of the host name
          int loc = fn.indexOf('/', 'https://'.length());
          int loc2 = fn.indexOf('/', loc+1);
          if (loc2 > -1) {
            // First path segment: region
            ctx['navlog.context.region'] = fn.substring(loc+1, loc2);
            loc = loc2;
            loc2 = fn.indexOf('/', loc+1);
            if (loc2 > -1) {
              // Second path segment: product
              ctx['navlog.context.product'] = fn.substring(loc+1, loc2);
            }
          }
          """
      }
    }
  ]
}

POST navlog/_doc?pipeline=filename_splitter
{
  "navlog.context.filename" : "https://xxx.yyy.com/NA/GEN4/LANDMARK/version.properties"
}

The split processor might be another way to go.

The split processor generates an array, but here we need a dictionary: the string parts have to be set to different fields.

You can then just pick the array elements and set them to fields manually using a script processor.
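
A rough sketch of that split-then-script idea, run through the simulate API so nothing gets indexed. It makes a few assumptions that are not from the thread: the field is stored as a nested object (navlog -> context -> filename), `tmp_parts` is a made-up scratch field name, and the URL always has exactly the short form from the question. The triple-quoted script is Kibana Console syntax.

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "split": {
          "field": "navlog.context.filename",
          "separator": "/",
          "target_field": "tmp_parts"
        }
      },
      {
        "script": {
          "lang": "painless",
          "source": """
            // Splitting "https://host/NA/GEN4/LANDMARK/file" on "/" gives
            // ["https:", "", host, region, product, layer, filename]
            def parts = ctx.tmp_parts;
            if (parts != null && parts.size() == 7) {
              ctx.navlog.context.region = parts[3];
              ctx.navlog.context.product = parts[4];
              ctx.navlog.context.layer = parts[5];
              ctx.navlog.context.filename = parts[6];
            }
            // Drop the scratch array (hypothetical field name)
            ctx.remove('tmp_parts');
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "navlog": {
          "context": {
            "filename": "https://xxx.yyy.com/NA/GEN4/LANDMARK/version.properties"
          }
        }
      }
    }
  ]
}
```

Because the positions are hard-coded, a URL with more or fewer path segments is left untouched by the size check, so this only suits inputs whose layout never changes.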

Thank you Vinayak and Alexander for your recommendations. I tried both "script" and "split" and ran into some issues when the format of the field changes; however, I found a way to parse the field using grok. Here is the grok syntax that I used, and it works pretty well. I also like the fact that I can use the Grok Debugger to test the grok pattern.

{
  "grok": {
    "if": "ctx.navlog?.message != null && ctx.navlog?.message =~ /^T\\|/",
    "field": "navlog.context.filename",
    "patterns": [
      "https://%{DATA:navlog.context.web_server}/%{DATA:navlog.context.region}/%{DATA:navlog.context.project}/%{DATA:navlog.context.layer}/%{DATA:navlog.context.map_level}/%{DATA:navlog.context.map_sublevel}/%{DATA:navlog.context.tiles}/%{DATA:navlog.context.tile_id}/%{DATA:navlog.context.filename}",
      "https://%{DATA:navlog.context.web_server}/%{DATA:navlog.context.region}/%{DATA:navlog.context.project}/%{DATA:navlog.context.layer}/%{DATA:navlog.context.filename}"
    ]
  }
}
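
For anyone who wants to try a grok processor like this outside the Grok Debugger, the simulate API can run it against a sample document. The request below is only a sketch under a few assumptions: it covers just the short URL form from the original question, it drops the `if` condition because the format of `navlog.message` is not shown in this thread, and the field is assumed to be a nested object. It also differs from the posted pattern in two ways: it anchors the expression with `^` and `$`, and it uses `GREEDYDATA` for the last segment, since a trailing unanchored `%{DATA}` is lazy and may capture an empty string.

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "navlog.context.filename",
          "patterns": [
            "^https://%{DATA:navlog.context.web_server}/%{DATA:navlog.context.region}/%{DATA:navlog.context.project}/%{DATA:navlog.context.layer}/%{GREEDYDATA:navlog.context.filename}$"
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "navlog": {
          "context": {
            "filename": "https://xxx.yyy.com/NA/GEN4/LANDMARK/version.properties"
          }
        }
      }
    }
  ]
}
```

With only this one pattern, a longer URL would put all remaining path segments into `navlog.context.filename`, so the longer pattern from the post would still need to be listed first to handle that form.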
