Split and get part of a long string in painless

Hi, I would like to get from the following examples of string a specific part (in bold) in painless script in Kibana:

APAC42_OM/VM:Virtual machine name=APAC42_ELETYPE_**CSBBU**_0
APAC42_OM/VM:Virtual machine name=APAC42_ELETYPE_**CSCDS**_VDU2_0
APAC42_OM/VPORT:Virtual machine name=APAC42_ELETYPE_**CSHHD**_26, Port name=FABRIC2

In Excel I use the following function trying to search the third underline and extract the next 5 characters :

=MID(A2;FIND("~";SUBSTITUTE(A2;"_";"~";3))+1;5)

How can I do it in painless ?
Thanks

This was done in Painless Lab. Basically split the string on _ character and return the text in the [3] position. If all your messages are formatted like xxx_xxx_xxx_NEEDTHIS_xxx it will work.

Parameters

{
  "message": "APAC42_OM/VM:Virtual machine name=APAC42_ELETYPE_CSBBU_0"
}

Script

def result = params.message.splitOnToken('_');

return result[3];

Output

CSBBU

Thank you aaron... but actually there are also other different formats to the three I exposed: another example is this string:

APAC42_ELETYPE/IELETYPE_SIP_LINK:Link ID=1, SIP peer ID=1

Can your statement run correctly?
I tried your script in Kibana but it seems not OK in my environment
I replaced message into the field name which contains my string (it is called 'measobjdn')

return result[3];
---Kibana returns a script exception. Below they are the error details.---

{
  "took": 71,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 1,
    "skipped": 1,
    "failed": 1,
    "failures": [
      {
        "shard": 0,
        "index": "vc-ims-hourly-hua-cscf-v72-000009",
        "node": "xzRVAllWQe2NtDlYcQY9rg",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "result = params.measobjldn.splitOnToken('_');\r\n\r\n",
            "                          ^---- HERE"
          ],
          "script": "def result = params.measobjldn.splitOnToken('_'); ...",
          "lang": "painless",
          "position": {
            "offset": 30,
            "start": 4,
            "end": 53
          },
          "caused_by": {
            "type": "null_pointer_exception",
            "reason": "cannot access method/field [splitOnToken] from a null def reference"
          }
        }
      }
    ]
  },
  "hits": {
    "max_score": null,
    "hits": [ ]
  }
}

I did it in the Painless Lab. Depending on your version you might not have it.

What value would you want out from this string?

APAC42_ELETYPE/IELETYPE_SIP_LINK:Link ID=1, SIP peer ID=1

Would it be LINK: which is the next 5 characters after the 3rd _?

To get the first 5 characters though you can use substring like this.

def result = params.message.splitOnToken('_');

return result[3].substring(0, 5);

Mine should be v 7.15.0

This null_pointer_exception occured because params.measobjldn is null.
Please see here to get how to access field values by painless. I suppose you need params._source.measobjldn, if it is a text type field.

Thanks @Tomo_M I'll read more accurately the guide you linked (only had a glance so far).
Meanwhile I immediately tried your solution, adding _source

def result = params._source.measobjldn splitOnToken('_');
return result[3].substring(0, 5);

but it still needs some improvements as the following message is returned:

"caused_by": {
            "type": "array_index_out_of_bounds_exception",
            "reason": "Index 3 out of bounds for length 1"

typo.
must be measobjldn.splitOnToken('_');

Where you trying to run this? Some place you use params, or _source or ctx.

Can you post your entire code and maybe a screenshot to see where you are running this to just make it easy?

1 Like

Sorry it was a typo only when I replied to you because the correct statement I used included the last dot

> def LU = params._source.measobjldn.splitOnToken('_');

I am trying to create a new scripted field (it is named LU) in Kibana 7.15.0. The field measobjldn is a text field (but there is also another one cloned as keyword):

The lines of code that I am trying are the following:

def LU = params._source.measobjldn.splitOnToken('_');
return LU[3].substring(0, 5);

and the response in Discovering mode of Kibana is below:

{
"took": 1219,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 1,
"skipped": 0,
"failed": 1,
"failures": [
{
"shard": 0,
"index": "vc-ims-hourly-hua-cscf-v72-000010",
"node": "N9OFrmSzTMmhc08MZmKr1A",
"reason": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"return LU[3].substring(0, 5);",
" ^---- HERE"
],
"script": "def LU = params.source.measobjldn.splitOnToken(''); ...",
"lang": "painless",
"position": {
"offset": 64,
"start": 55,
"end": 84
},
"caused_by": {
"type": "array_index_out_of_bounds_exception",
"reason": "Index 3 out of bounds for length 1"
}
}
}
]
},
"hits": {
"max_score": null,
"hits":
}
}

For scripted fields you use doc['some_field'].value when referencing fields as you can see on the bottom of the scripted field screen. Also click the Get help with the syntax and preview the results of your script link and you can get more info as well as testing to see if the script is working.

With that said I would try this.

def LU = doc['measobjldn'].value.splitOnToken('_');
return LU[3].substring(0, 5);

def LU = doc['measobjldn.keyword'].value.splitOnToken('_')

returns

"caused_by": {
"type": "illegal_argument_exception",
"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [measobjldn] in order to load field data by uninverting the inverted index. Note that this can use significant memory."

Then I tried to switch to the other cloned field (measobjldn.keyword)
def LU = doc['measobjldn.keyword'].value.splitOnToken('_');

and it returns:

"caused_by": {
"type": "array_index_out_of_bounds_exception",
"reason": "Index 3 out of bounds for length 1"

Next I tried:
LU = params._source.doc['measobjldn.keyword'].value.splitOnToken('_');

which returns:

"caused_by": {
"type": "null_pointer_exception",
"reason": "cannot access method/field [normalizeIndex] from a null def reference"

It should be doc['measobjldn.keyword'].value.

Did you verify the index mappings has measobjldn as a keyword?

Try just return doc['measobjldn.keyword'].value;

if I try

def LU = params.doc['measobjldn.keyword'].value.splitOnToken('_');
return LU;

any errors disappear and the new scripted field LU is equal to measobjldn.keyword.
This measobjldn.keyword includes a large variety of string and even some of them without the character underline or only one or two or three characters underscores and so on are included. May be this variety generates a problem . Of course I would like to fetch only the 5 characters after the third underscore for those string that have this pattern and for the rest it's not important if the system returns a null or #NA or other errors/warnings

If return doc['measobjldn.keyword'].value; works and returns the value then I am not sure why the below is not working as a full solution.

def LU = doc['measobjldn.keyword'].value.splitOnToken('_');
return LU[3].substring(0, 5);

When you put that in and click the preview link at the bottom and go to the preview tab what do you have?

with

def LU = doc['measobjldn'].value.splitOnToken('_');
return LU[3].substring(0, 5);

I get

"reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [measobjldn] in order to load field data by uninverting the inverted index. Note that this can use significant memory."

replacing measobjldn (which is formatted as text) with measobjldn.keyword (which is formatted as keyword) I get

"reason": "Index 3 out of bounds for length 1"

It looks like an error depending on the strings where there are not a third '_' in between.
That's why if I change '-' with '/' which is a character always present in the measobjldn strings, the script runs correctly and I can see the first 5 chars after '/' in the preview results.

def LU = doc['measobjldn.keyword'].value.splitOnToken('/');
return LU[1].substring(0, 5);

may be I need to include an IF statement to evaluate if a third '_' is there and if so to proceed with splitOnToken()

That's correct. If your data doesn't always have the string in the format you are trying to evaluate then you need to some if statements to ensure the data meets the criteria for the split or it will fail.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.