Split and get part of a long string in painless

auato · January 27, 2022, 11:39am

Hi, I would like to get from the following examples of string a specific part (in bold) in painless script in Kibana:

APAC42_OM/VM:Virtual machine name=APAC42_ELETYPE_**CSBBU**_0
APAC42_OM/VM:Virtual machine name=APAC42_ELETYPE_**CSCDS**_VDU2_0
APAC42_OM/VPORT:Virtual machine name=APAC42_ELETYPE_**CSHHD**_26, Port name=FABRIC2

In Excel I use the following function trying to search the third underline and extract the next 5 characters :

=MID(A2;FIND("~";SUBSTITUTE(A2;"_";"~";3))+1;5)

How can I do it in painless ?
Thanks

aaron-nimocks · January 27, 2022, 12:10pm

This was done in Painless Lab. Basically split the string on _ character and return the text in the [3] position. If all your messages are formatted like xxx_xxx_xxx_NEEDTHIS_xxx it will work.

Parameters

{
  "message": "APAC42_OM/VM:Virtual machine name=APAC42_ELETYPE_CSBBU_0"
}

Script

def result = params.message.splitOnToken('_');

return result[3];

Output

CSBBU

auato · January 27, 2022, 12:39pm

Thank you aaron... but actually there are also other different formats to the three I exposed: another example is this string:

APAC42_ELETYPE/IELETYPE_SIP_LINK:Link ID=1, SIP peer ID=1

Can your statement run correctly?
I tried your script in Kibana but it seems not OK in my environment
I replaced message into the field name which contains my string (it is called 'measobjdn')

return result[3];
---Kibana returns a script exception. Below they are the error details.---

{
  "took": 71,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 1,
    "skipped": 1,
    "failed": 1,
    "failures": [
      {
        "shard": 0,
        "index": "vc-ims-hourly-hua-cscf-v72-000009",
        "node": "xzRVAllWQe2NtDlYcQY9rg",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "result = params.measobjldn.splitOnToken('_');\r\n\r\n",
            "                          ^---- HERE"
          ],
          "script": "def result = params.measobjldn.splitOnToken('_'); ...",
          "lang": "painless",
          "position": {
            "offset": 30,
            "start": 4,
            "end": 53
          },
          "caused_by": {
            "type": "null_pointer_exception",
            "reason": "cannot access method/field [splitOnToken] from a null def reference"
          }
        }
      }
    ]
  },
  "hits": {
    "max_score": null,
    "hits": [ ]
  }
}

aaron-nimocks · January 27, 2022, 12:54pm

I did it in the Painless Lab. Depending on your version you might not have it.

What value would you want out from this string?

APAC42_ELETYPE/IELETYPE_SIP_LINK:Link ID=1, SIP peer ID=1

Would it be LINK: which is the next 5 characters after the 3rd _?

To get the first 5 characters though you can use substring like this.

def result = params.message.splitOnToken('_');

return result[3].substring(0, 5);

auato · January 31, 2022, 3:30pm

Mine should be v 7.15.0

Tomo_M · January 31, 2022, 4:19pm

This null_pointer_exception occured because params.measobjldn is null.
Please see here to get how to access field values by painless. I suppose you need params._source.measobjldn, if it is a text type field.

auato · February 1, 2022, 11:14am

Thanks @Tomo_M I'll read more accurately the guide you linked (only had a glance so far).
Meanwhile I immediately tried your solution, adding _source

def result = params._source.measobjldn splitOnToken('_');
return result[3].substring(0, 5);

but it still needs some improvements as the following message is returned:

"caused_by": {
            "type": "array_index_out_of_bounds_exception",
            "reason": "Index 3 out of bounds for length 1"

Tomo_M · February 1, 2022, 11:40am

typo.
must be measobjldn.splitOnToken('_');

aaron-nimocks · February 1, 2022, 12:01pm

Where you trying to run this? Some place you use params, or _source or ctx.

Can you post your entire code and maybe a screenshot to see where you are running this to just make it easy?

auato · February 1, 2022, 2:03pm

Sorry it was a typo only when I replied to you because the correct statement I used included the last dot

> def LU = params._source.measobjldn.splitOnToken('_');

auato · February 1, 2022, 2:15pm

I am trying to create a new scripted field (it is named LU) in Kibana 7.15.0. The field measobjldn is a text field (but there is also another one cloned as keyword):

The lines of code that I am trying are the following:

def LU = params._source.measobjldn.splitOnToken('_');
return LU[3].substring(0, 5);

and the response in Discovering mode of Kibana is below:

{
"took": 1219,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 1,
"skipped": 0,
"failed": 1,
"failures": [
{
"shard": 0,
"index": "vc-ims-hourly-hua-cscf-v72-000010",
"node": "N9OFrmSzTMmhc08MZmKr1A",
"reason": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"return LU[3].substring(0, 5);",
" ^---- HERE"
],
"script": "def LU = params.source.measobjldn.splitOnToken(''); ...",
"lang": "painless",
"position": {
"offset": 64,
"start": 55,
"end": 84
},
"caused_by": {
"type": "array_index_out_of_bounds_exception",
"reason": "Index 3 out of bounds for length 1"
}
}
}
]
},
"hits": {
"max_score": null,
"hits":
}
}

aaron-nimocks · February 1, 2022, 2:36pm

For scripted fields you use doc['some_field'].value when referencing fields as you can see on the bottom of the scripted field screen. Also click the Get help with the syntax and preview the results of your script link and you can get more info as well as testing to see if the script is working.

With that said I would try this.

def LU = doc['measobjldn'].value.splitOnToken('_');
return LU[3].substring(0, 5);

auato · February 1, 2022, 3:05pm

def LU = doc['measobjldn.keyword'].value.splitOnToken('_')

returns

"caused_by": {
"type": "illegal_argument_exception",
"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [measobjldn] in order to load field data by uninverting the inverted index. Note that this can use significant memory."

Then I tried to switch to the other cloned field (measobjldn.keyword)
def LU = doc['measobjldn.keyword'].value.splitOnToken('_');

and it returns:

"caused_by": {
"type": "array_index_out_of_bounds_exception",
"reason": "Index 3 out of bounds for length 1"

Next I tried:
LU = params._source.doc['measobjldn.keyword'].value.splitOnToken('_');

which returns:

"caused_by": {
"type": "null_pointer_exception",
"reason": "cannot access method/field [normalizeIndex] from a null def reference"

aaron-nimocks · February 1, 2022, 3:20pm

It should be doc['measobjldn.keyword'].value.

Did you verify the index mappings has measobjldn as a keyword?

Try just return doc['measobjldn.keyword'].value;

auato · February 1, 2022, 8:53pm

if I try

def LU = params.doc['measobjldn.keyword'].value.splitOnToken('_');
return LU;

any errors disappear and the new scripted field LU is equal to measobjldn.keyword.
This measobjldn.keyword includes a large variety of string and even some of them without the character underline or only one or two or three characters underscores and so on are included. May be this variety generates a problem . Of course I would like to fetch only the 5 characters after the third underscore for those string that have this pattern and for the rest it's not important if the system returns a null or #NA or other errors/warnings

aaron-nimocks · February 1, 2022, 8:59pm

If return doc['measobjldn.keyword'].value; works and returns the value then I am not sure why the below is not working as a full solution.

def LU = doc['measobjldn.keyword'].value.splitOnToken('_');
return LU[3].substring(0, 5);

When you put that in and click the preview link at the bottom and go to the preview tab what do you have?

auato · February 2, 2022, 9:09am

with

def LU = doc['measobjldn'].value.splitOnToken('_');
return LU[3].substring(0, 5);

I get

"reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [measobjldn] in order to load field data by uninverting the inverted index. Note that this can use significant memory."

replacing measobjldn (which is formatted as text) with measobjldn.keyword (which is formatted as keyword) I get

"reason": "Index 3 out of bounds for length 1"

It looks like an error depending on the strings where there are not a third '_' in between.
That's why if I change '-' with '/' which is a character always present in the measobjldn strings, the script runs correctly and I can see the first 5 chars after '/' in the preview results.

def LU = doc['measobjldn.keyword'].value.splitOnToken('/');
return LU[1].substring(0, 5);

may be I need to include an IF statement to evaluate if a third '_' is there and if so to proceed with splitOnToken()

aaron-nimocks · February 2, 2022, 10:36am

That's correct. If your data doesn't always have the string in the format you are trying to evaluate then you need to some if statements to ensure the data meets the criteria for the split or it will fail.

system · March 2, 2022, 10:37am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Split string in scripted fields, using _ as a delimiter Kibana	7	15191	December 14, 2020
Unable to split Strings in Painless Elasticsearch painless	4	2483	March 27, 2019
Split string Kibana	2	1052	August 5, 2019
Create Scripted Field using substring before "." Kibana	11	8032	July 23, 2020
Runtime field painless script to extract from a field Kibana painless	2	283	April 12, 2023

Split and get part of a long string in painless

Related topics