Split string in scripted fields, using _ as a delimiter

I have a string field "myfield.keyword", where each entry has the following format:

AAA_BBBB_CC
DDD_EEE_F

I am trying to create three scripted fields: one that outputs the substring before the first _, one that outputs the substring between the first and second _, and one that outputs the substring after the second _.

So far, for the first scripted field, I have:

def path = doc['myfield.keyword'].value;
if (path != null){
     return path.substring(0,3);
}

This outputs the result I want, because the substring before the first _ always has three characters. However, I cannot generalise this approach to the other scripted fields, because the lengths of the relevant substrings vary between entries. I suppose the strategy should be to use "_" as a delimiter for the different outputs, but I haven't been able to find a suitable solution.

You can use path.indexOf('_') to get the position of the first underscore character.
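As a rough illustration of the indexOf idea, here is a minimal sketch written in plain Java rather than Painless. The class and method names are hypothetical, and it assumes that a value without any underscore should be returned unchanged. Painless exposes most of java.lang.String, but whether a given overload is available in your version is something to check.

```java
class FirstPart {
    // Returns the text before the first underscore.
    // Assumption: if there is no underscore, the whole string is returned.
    static String beforeFirstUnderscore(String path) {
        int first = path.indexOf('_');
        if (first == -1) {
            return path;
        }
        return path.substring(0, first);
    }

    public static void main(String[] args) {
        System.out.println(beforeFirstUnderscore("AAA_BBBB_CC"));
        System.out.println(beforeFirstUnderscore("DDD_EEE_F"));
    }
}
```

Because the end index comes from indexOf, this does not depend on the first part always being three characters long.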

However path.split('_') might be even easier - it will split up your string into an array with 3 items - then you can return the one you need.


Thanks Joe!
I am trying to use path.split('_') as you suggested but haven't been able to get it right. My idea is to use this same approach for the different elements of the array. For instance:

def client = "";
def path = doc['myfield.keyword'].value;
if (...)
{client = path.split('_')[1];} else {client="none";}
return client

However, I keep getting the following error:

"lang": "painless", "caused_by": {"type":"illegal_argument_exception","reason":"dynamic method [java.lang.String, split/1] not found"}

I can get a script that works with path.indexOf('_'):

def client = "";
def path = doc['myfield.keyword'].value;
if (...)
{client = path.indexOf('_');} else {client="none";}
return client

When the if condition is satisfied, the output of client is 3, as expected. However, this is not general enough, and I still haven't been able to make a script with path.split('_'). Any ideas? Ideally, I would create an array by splitting the string with _ as the delimiter and then access the different elements of the array.

I led you into a rabbit hole there a little, sorry. It seems like String.split is not available in painless (more context here: https://github.com/elastic/elasticsearch/issues/26338)

But there's a workaround (also described in the issue). Can you try /_/.split(path) instead?

No worries :slight_smile:
The workaround didn't work for me. I get an error of type "illegal_state_exception" and reason: "Regexes are disabled..." I am wondering if there's a way to split a string that does not involve enabling regexes (since that's not an option for me) :thinking:

In case this might be useful for someone else, I finally managed to solve the problem by employing path.indexOf('_') and a few more things. In my case, I had to split a string into 5 different parts, given four occurrences of _, but the solution can be adapted for any other number of parts.

To create newfield_one:

def newfield_one = ""; 
def path = doc["myfield.keyword"].value;
def first = path.indexOf("_");
if (...)
{newfield_one = path.substring(0,first);}
else{newfield_one = 0;}
return newfield_one

For newfield_two:

def newfield_two = ""; 
def path = doc["myfield.keyword"].value;
def first = path.indexOf('_');
def second = first + 1 + path.substring(first + 1).indexOf('_');
if (...)
{newfield_two = path.substring(first + 1, second);}
else{newfield_two = 0;}
return newfield_two

For the last, newfield_last:

def newfield_last = "";
def path = doc["myfield.keyword"].value;
def last = path.lastIndexOf('_');
if(...)
{newfield_last = path.substring(last+1);}
else{newfield_last = 0;}
return newfield_last
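For anyone adapting this to more parts, the same indexOf approach can be folded into one loop. The following is only a sketch in plain Java, not a tested Painless script: the class name UnderscoreParts and the method part are hypothetical, and returning null when the string has too few parts is my own choice (the scripts above fall back to 0 instead). It relies on the two-argument form of indexOf, which starts searching from a given position.

```java
class UnderscoreParts {
    // Returns the n-th (0-based) underscore-separated part of path.
    // Assumption: null is returned when path has fewer than n + 1 parts.
    static String part(String path, int n) {
        int start = 0;
        // Skip past the first n underscores.
        for (int i = 0; i < n; i++) {
            int next = path.indexOf('_', start);
            if (next == -1) {
                return null;
            }
            start = next + 1;
        }
        // The part ends at the next underscore, or at the end of the string.
        int end = path.indexOf('_', start);
        if (end == -1) {
            return path.substring(start);
        }
        return path.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(part("AAA_BBBB_CC", 1));
        System.out.println(part("DDD_EEE_F", 2));
    }
}
```

In a scripted field the loop body would have to be inlined with the part number fixed per field, and a guard for documents with a missing value would still be needed, as in the if (...) checks above.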

I am grateful for Val's help here.

