This was done in Painless Lab. Basically, split the string on the _ character and return the token at index [3]. If all your messages are formatted like xxx_xxx_xxx_NEEDTHIS_xxx it will work.
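For example (a minimal sketch, using the sample format above as a string literal):

def tokens = 'xxx_xxx_xxx_NEEDTHIS_xxx'.splitOnToken('_');
return tokens[3]; // "NEEDTHIS"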
Does your statement actually run correctly?
I tried your script in Kibana, but it doesn't seem to work in my environment.
I replaced message with the name of the field that contains my string (it is called 'measobjldn').
This null_pointer_exception occurred because params.measobjldn is null.
Please see here for how to access field values in Painless. I suppose you need params._source.measobjldn, if it is a text type field.
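That access would look something like this (a sketch; it assumes a context where _source is available, such as a search-request script field; Kibana scripted fields only support doc values, as the next reply points out):

// sketch: reading the raw text field from _source
def v = params._source['measobjldn'];
return v != null ? v : '';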
I am trying to create a new scripted field (named LU) in Kibana 7.15.0. The field measobjldn is a text field (but there is also a clone of it mapped as keyword):
For scripted fields you use doc['some_field'].value when referencing fields, as you can see at the bottom of the scripted-field screen. Also click the Get help with the syntax and preview the results of your script link for more info and to test whether the script is working.
With that said, I would try this:
def LU = doc['measobjldn'].value.splitOnToken('_');
return LU[3].substring(0, 5);
def LU = doc['measobjldn'].value.splitOnToken('_');
returns:
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [measobjldn] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
Then I tried to switch to the other cloned field (measobjldn.keyword):
def LU = doc['measobjldn.keyword'].value.splitOnToken('_');
return LU[3].substring(0, 5);
and it returns:
"caused_by": {
"type": "array_index_out_of_bounds_exception",
"reason": "Index 3 out of bounds for length 1"
Next I tried:
LU = params._source.doc['measobjldn.keyword'].value.splitOnToken('_');
which returns:
"caused_by": {
"type": "null_pointer_exception",
"reason": "cannot access method/field [normalizeIndex] from a null def reference"
Finally, with:
def LU = params.doc['measobjldn.keyword'].value.splitOnToken('_');
return LU;
all errors disappear and the new scripted field LU is equal to measobjldn.keyword.
This measobjldn.keyword field contains a large variety of strings: some have no underscore at all, others have only one, two, or three underscores, and so on. Maybe this variety is what generates the problem. Of course, I would like to fetch only the 5 characters after the third underscore for the strings that match this pattern; for the rest it doesn't matter if the system returns null, #NA, or some other error/warning.
def LU = doc['measobjldn'].value.splitOnToken('_');
return LU[3].substring(0, 5);
I get
"reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [measobjldn] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
Replacing measobjldn (which is mapped as text) with measobjldn.keyword (which is mapped as keyword), I get
"reason": "Index 3 out of bounds for length 1"
It looks like the error comes from the strings that do not contain a third '_'.
That's why, if I replace '_' with '/', which is a character that is always present in the measobjldn strings, the script runs correctly and I can see the first 5 characters after '/' in the preview results.
def LU = doc['measobjldn.keyword'].value.splitOnToken('/');
return LU[1].substring(0, 5);
Maybe I need to include an if statement to check whether a third '_' is there and, if so, proceed with splitOnToken().
That's correct. If your data doesn't always have the string in the format you are trying to evaluate, then you need some if statements to ensure the data meets the criteria for the split, or it will fail.
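Something like this should do it (a minimal sketch, assuming the measobjldn.keyword sub-field and that strings which don't match the pattern should just return an empty value):

// skip documents with no value for the field
if (doc['measobjldn.keyword'].size() == 0) {
    return '';
}
def parts = doc['measobjldn.keyword'].value.splitOnToken('_');
// require at least four tokens (i.e. a third '_') and at least five
// characters in the fourth token before taking the substring
if (parts.length > 3 && parts[3].length() >= 5) {
    return parts[3].substring(0, 5);
}
return '';

Returning '' instead of letting the script throw keeps the preview (and anything built on LU) from failing on the strings that don't match the pattern.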