Painless: string to word list

I am trying to break a string field into a list of words so that I can count word occurrences, but I am having trouble figuring out how to write the script properly. This is especially difficult because I have no way to save a script in progress, and if I leave an incorrect Painless script in Kibana, nothing works.

Anyway, this is what I have right now:

// Read the raw string from the keyword sub-field.
def text = doc['content.keyword'].value;

ArrayList words = new ArrayList();
// Word-boundary iterator from java.text.
BreakIterator breakIterator = BreakIterator.getWordInstance();
breakIterator.setText(text);
int lastIndex = breakIterator.first();
while (BreakIterator.DONE != lastIndex) {
    int firstIndex = lastIndex;
    lastIndex = breakIterator.next();
    // Keep only segments that start with a letter or digit; this skips
    // whitespace and punctuation segments.
    if (lastIndex != BreakIterator.DONE && Character.isLetterOrDigit(text.charAt(firstIndex))) {
        words.add(text.substring(firstIndex, lastIndex));
    }
}

return words;

I suspect this fails because words is an ArrayList, while the field type can only be string, int, or boolean. How should I do this?

Without using regexes, this seems like a valid approach. Could you post the errors you see when you run the script this way?

Edit: The words ArrayList can hold any type, since every item added to it is treated as type 'def' in Painless, so I don't believe that is the issue here.
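In case it helps with debugging outside Kibana, here is a minimal sketch in plain Java that runs the same BreakIterator loop as the script in the question and then counts occurrences. It is not a drop-in Painless script: the class name WordSplitSketch and the methods splitWords and countWords are hypothetical names made up for this example, and lower-casing words before counting is an assumption about what "word occurrence count" should mean here.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class for trying the word-splitting logic outside Kibana.
public class WordSplitSketch {

    // Same loop as the script in the question: walk the word boundaries and
    // keep only segments that start with a letter or digit.
    static List<String> splitWords(String text) {
        List<String> words = new ArrayList<>();
        BreakIterator breakIterator = BreakIterator.getWordInstance();
        breakIterator.setText(text);
        int lastIndex = breakIterator.first();
        while (lastIndex != BreakIterator.DONE) {
            int firstIndex = lastIndex;
            lastIndex = breakIterator.next();
            if (lastIndex != BreakIterator.DONE
                    && Character.isLetterOrDigit(text.charAt(firstIndex))) {
                words.add(text.substring(firstIndex, lastIndex));
            }
        }
        return words;
    }

    // Assumption: counts are case-insensitive, so words are lower-cased first.
    // LinkedHashMap keeps words in order of first appearance.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : splitWords(text)) {
            counts.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(splitWords("The cat saw the other cat."));
        System.out.println(countWords("The cat saw the other cat."));
    }
}
```

If the loop behaves as expected here, the splitting logic itself is probably not what breaks the script in Kibana, which is why the actual error message would be useful.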
