Turn a tokenized string into a list of tokens?

I am analysing texts with Elasticsearch + Kibana 6.2.

I have a text field (say "mycontent") that I need to turn into a list of tokens (not just the tokenized string), so that in Kibana I can get a count of the individual tokens across all the texts in my search result. (If I aggregate on the original text field, every text gets a count of "1", since no two texts are identical.) What I need instead is to process mycontent so that Kibana can show me the counts of, or percentages of documents containing, each of the tokens in mycontent.
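
For concreteness, here is a minimal sketch of what I believe my setup amounts to (the index name, document type, and mapping are my assumptions of the default dynamic mapping; only the field name mycontent is real). Aggregating on the mycontent.keyword sub-field counts whole texts, which is why every bucket comes back with a count of 1:

```
PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "mycontent": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        }
      }
    }
  }
}

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "texts": {
      "terms": { "field": "mycontent.keyword" }
    }
  }
}
```

This is essentially the terms aggregation Kibana runs for me, and it buckets on the whole untokenized value rather than on the tokens.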

How can I define an analyzer that does the normal tokenization (easy) and then, at the end, turns the tokenized string into a list of the tokens?
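
To make the question concrete, the analyzer I have in mind is roughly the following (the index name, analyzer name, and synonym list are made up for illustration, shown here on a throwaway index just to see the token output):

```
PUT analyzer_test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": ["laptop, notebook"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonyms"]
        }
      }
    }
  }
}

GET analyzer_test/_analyze
{
  "analyzer": "my_analyzer",
  "text": "My Laptop content"
}
```

The _analyze response already gives me the list of analysed tokens (lowercased, with the synonym expansion); what I can't find is how to materialise that list as a field I can aggregate on in Kibana.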

I found the concept of an ingest pipeline (and there is a split processor), which sounded relevant, but a pipeline (if I am not mistaken) always runs as a preprocess, before indexing. What I need is for the split to take place AFTER tokenization (and potentially after synonym mapping).
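
This is the kind of thing I tried (pipeline, index, and document names are made up): the split processor only sees the raw _source value, so it produces whitespace-separated words without any of the lowercasing or synonym handling the analyzer would apply.

```
PUT _ingest/pipeline/split_mycontent
{
  "description": "split the raw text into an array of words",
  "processors": [
    {
      "split": {
        "field": "mycontent",
        "separator": "\\s+"
      }
    }
  ]
}

PUT my_index/doc/1?pipeline=split_mycontent
{
  "mycontent": "My Laptop content"
}
```

That stores ["My", "Laptop", "content"] in _source, i.e. the split happens before tokenization and synonym mapping rather than after, which is the opposite of what I need.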

I am sure this must be feasible, but I have been unable to find the precise way to express it.
