Turn a tokenized string into a list of tokens?


I am analysing texts with Elasticsearch + Kibana 6.2.

I have a text field (say "mycontent") that I need to turn into a list of its tokens (not just the tokenized string), so that in Kibana I can get a count of the various tokens across all the texts in my search result. (If I do the same for the original text field, every text has a count of "1", given that there are no identical texts.) What I need instead is to handle mycontent such that Kibana can show me the counts for (or percentages of documents containing) each of the tokens in mycontent.
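To make the desired result concrete, here is a minimal Python sketch of the numbers I am after. It is not Elasticsearch code: `tokenize` is a hypothetical stand-in for whatever the real analyzer does, and the three sample texts are made up for illustration.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for the real analyzer: lowercase, split on whitespace.
    return text.lower().split()


def document_frequency(texts):
    # Count, for each token, the number of texts that contain it at least once.
    counts = Counter()
    for text in texts:
        counts.update(set(tokenize(text)))
    return counts


# Made-up sample data, standing in for the "mycontent" values of a search result.
texts = ["the quick fox", "the lazy dog", "a quick dog"]

# Counting whole texts is useless here: every text is unique.
whole_text_counts = Counter(texts)

# Counting per token gives the figures I would like Kibana to show.
df = document_frequency(texts)
for token, n in df.most_common():
    print(f"{token}: {n} docs ({100 * n / len(texts):.0f}%)")
```

Counting whole texts gives "1" everywhere, while counting per token gives the document counts and percentages described above.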

How can I define an analyzer that does the normal tokenization (easy) and then at the end turns the tokenized string into a list of the tokens?

I found the concept of an ingest pipeline (and there is a split processor), which sounded relevant, but a pipeline (if I am not mistaken) always runs as a preprocessing step. What I need is for the split to take place AFTER tokenization (and potentially synonym mapping).
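For reference, this is roughly what I understand the split approach to look like, written as a Python sketch. The pipeline body uses the ingest `split` processor with its `field` and `separator` options. The whitespace separator and the `emulate_split` helper are my own assumptions for illustration, and the helper only approximates what the processor does.

```python
import json
import re

# Body one would send to PUT _ingest/pipeline/<some-id> (the id is up to you).
# The whitespace separator is an assumption chosen for this example.
pipeline_body = {
    "description": "Split mycontent into an array before indexing",
    "processors": [
        {"split": {"field": "mycontent", "separator": "\\s+"}}
    ],
}


def emulate_split(value, separator=r"\s+"):
    # Rough local approximation of the split: a plain regex split on the raw
    # string. No lowercasing, stemming or synonym mapping is applied.
    return re.split(separator, value)


print(json.dumps(pipeline_body, indent=2))
print(emulate_split("Quick  brown Fox"))
```

The point of the sketch is that the split sees the raw string as it arrives, so the resulting array holds untouched words rather than the tokens the analyzer would produce.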

I am sure this must be feasible, but I have been unable to find the precise way to express it.
