Turn a tokenized string into a list of tokens?

stefan3 · March 9, 2018, 3:26pm

I am analysing texts with Elasticsearch + Kibana 6.2:

I have a text field (say "mycontent") that I need to turn into a list of its tokens (not just a tokenized string), so that in Kibana I can get a count of the various tokens across all the texts in my search result. (If I do the same on the original text field, every text gets a count of "1", given that there are no identical texts.) What I need instead is to handle mycontent such that Kibana can show me the counts for (or percentages of documents containing) each of the tokens in mycontent.
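To make the desired result concrete, here is a minimal Python sketch of the counting I have in mind, run outside Elasticsearch. The `tokenize` function is only a stand-in for a real analyzer (it lowercases and splits on non-alphanumeric characters), and the sample texts are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_document_counts(texts):
    """Count, for each token, how many texts contain it at least once."""
    counts = Counter()
    for text in texts:
        # set() so a token repeated inside one text is counted once
        counts.update(set(tokenize(text)))
    return counts


# Hypothetical sample documents standing in for the "mycontent" field.
docs = ["The quick brown fox", "The lazy dog", "A quick dog"]
counts = token_document_counts(docs)

for token, n in counts.most_common():
    print(f"{token}: {n} docs ({100 * n / len(docs):.0f}%)")
```

Each token is counted once per text, so the numbers are "documents containing the token" rather than raw occurrences.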

How can I define an analyzer that does the normal tokenization (easy) and then, at the end, turns the tokenized string into a list of the tokens?
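One direction that might fit, sketched here as request bodies built in Python: keep mycontent as an analyzed `text` field, enable `fielddata` on it, and run a `terms` aggregation, which buckets by the analyzed tokens and reports a document count per token. This is a sketch under assumptions, not a confirmed answer: the analyzer name `my_analyzer`, the mapping type name `doc`, and the filter chain are placeholders, and fielddata on text fields can use a lot of heap memory.

```python
import json

# Index creation body (6.x style, with a single mapping type).
# "my_analyzer" and "doc" are hypothetical names chosen for this sketch.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase"],  # synonym filters etc. could go here
                }
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "mycontent": {
                    "type": "text",
                    "analyzer": "my_analyzer",
                    # Lets aggregations see the analyzed tokens of a text field.
                    "fielddata": True,
                }
            }
        }
    },
}

# Search body: one bucket per token, doc_count = documents containing it.
agg_body = {
    "size": 0,
    "aggs": {"tokens": {"terms": {"field": "mycontent", "size": 50}}},
}

print(json.dumps(index_body, indent=2))
print(json.dumps(agg_body, indent=2))
```

If this works as assumed, the same field should be usable in a Kibana terms visualization once the index pattern's field list has been refreshed.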

I found the concept of an ingest pipeline (and there is a split processor), which sounded relevant, but a pipeline (if I am not mistaken) always runs as a preprocessing step. What I need is for the split to take place AFTER tokenization (and potentially synonym mapping).
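To illustrate the ordering problem, here is a rough sketch of what a split step does when it runs before analysis. The pipeline definition uses the `split` processor with its `field` and `separator` options; the Python function only simulates that behaviour on a raw string and is not Elasticsearch code. The whitespace separator and the sample text are assumptions for the example.

```python
import re

# Ingest pipeline definition with a split processor (runs before indexing,
# so before the field's analyzer sees the text).
pipeline_body = {
    "description": "split mycontent on whitespace (illustrative only)",
    "processors": [{"split": {"field": "mycontent", "separator": "\\s+"}}],
}


def simulate_split(raw_text, separator):
    """Local simulation of a regex split on the raw, unanalyzed text."""
    return re.split(separator, raw_text)


separator = pipeline_body["processors"][0]["split"]["separator"]
parts = simulate_split("Quick brown Fox", separator)
print(parts)  # the raw pieces keep their original case
```

The pieces keep their original case and see no synonym mapping, because lowercasing and synonyms belong to the analyzer, which only runs afterwards.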

I am sure this must be feasible, but I have been unable to find the precise way to express it.

system · April 6, 2018, 3:26pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
