I have a requirement to find the top 10 keywords in text data (short strings only, limited to 5 to 6 keywords each).
To do that, I need to aggregate the tokens from the text data.
There are two approaches I can think of:
1. Set fielddata to true on the text field (it is false by default) and aggregate on that field directly. What is the actual side effect of this when we have millions of log entries in one index?
2. Tokenize the text outside the ES layer and store the resulting tokens in ES as a field of datatype "keyword", then aggregate on that field.
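To make the comparison concrete, here is a minimal sketch of the second approach (tokenizing outside ES) next to the mapping the first approach would need. The field names `message` and `message_tokens`, the stopword list, and the regex tokenizer are all assumptions for illustration; the regex is not equivalent to ES's standard analyzer, and the typeless mapping layout assumes ES 7 or later. The dictionaries are only the JSON request bodies, so nothing here talks to a cluster.

```python
import re

# Hypothetical stopword list; a real one would depend on the log vocabulary.
STOPWORDS = {"the", "a", "an", "on", "in", "of", "to", "is"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, drop stopwords and repeats."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    unique = []
    for tok in tokens:
        if tok not in STOPWORDS and tok not in unique:
            unique.append(tok)
    return unique


def to_document(message):
    """Build the document to index: original text plus pre-tokenized keywords."""
    return {"message": message, "message_tokens": tokenize(message)}


# Approach 1: aggregate on the text field itself, which requires fielddata.
FIELDDATA_MAPPING = {
    "mappings": {
        "properties": {
            "message": {"type": "text", "fielddata": True},
        }
    }
}

# Approach 2: keep the text field as-is and add a keyword field for the tokens.
KEYWORD_MAPPING = {
    "mappings": {
        "properties": {
            "message": {"type": "text"},
            "message_tokens": {"type": "keyword"},
        }
    }
}


def top_terms_query(field, size=10):
    """Body of a search request with a terms aggregation returning the top `size` terms."""
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "aggs": {"top_keywords": {"terms": {"field": field, "size": size}}},
    }
```

With the first mapping the aggregation would run as `top_terms_query("message")`, with the second as `top_terms_query("message_tokens")`; the query itself is the same in both cases, only the field and where the tokens live differ.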
Has anybody faced a similar challenge before, and are there any recommendations on how to go about it?
Are there any tools to validate / measure the JVM memory usage of approach #1?
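One candidate I am aware of is the fielddata section of the node stats API (`GET _nodes/stats/indices/fielddata?fields=...`; `GET _cat/fielddata?v` shows similar numbers in tabular form). Below is a rough sketch of how it could be polled, assuming an unsecured cluster reachable at a base URL such as `http://localhost:9200` and a field called `message`; both are placeholders, and the response layout is what I understand the API to return, so please correct me if it is not the right thing to look at.

```python
import json
import urllib.request


def fielddata_stats_url(base_url, field):
    """URL of the node stats endpoint restricted to fielddata for one field."""
    return f"{base_url.rstrip('/')}/_nodes/stats/indices/fielddata?fields={field}"


def fetch_node_stats(base_url, field):
    """Fetch the raw node stats JSON (assumes no authentication or TLS setup)."""
    with urllib.request.urlopen(fielddata_stats_url(base_url, field)) as resp:
        return json.load(resp)


def fielddata_bytes_per_node(stats):
    """Map each node name to the heap bytes currently held by fielddata."""
    usage = {}
    for node_id, node in stats.get("nodes", {}).items():
        fielddata = node.get("indices", {}).get("fielddata", {})
        # Fall back to the node id if the response carries no node name.
        usage[node.get("name", node_id)] = fielddata.get("memory_size_in_bytes", 0)
    return usage
```

The idea would be to call `fielddata_bytes_per_node(fetch_node_stats("http://localhost:9200", "message"))` before and after running the aggregation and compare the numbers per node.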