Say I have a field called "webpages" that holds an array of plain-text webpage content.
Through Elasticsearch, is there a way to output the top 1 million 6-word n-grams (phrases), ranked by term frequency? Or perhaps the top 3 million 2-word n-grams?
By frequency, I'm referring to the number of times the n-gram appears in the "webpages" array across the whole index.
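To make that definition concrete, here is a minimal sketch in plain Python (not Elasticsearch) of the count I'm after. The names `word_ngrams` and `top_ngrams`, the sample documents, and the choice to lowercase and split on whitespace are all assumptions made for illustration only; a real analyzer would tokenize differently.

```python
from collections import Counter


def word_ngrams(text, n):
    """Yield every run of n consecutive words in text.

    Assumption: words are lowercased and split on whitespace.
    """
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def top_ngrams(docs, n, k):
    """Return the k most frequent n-grams across every page of every doc."""
    counts = Counter()
    for doc in docs:
        # "webpages" is an array of plain-text strings, one per page.
        for page in doc["webpages"]:
            counts.update(word_ngrams(page, n))
    return counts.most_common(k)


# Hypothetical sample data standing in for the index contents.
docs = [
    {"webpages": ["the quick brown fox", "the quick red fox"]},
    {"webpages": ["a quick brown fox"]},
]

for phrase, freq in top_ngrams(docs, n=2, k=3):
    print(freq, phrase)
```

This is what I would otherwise have to run outside of ES over an export of all the data, which is why I'm hoping the index can produce the same counts directly.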
I suspect ES may already compute this information in advance for its tf-idf calculations. If so, it would be useful to export it and save it to a text file in a reasonable amount of time.
This would really save me time, because all the relevant data is already stored in one centralized place: Elasticsearch. Thanks in advance!