I've been looking for a way to extract n-gram frequencies from
ElasticSearch, treating the index as though it were one large table of
n-grams by documents. I found this in a thread from about a year ago:
"3. The above, 1 and 2, talk about having map reduce implemented on the
"search" aspect. One thing that I would love to also tackle is the "terms"
aspect of a search engine. Being able to run (streaming) map reduce jobs on
terms, especially ones with term vector information, can provide a strong
infrastructure for implementing algos like clustering and the like.
So, yes, it has crossed my mind :), and it is on the roadmap."
I'm wondering what the status of this is today. Is something similar
already supported in a different way? If not, I could start work on a
plugin, or help with a module that is already in development.
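
To make the request concrete, here is a minimal sketch in plain Python of the kind of table I mean: one row per document, one column per n-gram, with counts as values. It does not call any ElasticSearch API. The whitespace tokenizer, the sample documents, and the names `ngrams` and `ngram_table` are all assumptions made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams (as tuples) found in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_table(docs, n=2):
    """Map each document id to a Counter of its n-gram frequencies.

    Tokenization is a naive lowercase whitespace split (an assumption);
    a real index would use its configured analyzer instead.
    """
    return {
        doc_id: Counter(ngrams(text.lower().split(), n))
        for doc_id, text in docs.items()
    }


# Hypothetical sample documents, for illustration only.
docs = {
    "doc1": "the quick brown fox",
    "doc2": "the quick red fox the quick",
}

table = ngram_table(docs, n=2)
for doc_id, counts in table.items():
    print(doc_id, dict(counts))
```

What I'm after is the same per-document counts, but computed from the term data the index already holds rather than by re-tokenizing the source text on the client.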