Need a term frequency report across the entire index

I would like to get a term frequency report for a relatively large index.

Here is the background. I have defined something I call a grouping, which is simply a result set. Say my index has a million documents; each of these groupings would contain roughly 4,000 to 5,000 documents. Within each result set, I would like to mine the interesting keywords and perhaps build a report from them for analysis.

I am still in the exploration phase, so I would like to see the most commonly used terms and their total term frequency (TTF), not just for single words but for sequences of 1, 2, or 3 consecutive words. An example of a 3-word sequence is "Advanced Encryption Standards". I expect a lot of noise among the 1-word terms, but I assume I can filter most of it out by defining stopwords.
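
A minimal sketch of the report described above, in plain Python and independent of Elasticsearch: count every 1-, 2- and 3-word sequence across the documents of one result set and drop sequences that start or end with a stopword. The stopword list, the tokenizer, and the function names are illustrative assumptions, not part of any library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much larger.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "is", "to", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_frequencies(docs, max_n=3, stopwords=STOPWORDS):
    """Count every 1..max_n word sequence across all documents (TTF-style).

    A sequence is skipped if its first or last word is a stopword, so
    "advanced encryption standards" is kept but "the advanced" is not.
    """
    counts = Counter()
    for text in docs:
        tokens = tokenize(text)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in stopwords or gram[-1] in stopwords:
                    continue
                counts[" ".join(gram)] += 1
    return counts


# Hypothetical two-document "grouping" used only for illustration.
result_set = [
    "Advanced Encryption Standards are used for disk encryption.",
    "The Advanced Encryption Standards replaced DES.",
]
counts = ngram_frequencies(result_set)
for term, freq in counts.most_common(5):
    print(f"{freq:3d}  {term}")
```

Filtering only on the first and last word keeps phrases such as "used for disk" that have a stopword in the middle; whether that is desirable is a judgment call for the report.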

I looked at Term Vectors, but that is not what I want: it focuses on a single document rather than on a result set (or the entire index). Also, I don't have any input keywords, because my objective is to discover them.
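
Per-document term frequencies, such as those a term-vector lookup returns, can still be rolled up to the result-set level by summing them. The sketch below shows only that arithmetic; `per_doc_counts` is a hypothetical, already-fetched mapping of document id to term counts, and how it would be retrieved from the index is left open.

```python
from collections import Counter


def result_set_term_counts(per_doc_counts, result_set_ids):
    """Sum per-document term frequencies over the documents of one grouping.

    per_doc_counts: hypothetical mapping {doc_id: {term: frequency in that doc}}.
    result_set_ids: ids of the documents that make up the result set.
    Ids missing from per_doc_counts are treated as having no terms.
    """
    totals = Counter()
    for doc_id in result_set_ids:
        # Counter.update adds the counts rather than replacing them.
        totals.update(per_doc_counts.get(doc_id, {}))
    return totals


# Illustrative data: d3 is in the index but not in this result set.
per_doc = {
    "d1": {"encryption": 2, "disk": 1},
    "d2": {"encryption": 1, "des": 1},
    "d3": {"cooking": 4},
}
totals = result_set_term_counts(per_doc, ["d1", "d2"])
print(totals.most_common())
```

For a 4,000 to 5,000 document grouping this means one lookup per document, which is part of why a per-document API feels like the wrong tool here.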

I have experience with Solr and ES, but this kind of problem is relatively new to me. I went through various documents but could not narrow it down (maybe I did not spend enough time!). Can someone please point me to the right place to look?

Any pointers are greatly appreciated!
