Calculation of document frequency for cutoff_frequency


I want to be able to see exactly which terms are considered high frequency
terms at a specific cutoff_frequency.
I noticed that when I request the termvectors of different documents with
different routing values, the values of field_statistics[doc_count] and
term[doc_freq] change.

That led me to the following questions:
How is the document frequency that the cutoff_frequency feature relies on
calculated?
Is the document frequency of a term calculated for the shard holding its
document, for all shards of the index, for all queried shards (routing), or
for some other set of shards?
Is the document frequency used in cutoff_frequency calculated as
term[doc_freq] / field_statistics[doc_count], or differently?
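To make the last question concrete, here is a minimal Python sketch of the hypothesis it states: a term counts as high frequency when its doc_freq exceeds cutoff_frequency * doc_count, or exceeds cutoff_frequency itself when the value is 1.0 or more (the Elasticsearch docs describe such values as an absolute document count). It reads a parsed termvectors response that is assumed to have been fetched with term_statistics and field_statistics enabled. The function names are hypothetical, and whether Elasticsearch really uses field_statistics[doc_count] as the denominator is exactly what is being asked, so treat the result as an approximation rather than what the server computes.

```python
from typing import Any, Dict, List


def absolute_cutoff(cutoff_frequency: float, doc_count: int) -> float:
    """Turn cutoff_frequency into a document-count threshold.

    Values >= 1.0 are treated as an absolute document count;
    values in [0, 1) as a fraction of doc_count (hypothesis: the
    denominator is field_statistics[doc_count]).
    """
    if cutoff_frequency >= 1.0:
        return cutoff_frequency
    return cutoff_frequency * doc_count


def high_frequency_terms(
    response: Dict[str, Any], field: str, cutoff_frequency: float
) -> List[str]:
    """Return the terms of `field` whose doc_freq exceeds the cutoff.

    Assumes a parsed termvectors response requested with
    term_statistics=true and field_statistics=true, so each term
    carries doc_freq and the field carries doc_count.
    """
    field_tv = response["term_vectors"][field]
    doc_count = field_tv["field_statistics"]["doc_count"]
    threshold = absolute_cutoff(cutoff_frequency, doc_count)
    # Strictly greater than the threshold counts as high frequency.
    return sorted(
        term
        for term, stats in field_tv["terms"].items()
        if stats["doc_freq"] > threshold
    )
```

For example, high_frequency_terms(response, "text", 0.01) would list the terms of a hypothetical text field that appear in more than 1% of the documents counted in that response's field statistics. Because those statistics come from a single termvectors response, running it on responses fetched with different routing values can return different terms.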

How can I find the terms that are high-frequency terms at a given
cutoff_frequency? Is there a facet or aggregation query that I could use?

Many thanks

You received this message because you are subscribed to the Google Groups "elasticsearch" group.