Phrase frequency in a document and in the whole collection

The statistics are for machine consumption: we need them to develop our own scoring model.

In Lucene these stats can be obtained with span near queries (SpanNearQuery), though not in a very elegant way; I would expect to be able to get them in Elasticsearch as well.
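To make the requested statistics concrete, here is a rough plain-Python sketch. It is not Lucene or Elasticsearch code. For an exact phrase it computes the frequency in each document, the total frequency across the collection, and the number of documents containing the phrase, all over a small in-memory corpus. It assumes whitespace tokenization in place of a real analyzer, and it counts overlapping occurrences. Lucene's span near matching (slop 0, in order) may treat such edge cases differently. The function names `phrase_freq` and `phrase_stats` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def phrase_freq(tokens: Sequence[str], phrase: Sequence[str]) -> int:
    """Count start positions where `phrase` occurs as consecutive tokens.

    Assumption: overlapping matches are counted (e.g. "a a" in "a a a" -> 2),
    which may differ from how Lucene's span queries enumerate matches.
    """
    n = len(phrase)
    if n == 0:
        return 0
    target = list(phrase)
    return sum(
        1 for i in range(len(tokens) - n + 1) if list(tokens[i:i + n]) == target
    )


def phrase_stats(
    docs: Dict[str, List[str]], phrase: Sequence[str]
) -> Tuple[Dict[str, int], int, int]:
    """Return (per-document phrase frequency, collection frequency, document frequency)."""
    # Phrase frequency in each document, keeping only documents that match.
    per_doc = {doc_id: phrase_freq(toks, phrase) for doc_id, toks in docs.items()}
    per_doc = {doc_id: f for doc_id, f in per_doc.items() if f > 0}
    # Total occurrences of the phrase in the whole collection.
    collection_freq = sum(per_doc.values())
    # Number of documents containing the phrase at least once.
    doc_freq = len(per_doc)
    return per_doc, collection_freq, doc_freq


if __name__ == "__main__":
    # Hypothetical toy collection; whitespace split stands in for an analyzer.
    docs = {
        "d1": "the quick brown fox jumps over the quick brown dog".split(),
        "d2": "a quick brown cat".split(),
        "d3": "brown quick fox".split(),
    }
    print(phrase_stats(docs, ["quick", "brown"]))
    # -> ({'d1': 2, 'd2': 1}, 3, 2)
```

In a real index, the per-document count corresponds to what one would collect by iterating the span matches of an in-order, zero-slop span near query in each document. The collection-level numbers are just sums over those per-document counts.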