Hi there,
I am (re)asking this question because other users have already asked it and none of these threads has received an answer yet:
- https://discuss.elastic.co/t/document-frequency-of-phrases/21104
- https://discuss.elastic.co/t/calculation-of-whymatch-in-elasticsearch/15658/6
- https://discuss.elastic.co/t/score-based-on-phrase-frequency-only/17105
The question is: how can I get the number of times a phrase appears in a specific document and in the whole collection? Here is an example.
Consider the following documents indexed by Elasticsearch:
doc1: "one two three one two"
doc2: "three one two four"
I would like to get the following stats from the index:
phrase_frequency(doc1, "one two") = 2
phrase_frequency(doc2, "one two") = 1
collection_frequency("one two") = 3
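To pin down exactly what I mean by these numbers, here is a minimal sketch in plain Python. It does not use Elasticsearch at all; it assumes simple whitespace tokenization and exact, in-order, adjacent matches, and the function names just mirror the notation above.

```python
from typing import Dict


def phrase_frequency(text: str, phrase: str) -> int:
    """Count how often `phrase` occurs as a run of adjacent tokens in `text`."""
    tokens = text.split()   # assumption: whitespace tokenization, no analysis
    target = phrase.split()
    n = len(target)
    if n == 0:
        return 0
    # Slide a window of n tokens over the text and compare with the phrase.
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)


def collection_frequency(docs: Dict[str, str], phrase: str) -> int:
    """Sum the per-document phrase frequencies over the whole collection."""
    return sum(phrase_frequency(text, phrase) for text in docs.values())


docs = {
    "doc1": "one two three one two",
    "doc2": "three one two four",
}

for doc_id, text in docs.items():
    print(doc_id, phrase_frequency(text, "one two"))
print("collection", collection_frequency(docs, "one two"))
```

What I am looking for is a way to get the same values out of the index itself, without fetching and re-tokenizing every document on the client side.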
I believe this has to be done with "span near" queries, but I could not find a way to get these stats from them.
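For reference, this is roughly the kind of query I have in mind, written as a Python dict that can be sent as the search request body. The field name "body" is only an assumption for this example, and the helper function is hypothetical. As far as I can tell, a query like this finds the documents containing the phrase, but it does not by itself return the counts I am after.

```python
import json
from typing import Any, Dict


def span_near_phrase_query(field: str, phrase: str) -> Dict[str, Any]:
    """Build a span_near query matching the phrase terms adjacent and in order."""
    # One span_term clause per term; assumes the terms match the indexed tokens.
    clauses = [{"span_term": {field: term}} for term in phrase.split()]
    return {
        "query": {
            "span_near": {
                "clauses": clauses,
                "slop": 0,         # no gaps allowed between the terms
                "in_order": True,  # terms must appear in the given order
            }
        }
    }


# "body" is a hypothetical field name used only for illustration.
print(json.dumps(span_near_phrase_query("body", "one two"), indent=2))
```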
Could someone please provide some help in this regard?
Thanks!