I have a use case where I want to find the score of each word in an artificial document when it is compared against an existing corpus.
For that I am using the term vectors API on an artificial document.
The API fails when both of these are true: the request uses "filter", and the artificial document contains words that do not exist in the index.
Example:
{
  "doc": {
    "PROB_DESC": " My Name is Murari Tikmani "
  },
  "fields": ["PROB_DESC"],
  "term_statistics": true,
  "field_statistics": true,
  "positions": false,
  "offsets": false,
  "filter": {
    "min_term_freq": 0,
    "min_doc_freq": 2
  }
}
The above returns an error. If I remove "filter" from the request, I get term statistics in which Murari has {tf=1} and no df is reported.
The filter is a must for me, because the score is returned only when I provide the filter.
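
To make the two cases easier to compare, here is a minimal Python sketch that builds the request body with and without "filter" and picks out the terms that carry no document frequency in an unfiltered response. The helper names (build_termvectors_body, terms_without_doc_freq) and the sample response are hypothetical and hand-written for illustration; only the request fields come from the example above. The sketch does not contact a cluster, and the response layout (term_vectors, then the field, then terms) is assumed from what I see in the unfiltered output.

```python
import json


def build_termvectors_body(text, field="PROB_DESC", term_filter=None):
    """Build a term vectors request body for an artificial document."""
    body = {
        "doc": {field: text},
        "fields": [field],
        "term_statistics": True,
        "field_statistics": True,
        "positions": False,
        "offsets": False,
    }
    # Only attach "filter" when one is given, so both variants can be compared.
    if term_filter is not None:
        body["filter"] = term_filter
    return body


def terms_without_doc_freq(response, field="PROB_DESC"):
    """Return the terms in an unfiltered response that have no doc_freq."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return sorted(t for t, stats in terms.items() if "doc_freq" not in stats)


# Hypothetical response, written by hand for illustration (not real output).
sample_response = {
    "term_vectors": {
        "PROB_DESC": {
            "terms": {
                "name": {"term_freq": 1, "doc_freq": 12, "ttf": 15},
                "murari": {"term_freq": 1},
            }
        }
    }
}

text = " My Name is Murari Tikmani "
failing = build_termvectors_body(
    text, term_filter={"min_term_freq": 0, "min_doc_freq": 2}
)
working = build_termvectors_body(text)

print(json.dumps(failing, indent=2))
print(terms_without_doc_freq(sample_response))
```

The terms returned by terms_without_doc_freq are the ones I suspect trigger the failure once "filter" is added, since they have no df for the filter to work with.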