Customizing relevance scoring in Elasticsearch


(ashit pupu) #1

I have been going through the Elasticsearch docs and was curious about the fundamentals of its relevance scoring. As I understand it, ES uses three factors (term frequency, inverse document frequency, and field-length norm) to calculate the relevance score of a document.

Now suppose I don't want my results to be influenced by term frequency or field length. How can I achieve this? I read somewhere that you need to set the field to "not_analyzed", but doing that would break a lot of my functionality. So the question is: how do I prevent term frequency and field length from influencing my results while still using my custom analyzer or other analyzers?

Thanks
Ashit


(Mark Harwood) #2

Analyzed/not_analyzed controls which terms get put into the search index, not how query-time ranking works. Arguably it removes term frequency from the equation because there is only one term, but that would mean this whole paragraph, for example, would be indexed as one large term rather than several word-terms.

To disable TF-IDF scoring in your queries, try wrapping them in a constant_score query [1].
To boost using properties of your documents, use a function_score query [2].

[1] https://www.elastic.co/guide/en/elasticsearch/reference/2.4/query-dsl-constant-score-query.html
[2] https://www.elastic.co/guide/en/elasticsearch/reference/2.4/query-dsl-function-score-query.html
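
As a rough illustration of the two suggestions above, here is a minimal Python sketch that only builds the request bodies as plain dicts and prints them as JSON; it does not talk to a cluster. The helper names and the field names `title` and `popularity` are made up for the example, and the bodies follow the 2.4 query DSL as I understand it, so check them against the linked docs before relying on them.

```python
import json


def constant_score_query(field, text, boost=1.0):
    # Every document matching the inner query gets the same score (boost),
    # so term frequency, IDF and field-length norm no longer affect ranking.
    return {
        "query": {
            "constant_score": {
                "filter": {"match": {field: text}},
                "boost": boost,
            }
        }
    }


def function_score_query(field, text, numeric_field):
    # Match as above, then replace the score with a value derived from a
    # numeric document property (hypothetical field) via field_value_factor.
    return {
        "query": {
            "function_score": {
                "query": {"constant_score": {"filter": {"match": {field: text}}}},
                "field_value_factor": {"field": numeric_field, "modifier": "log1p"},
                "boost_mode": "replace",
            }
        }
    }


# "title" and "popularity" are assumed field names, not from the thread.
flat = constant_score_query("title", "quick brown fox")
ranked = function_score_query("title", "quick brown fox", "popularity")
print(json.dumps(flat, indent=2))
print(json.dumps(ranked, indent=2))
```

Because the match query still runs against the analyzed field, the custom analyzer keeps working; only the scoring changes. The printed JSON can be sent as the body of a `_search` request.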

