How to completely disable TF-IDF?


(Alex Sudakov) #1

Hello community, how can I completely disable TF-IDF and replace it with best match?
We use ES for music identification. Thanks.


(Sergei Dauletau) #2

Take a look at the function score query. You can override the TF-IDF score by combining boost and weight.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html

You can also implement a custom similarity in a plugin.
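A plugin is not the only route on newer versions: later Elasticsearch releases document a built-in "boolean" similarity that scores a matching term by its query boost alone, with no term-frequency or field-length component. This sketch is not from the thread. It assumes a 7.x or later cluster (typeless mappings, unlike the test_type used below), and the ES_URL variable, the index name "boolean_test" and the field "title" are hypothetical. Whether 5.1 already ships this similarity is not established here, so check the similarity documentation for your version.

```shell
#!/bin/sh
# Sketch only: assumes an Elasticsearch 7.x+ node (typeless mappings).
# ES_URL, the index "boolean_test" and the field "title" are hypothetical.
ES_URL="${ES_URL:-http://localhost:9200}"

# Map "title" with the built-in boolean similarity: a matching term scores
# its query boost, with no TF-IDF/BM25 term-frequency or length component.
MAPPING='{
  "mappings": {
    "properties": {
      "title": { "type": "text", "similarity": "boolean" }
    }
  }
}'

curl -s -XPUT "$ES_URL/boolean_test" \
  -H 'Content-Type: application/json' -d "$MAPPING"

# Two documents with different frequencies of "foo".
curl -s -XPUT "$ES_URL/boolean_test/_doc/1?refresh=true" \
  -H 'Content-Type: application/json' -d '{"title": "bar foo"}'
curl -s -XPUT "$ES_URL/boolean_test/_doc/2?refresh=true" \
  -H 'Content-Type: application/json' -d '{"title": "foo foo foo"}'

# Both hits should come back with the same score.
curl -s "$ES_URL/boolean_test/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"title": "foo"}}}'
```

The similarity is set per field, so other fields in the same index can keep BM25.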


(Alex Sudakov) #3

Hello Sergei, since ES 5.1 site plugins are disabled :disappointed:
Is there any other, less painful way to do it?


(Sergei Dauletau) #4

As I said, take a look at the function score query. It allows you to replace the internal score.

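# Written for Elasticsearch 5.x: mapping types (test_type), "disable_coord" and the "classic" similarity were removed in later major versions.
# Start from a clean index.
curl -s -XDELETE "http://localhost:9200/test_index"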

# Create the index with classic (TF-IDF) as the default similarity.
curl -s -XPUT "http://localhost:9200/test_index" -d '
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "similarity": {
        "default": {
          "type": "classic"
        }
      }
    }
  }
}
'

# Map both fields as analyzed text.
curl -s -XPUT 'localhost:9200/test_index/test_type/_mapping' -d '
{
  "test_type": {
    "properties": {
      "field1": {
        "type": "text"
      },
      "field2": {
        "type": "text"
      }
    }
  }
}
'

# Index three test documents.
curl -s -XPUT "localhost:9200/test_index/test_type/1" -d '
{"field1" : "bar foo", "field2" : "bar"}
'

curl -s -XPUT "localhost:9200/test_index/test_type/2" -d '
{"field1" : "bar bar bar", "field2" : "foo foo foo"}
'

curl -s -XPUT "localhost:9200/test_index/test_type/3" -d '
{"field1" : "bar bar foo foo", "field2" : "bar bar foo foo"}
'

# Refresh so the documents are visible to search.
curl -s -XPOST "http://localhost:9200/test_index/_refresh"

echo
echo
echo 'expecting doc 3 to have score 7.0'

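# Each should clause is a function_score around a pure filter, which contributes no score of its own.
# boost_mode "replace" swaps that for the clause's weight, so a match on field1 adds 2 and a match on field2 adds 5, however often "foo" occurs.
curl -s "localhost:9200/test_index/test_type/_search?pretty=true" -d '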
{
  "explain": false,
  "query": {
    "bool": {
      "disable_coord": true,
      "should": [
        {
          "function_score": {
            "boost": "1.0",
            "weight": "2.0",
            "boost_mode": "replace",
            "query": {
              "bool": {
                "filter": {
                  "match": {
                    "field1": {
                      "query": "foo"
                    }
                  }
                }
              }
            }
          }
        },
        {
          "function_score": {
            "boost": "1.0",
            "weight": "5.0",
            "boost_mode": "replace",
            "query": {
              "bool": {
                "filter": {
                  "match": {
                    "field2": {
                      "query": "foo"
                    }
                  }
                }
              }
            }
          }
        }
      ]
    }
  }
}
'
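The expected 7.0 for doc 3 can be checked by hand. Because each inner query is a pure filter and boost_mode is "replace", a should clause contributes exactly its weight (times a boost of 1.0) when the document matches and nothing otherwise; the bool query then sums the clauses, and disable_coord keeps the coordination factor out. This sketch replays that arithmetic offline for the three test documents. The helper names has_foo and expected_score are hypothetical, and splitting on single spaces is a simplification of the standard analyzer.

```shell
#!/bin/sh
# Offline hand calculation of the scores the function_score query should return.
# Each should clause adds its weight once when its field contains "foo",
# no matter how many times the term occurs (term frequency is ignored).

# Succeeds if the space-separated text contains the whole word "foo"
# (a simplification of real analysis).
has_foo() {
  case " $1 " in
    *" foo "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the expected score for a document, given its field1 and field2 text.
expected_score() {
  score=0
  if has_foo "$1"; then score=$((score + 2)); fi  # weight of the field1 clause
  if has_foo "$2"; then score=$((score + 5)); fi  # weight of the field2 clause
  echo "$score"
}

echo "doc 1: $(expected_score 'bar foo' 'bar')"
echo "doc 2: $(expected_score 'bar bar bar' 'foo foo foo')"
echo "doc 3: $(expected_score 'bar bar foo foo' 'bar bar foo foo')"
```

It prints 2, 5 and 7 for docs 1, 2 and 3 (Elasticsearch reports them as floats, e.g. 7.0). Doc 2 has "foo" three times in field2 yet gets the same 5 from that clause as doc 3, which has it twice: term frequency no longer affects the score.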
