Elasticsearch simple scripted similarity performance issues

I have created a simple scripted similarity that does nothing but return doc.freq.

{
  "similarity": {
    "simple_similarity": {
      "type": "scripted",
      "script": {
        "source": "return doc.freq;"
      }
    }
  }
}

The index foo-bar contains roughly 500k documents, most of which contain the term test, with the following mapping:

{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword",
        "fields": {
          "my": {
            "type": "text",
            "similarity": "simple_similarity"
          },
          "bm25": {
            "type": "text"
          }
        }
      }
    }
  }
}

The query I am using is, for example:

{
    "query": {
        "bool": {
            "should": {
                "match": {
                    "name.my": "Test Foo"
                }
            }
        }
    }
}

The problem is performance.

With the standard BM25 similarity, the query takes up to about 10 ms (I test this by replacing name.my with name.bm25 in the query).
With my simple_similarity, however, the query takes about 50 ms, which is odd because the script is much simpler than BM25: it does not perform any math operations at all.
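
For reference, the BM25 comparison is the same query with only the field name changed, so the two runs differ only in the similarity used for scoring:

```json
{
    "query": {
        "bool": {
            "should": {
                "match": {
                    "name.bm25": "Test Foo"
                }
            }
        }
    }
}
```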

What is more...

The Profile API shows that score_count for the term test with my simple similarity script is 501,232, which is the same as term.docFreq (the number of documents in the index that contain the current term). For BM25, however, score_count is only 10,201.

The advance count in the Profile API output shows a similar difference.
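
The numbers above come from running the query with profiling turned on. A minimal sketch of such a request body, assuming it is sent to the _search endpoint of foo-bar (swap name.my for name.bm25 to get the BM25 figures):

```json
{
    "profile": true,
    "query": {
        "bool": {
            "should": {
                "match": {
                    "name.my": "Test Foo"
                }
            }
        }
    }
}
```

The score_count and advance values are then reported per query clause in the breakdown section of the profile output.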

Does anybody have any idea why the difference is so huge?
