Elasticsearch simple scripted similarity performance issues

kamils468 · July 7, 2020, 9:30am

I have created a simple scripted similarity that does nothing but return doc.freq.

{
  "similarity": {
    "simple_similarity": {
      "type": "scripted",
      "script": {
        "source": "return doc.freq;"
      }
    }
  }
}

The index foo-bar holds roughly 500k documents, most of which contain the term test, with the following mapping:

{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword",
        "fields": {
          "my": {
            "type": "text",
            "similarity": "simple_similarity"
          },
          "bm25": {
            "type": "text"
          }
        }
      }
    }
  }
}

The query I am using is, for example:

{
    "query": {
        "bool": {
            "should": {
                "match": {
                    "name.my": "Test Foo"
                }
            }
        }
    }
}

The problem is performance.

With the standard BM25 similarity, the query takes up to about 10 ms (tested by replacing name.my with name.bm25 in the query).
With my simple_similarity, however, the query takes about 50 ms, which is odd because it is much simpler than BM25: the script does not even perform any arithmetic.

What is more...

The Profile API shows that score_count for the term test under my simple similarity script equals 501,232, which is the same as term.docFreq (the number of documents in the index that contain the current term). For BM25, however, score_count equals 10,201.

There is a similar difference for advance in the Profile API output.
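
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the counts above could be pulled out of a profiled search. It assumes the usual Profile API response layout (profile.shards[].searches[].query[], each node carrying a breakdown and optional children). The helper names build_profiled_query, collect_counts and summarize_profile are hypothetical, and sending the request to the cluster is left to whatever client you use.

```python
def build_profiled_query(field, text):
    """Return a search body with profiling enabled for a match query on `field`."""
    return {
        "profile": True,
        "query": {"bool": {"should": {"match": {field: text}}}},
    }


def collect_counts(profile_node, rows):
    """Append score_count and advance_count for this query node and its children."""
    # Assumption: each profiled query node has a "breakdown" dict of counters.
    breakdown = profile_node.get("breakdown", {})
    rows.append({
        "description": profile_node.get("description"),
        "score_count": breakdown.get("score_count", 0),
        "advance_count": breakdown.get("advance_count", 0),
    })
    for child in profile_node.get("children", []):
        collect_counts(child, rows)


def summarize_profile(response):
    """Flatten the per-shard query trees of a profiled response into a list of rows."""
    rows = []
    for shard in response["profile"]["shards"]:
        for search in shard["searches"]:
            for node in search["query"]:
                collect_counts(node, rows)
    return rows
```

Building one body with build_profiled_query("name.my", "Test Foo") and another with "name.bm25", then passing each response through summarize_profile, should put the per-term score_count and advance_count values for the two sub-fields side by side.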

Does anybody have any idea why the difference is so huge?

system · August 4, 2020, 9:30am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
