I have created a simple scripted similarity that does nothing but return `doc.freq`:

    {
      "similarity": {
        "simple_similarity": {
          "type": "scripted",
          "script": {
            "source": "return doc.freq;"
          }
        }
      }
    }

The index `foo-bar` holds roughly 500k documents, most of which contain the term `test`. Its mapping is:

    {
      "mappings": {
        "properties": {
          "name": {
            "type": "keyword",
            "fields": {
              "my": {
                "type": "text",
                "similarity": "simple_similarity"
              },
              "bm25": {
                "type": "text"
              }
            }
          }
        }
      }
    }

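For reference, the two fragments above could be combined into a single index-creation body, sent as `PUT /foo-bar`. This is only a sketch: it assumes the similarity is registered under `settings.index.similarity` with the name `simple_similarity` that the mapping refers to, and that no other settings are needed.

```json
{
  "settings": {
    "index": {
      "similarity": {
        "simple_similarity": {
          "type": "scripted",
          "script": {
            "source": "return doc.freq;"
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword",
        "fields": {
          "my": {
            "type": "text",
            "similarity": "simple_similarity"
          },
          "bm25": {
            "type": "text"
          }
        }
      }
    }
  }
}
```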
The query I am using is, for example:

    {
      "query": {
        "bool": {
          "should": {
            "match": {
              "name.my": "Test Foo"
            }
          }
        }
      }
    }

The problem is performance.
With the standard `BM25` similarity, the query takes up to about 10 ms (which I tested by replacing `name.my` with `name.bm25` in the query).
With my `simple_similarity`, however, the query takes about 50 ms, which is odd because it is much simpler than `BM25`: it does not perform any math operations at all.
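To make the comparison concrete, the `BM25` timing would come from the same request body with only the field name swapped. A sketch, assuming nothing else in the request changes:

```json
{
  "query": {
    "bool": {
      "should": {
        "match": {
          "name.bm25": "Test Foo"
        }
      }
    }
  }
}
```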
What is more, the Profile API shows that `score_count` for my simple similarity script on the term `test` equals 501,232, which is the same as `term.docFreq` (the number of documents that contain the current term in the index). For `BM25`, however, `score_count` equals 10,201.
The `advance` entry in the Profile API output shows a similar difference.
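The counters above can be reproduced with a request along these lines, sent to `GET /foo-bar/_search`. It is a sketch that assumes the only change to the original query is the top-level `profile` flag:

```json
{
  "profile": true,
  "query": {
    "bool": {
      "should": {
        "match": {
          "name.my": "Test Foo"
        }
      }
    }
  }
}
```

In the response, the `breakdown` object of each profiled query node lists entries such as `score_count` and `advance`. Running the same request against `name.bm25` gives the `BM25` figures for comparison.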
Does anybody have any idea why the difference is so huge?