### Settings for Indexing ###
import requests
import json
import logging
settings = {
    'settings': {
        'index': {
            'number_of_shards': 1,
            'number_of_replicas': 1,
            'similarity': {
                'default': {
                    'type': 'BM25',
                    'b': 0.3,
                    'k1': 0
                }
            }
        }
    },
    'mappings': {
        'properties': {
            'title': {
                'type': 'text',
            }
        }
    }
}
headers = {'Content-Type': 'application/json'}
response = requests.put('http://localhost:9200/alldocs', data=json.dumps(settings), headers=headers)
print(response.json())
I am using the Elasticsearch index settings above for my search, with BM25 as the similarity (scoring) measure.
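To make the effect of these parameters concrete, here is a minimal sketch of the textbook BM25 score for a single query term. The function name `bm25_term_score` and its arguments are hypothetical, and the IDF variant and constant factors differ between implementations, so this is not necessarily what Elasticsearch or any other library computes internally. It illustrates one thing worth checking: with `k1 = 0`, the term-frequency and length-normalization parts cancel, so every document containing the term gets the same per-term score (the IDF), and `b` has no effect.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one query term in one document (illustrative only)."""
    if tf == 0:
        return 0.0
    # A commonly used IDF variant; the +1 inside the log keeps it non-negative.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization, scaled by k1. With k1 = 0 this term vanishes.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term-frequency saturation. With k1 = 0 this factor is tf / tf = 1.
    tf_part = tf * (k1 + 1) / (tf + norm)
    return idf * tf_part

# Same term, very different tf and document length, parameters from the question:
short_doc = bm25_term_score(tf=1, doc_len=10, avg_doc_len=20, n_docs=1000, doc_freq=50, k1=0, b=0.3)
long_doc = bm25_term_score(tf=7, doc_len=300, avg_doc_len=20, n_docs=1000, doc_freq=50, k1=0, b=0.3)
print(math.isclose(short_doc, long_doc))
```

If the other BM25 library was run with different parameters (defaults are often around `k1 = 1.2`, `b = 0.75`), its scores would not be directly comparable with scores produced under the settings above.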
However, when I retrieve the top 20 results, they do not appear to be sorted by score. Furthermore, by random sampling I found that some documents outside my top 20 have a better BM25 score (computed with a different BM25 library).
Can anyone help me figure out why this happens and how I can resolve it? (The Elasticsearch documentation says results are sorted by score by default.)
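One way to separate the two symptoms is to check the ordering using only the `_score` values Elasticsearch itself returns. The sketch below assumes a local node, a plain `match` query on `title` with no `sort` clause, and hypothetical helper names (`scores_are_descending`, `top_hits`); the query shape is an assumption, since the original search request is not shown.

```python
def scores_are_descending(hits):
    """Return True if the `_score` values of the hits never increase."""
    scores = [hit["_score"] for hit in hits]
    return all(a >= b for a, b in zip(scores, scores[1:]))

def top_hits(query_text, size=20, url="http://localhost:9200/alldocs/_search"):
    """Fetch the top `size` hits for a match query on `title` (assumes a local node)."""
    # Imported here so the pure helper above works without requests installed.
    import requests
    body = {"size": size, "query": {"match": {"title": query_text}}}
    response = requests.post(url, json=body)
    response.raise_for_status()
    return response.json()["hits"]["hits"]

# Example usage against a running node:
# hits = top_hits("some query")
# print(scores_are_descending(hits))
```

If this reports `True`, the results are ordered by Elasticsearch's own scores, and the apparent disorder probably comes from comparing them against scores computed elsewhere. Ties are allowed by the check, since documents with equal scores can appear in any order.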
Could it be because of sharding? I have explicitly configured the index to use a single shard, though.
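Whether the single-shard setting actually took effect can be confirmed by reading the settings back from the index. This is a minimal sketch with hypothetical helper names (`shard_count`, `fetch_shard_count`), assuming the same local node and index name as above. It is worth checking because a `PUT` to create an index that already exists is rejected, in which case the earlier settings would still apply.

```python
def shard_count(settings_response, index_name="alldocs"):
    """Extract number_of_shards from a GET /<index>/_settings response body."""
    index_settings = settings_response[index_name]["settings"]["index"]
    # Elasticsearch returns setting values as strings, so convert explicitly.
    return int(index_settings["number_of_shards"])

def fetch_shard_count(index_name="alldocs", base_url="http://localhost:9200"):
    """Read the live index settings and return the shard count (assumes a local node)."""
    # Imported here so shard_count() can be used without requests installed.
    import requests
    response = requests.get(f"{base_url}/{index_name}/_settings")
    response.raise_for_status()
    return shard_count(response.json(), index_name)

# Example usage against a running node:
# print(fetch_shard_count())
```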