I am running an index with BM25 similarity over a field named title.
When I manually calculate the average field length, I get ~10.2. However, the Elasticsearch explain output reports a much smaller value (avgFieldLength of about 2.2 in the example below). This throws off the score I expect from my own calculation.
Any clue what is going wrong?
Example of the result:
{'description': 'weight(title:5 in 72454) [PerFieldSimilarity], result of:', 'details': [{'description': 'score(doc=72454,freq=1.0 = termFreq=1.0\n), product of:', 'details': [{'description': 'idf(docFreq=3040, maxDocs=901722)', 'value': 5.6922855}, {'description': 'tfNorm, computed from:', 'details': [{'description': 'termFreq=1.0', 'value': 1.0}, {'description': 'parameter k1', 'value': 2.0}, {'description': 'parameter b', 'value': 0.75}, {'description': 'avgFieldLength', 'value': 2.2184565}, {'description': 'fieldLength', 'value': 5.2244897}], 'value': 0.5961232}], 'value': 3.3933036}], 'value': 3.3933036}], 'value': 11.2237425},
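To make the discrepancy concrete, here is a minimal sketch that recomputes the idf and tfNorm values from the explain output. It assumes the Lucene-style BM25 formulas idf = ln(1 + (maxDocs - docFreq + 0.5) / (docFreq + 0.5)) and tfNorm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)); the function names are my own, not part of any Elasticsearch API.

```python
import math


def bm25_idf(doc_freq, max_docs):
    """Inverse document frequency (assumed Lucene-style BM25 form)."""
    return math.log(1.0 + (max_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, k1, b, field_length, avg_field_length):
    """Length-normalized term frequency (assumed Lucene-style BM25 form)."""
    length_norm = 1.0 - b + b * field_length / avg_field_length
    return freq * (k1 + 1.0) / (freq + k1 * length_norm)


# Values copied from the explain output above.
idf = bm25_idf(doc_freq=3040, max_docs=901722)
tf_norm = bm25_tf_norm(freq=1.0, k1=2.0, b=0.75,
                       field_length=5.2244897,
                       avg_field_length=2.2184565)
score = idf * tf_norm

# Same document, but with the average length I calculated by hand (~10.2).
tf_norm_manual = bm25_tf_norm(freq=1.0, k1=2.0, b=0.75,
                              field_length=5.2244897,
                              avg_field_length=10.2)

print("idf:", idf)
print("tfNorm with explain avgFieldLength:", tf_norm)
print("score with explain avgFieldLength:", score)
print("tfNorm with manual avgFieldLength:", tf_norm_manual)
```

With the avgFieldLength reported by explain, these formulas reproduce the idf, tfNorm and score values shown above to within rounding. With my manual average of ~10.2, tfNorm comes out noticeably larger, which is why my expected scores do not match.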
The settings I use:
{'settings': {'index': {'creation_date': '1430658049103', 'index': {'store': {'type': 'default'}}, 'number_of_replicas': 0, 'number_of_shards': '8', 'my_bm25_t': {'b': '0.75',
'discount_overlaps': 'false', 'k1': '2.0', 'type': 'BM25'}}}}
{'mappings': {'product': {'_ttl': {'enabled': True},
'properties': {'title': {'analyzer': 'english', 'similarity': 'my_bm25_t', 'type': 'string'}, 'brand': {'index': 'not_analyzed', 'type': 'string'},
'condition': {'index': 'not_analyzed', 'type': 'string'},}}}}