I have created an index in Elasticsearch. Using Kibana, I searched a field for a particular word. Kibana returned the expected results, but the _score for each record isn't what I would have expected. For example, one record had the word only once in its text and another had it twice. The one with a single occurrence got a score of 6.33, whilst the one with two got a score of 5.726. Other returned records where the word appeared only once had scores of 5.968, 5.379, etc. I thought it might have been taking letter casing into account, but changing the case of the search term made no difference. Can someone explain to me how the _score is obtained? I'd have thought that all the records with only one occurrence, for instance, would have had the same _score.
If you add "explain": true to your query, you will get those details. The output is a bit complicated, though.
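As a minimal sketch of what such a request body could look like, here it is built in Python. The index name `my-index`, the field name `body`, and the search word are placeholders I made up for illustration, not names from the question.

```python
import json

# Hypothetical names: "my-index" and the "body" field stand in for your own.
search_body = {
    "explain": True,  # ask Elasticsearch to include a score breakdown per hit
    "query": {"match": {"body": "elephant"}},
}

# Print the JSON so it can be pasted into the Kibana console
# under a line such as: GET /my-index/_search
print(json.dumps(search_body, indent=2))
```

With this flag set, each hit in the response should carry an `_explanation` object that breaks the `_score` down into its components.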
In short, this is what is taken into account:
frequency of the term you are searching for within the document field: the higher, the better. But be aware that text is analyzed before being indexed
frequency of the term you are searching for within the full index: the lower, the better.
length of the field: the shorter, the better. This is why records with a single occurrence can still get different scores.
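To see how these factors interact, here is a rough sketch of a single-term score using the textbook BM25 formula. It assumes BM25 (the default similarity in recent Elasticsearch versions; older versions used a classic TF/IDF formula built from the same ingredients) with the usual defaults k1 = 1.2 and b = 0.75. The function name and all the numbers are invented for illustration, and the exact constants in your cluster's "explain" output may differ.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, docs_with_term, total_docs,
                    k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one document field (a sketch)."""
    # Rarer terms across the index get a higher inverse document frequency.
    idf = math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Fields longer than average are penalised, shorter ones are rewarded.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # Term frequency saturates: each extra occurrence adds less than the last.
    tf = freq * (k1 + 1) / (freq + k1 * length_norm)
    return idf * tf


# Made-up numbers: one occurrence in a short field vs. two in a long field.
short_once = bm25_term_score(freq=1, doc_len=20, avg_doc_len=100,
                             docs_with_term=50, total_docs=1000)
long_twice = bm25_term_score(freq=2, doc_len=400, avg_doc_len=100,
                             docs_with_term=50, total_docs=1000)
print(short_once, long_twice)
```

With these inputs the short field with one occurrence outscores the long field with two, which matches the kind of ordering described in the question. Two documents with the same number of occurrences also get different scores as soon as their field lengths differ.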
If you want casing to influence the score, you can index the same text using two different analyzers: the standard analyzer and a custom one which just uses a standard tokenizer (so no lowercasing). Then search using a bool query with two should clauses: one on the lowercased field (standard analyzer) and one on the preserved-case field (custom analyzer).
Documents that match with the exact case satisfy both clauses, so they will appear at the top of the list.
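One possible way to set this up is a multi-field mapping, sketched below as Python dictionaries. The field name `body`, the sub-field name `cased`, and the analyzer name `case_preserving` are all hypothetical, and the mapping layout assumes a recent Elasticsearch version with typeless mappings.

```python
import json

# Index settings and mapping: "body" is lowercased by the standard analyzer,
# while "body.cased" keeps the original case (standard tokenizer, no filters).
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "case_preserving": {"type": "custom", "tokenizer": "standard"}
            }
        }
    },
    "mappings": {
        "properties": {
            "body": {
                "type": "text",
                "analyzer": "standard",
                "fields": {
                    "cased": {"type": "text", "analyzer": "case_preserving"}
                },
            }
        }
    },
}


def case_boosting_query(field, text):
    """Build a bool query whose second should clause rewards exact-case hits."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: text}},             # case-insensitive match
                    {"match": {f"{field}.cased": text}},  # exact-case match
                ]
            }
        }
    }


print(json.dumps(case_boosting_query("body", "Hadoop"), indent=2))
```

The first dictionary would be sent when creating the index and the second as the search body. A document containing "Hadoop" matches both clauses, while one containing only "hadoop" matches just the first, so the exact-case document should score higher.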