Your query runs in parallel on multiple shards, and the score you are seeing is
computed independently on each shard.
The default similarity is TF/IDF based, which means it uses term
frequencies across all documents on the shard (so the score computed on a shard
depends on the data in that shard).
For the scores to be the same, every shard would have to hold more or less the
same documents.
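To make the per-shard effect concrete, here is a rough sketch. It assumes the IDF formula of Lucene's classic TF/IDF similarity, 1 + ln(numDocs / (docFreq + 1)); the shard names and document counts are made up for illustration, and the real score also includes term frequency and length normalization.

```python
import math

def idf(doc_freq, num_docs):
    # IDF as in Lucene's classic TF/IDF similarity (assumed here):
    # 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical shards: (documents containing the term, total documents)
shards = {
    "shard_0": (30, 1000),
    "shard_1": (45, 1000),
    "shard_2": (30, 1200),
}

# Each shard only sees its own statistics, so each gets its own IDF.
per_shard = {name: idf(df, n) for name, (df, n) in shards.items()}

# With statistics summed over all shards there is a single IDF for the term.
total_df = sum(df for df, _ in shards.values())
total_docs = sum(n for _, n in shards.values())
global_idf = idf(total_df, total_docs)

for name, value in sorted(per_shard.items()):
    print(name, round(value, 4))
print("global", round(global_idf, 4))
```

Identical documents end up with slightly different scores simply because they live on shards with different term statistics.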
You can change search_type so that scoring takes all shards into account
rather than each shard in isolation (slower).
It is also worth reading up on the similarity algorithms used by
Elasticsearch.
Because you are searching for user names, you could probably
wrap your query in a constant score (or function score) query and settle for
less granular scoring.
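A minimal sketch of what such a wrapped query body could look like, built as a plain Python dict. The field name "name" and the helper constant_score_body are only placeholders for illustration. Note that on the 1.x releases current at the time of this thread, a wrapped query went under "query" instead of "filter" inside constant_score.

```python
import json

def constant_score_body(field, text, boost=1.0):
    # Hypothetical helper: every document matching the inner clause gets
    # the same score (the boost), regardless of per-shard statistics.
    return {
        "query": {
            "constant_score": {
                "filter": {"match": {field: text}},
                "boost": boost,
            }
        }
    }

body = constant_score_body("name", "McDonald")
print(json.dumps(body, indent=2))
```

With this, all matching listings come back with one score, so any ordering among them has to come from an explicit sort.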
Cheers,
Karol Gwaj
On Monday, February 17, 2014 11:00:47 AM UTC, Vallabh Bothre wrote:
Dear Friends,
I am using phonetic analysis in Elasticsearch to get the best search results.
When I search for a keyword, let's say "McDonald", Elasticsearch returns many
listings with "McDonald's", but some of these have different scores.
I am manipulating the results based on score, and this difference is
affecting my functionality.
All returned listings with "McDonald" have the same case and are exact matches.
For example:
Name          Score
McDonald's    5.8059134
McDonald's    5.8059134
McDonald's    5.8059134
McDonald's    5.7834973
McDonald's    5.7834973
McDonald's    5.4078074
As shown in the example above, there are 3 different scores, which are
highlighted.
The different scores you are seeing are probably caused by different IDF
values on each shard. If you really need the exact IDF values on each shard,
you should check out dfs_query_[and/then]_fetch.
Yep, as Isabel mentioned, you should use the dfs_query_then_fetch search type
(it is slower, though).
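For reference, search_type is passed as a URL parameter on the search request. query_then_fetch is already the default, so setting it explicitly changes nothing. The sketch below only builds the URL and JSON body without sending them; the host, the index name "listings", the field "name" and the helper build_search_request are assumptions for illustration.

```python
import json
from urllib.parse import urlencode

def build_search_request(index, field, text,
                         search_type="dfs_query_then_fetch",
                         host="http://localhost:9200"):
    # Hypothetical helper: returns the URL and JSON body of a search that
    # first gathers term statistics from all shards, then scores with them.
    params = urlencode({"search_type": search_type})
    url = f"{host}/{index}/_search?{params}"
    body = json.dumps({"query": {"match": {field: text}}})
    return url, body

url, body = build_search_request("listings", "name", "McDonald")
print(url)
print(body)
```

The extra round trip to collect statistics from every shard is what makes this search type slower.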
On Monday, February 17, 2014 3:23:27 PM UTC, Vallabh Bothre wrote:
Thanks Karol for replying,
As per your suggestion I used the search type, which executes the query on all
relevant shards and returns the results.
"search_type" => "query_then_fetch"
But I am still getting different scores for the same keyword.
On Monday, February 17, 2014 4:50:51 PM UTC+5:30, Karol Gwaj wrote: