Different score for exact same keyword

Dear Friends,

I am using phonetic analysis in Elasticsearch to get the best search results.

When I search for a keyword, let's say "McDonald", Elasticsearch returns many
listings with "McDonald's", but some of these have different scores.

I am manipulating the results based on score, and this difference is
affecting my functionality.

All returned listings with "McDonald" have the same case and are exactly the same.

For example:
Name score
McDonald's 5.8059134
McDonald's 5.8059134
McDonald's 5.8059134
McDonald's 5.7834973
McDonald's 5.7834973
McDonald's 5.4078074

As shown in the example above, there are 3 different scores for the
same name.

Any help is very much appreciated.

Regards,
Vallabh

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/10e3e24b-06cc-4e46-acf8-b6c46a549cc2%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Your query runs in parallel on multiple shards, and the score you are seeing
is computed independently on each shard.
The default similarity is TF/IDF based, which means it uses term
frequencies across all documents on the shard (so the score on a shard
depends on the data on that shard).
For the scores to be the same, every shard would have to hold more or less
the same documents.
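
To illustrate the point above, here is a minimal sketch of how per-shard statistics can give the same term a different weight. It uses the IDF formula from Lucene's classic TF/IDF similarity, 1 + ln(numDocs / (docFreq + 1)). The shard names and document counts are made up for illustration and are not taken from the original poster's index.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """IDF as defined by Lucene's classic TF/IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics: shard -> (documents on shard, documents containing the term)
SHARDS = {
    "shard_0": (1000, 40),
    "shard_1": (1000, 55),
    "shard_2": (1000, 90),
}


def idf_by_shard(stats):
    """IDF computed from each shard's local statistics only."""
    return {name: classic_idf(df, n) for name, (n, df) in stats.items()}


def global_idf(stats):
    """IDF computed from statistics summed over all shards."""
    total_docs = sum(n for n, _ in stats.values())
    total_df = sum(df for _, df in stats.values())
    return classic_idf(total_df, total_docs)


if __name__ == "__main__":
    for name, idf in idf_by_shard(SHARDS).items():
        print(name, round(idf, 4))
    print("global", round(global_idf(SHARDS), 4))
```

With three shards holding different numbers of matching documents, the term gets three different IDF values, which is consistent with seeing three distinct scores for identical names.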

You can change search_type so that scoring uses term statistics from all
shards rather than only the local shard (slower).

It is also worth reading up on the similarity algorithms used by
Elasticsearch.

Because you are searching for user names, you could possibly wrap your
query in a constant score (or function score) query and settle for less
granular scoring.
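
As a rough sketch of the constant score idea, the request body could be built like this. The field name "name" is an assumption, since the thread does not show the mapping. The wrapped clause is placed under the "filter" key here; older releases also accepted a "query" key, so check the documentation for the version you run.

```python
import json


def constant_score_query(field: str, text: str, boost: float = 1.0) -> dict:
    """Build a search body in which every matching document gets the same score."""
    return {
        "query": {
            "constant_score": {
                # Assumption: depending on the Elasticsearch version, the
                # wrapped clause may need to go under "query" instead.
                "filter": {"match": {field: text}},
                "boost": boost,
            }
        }
    }


if __name__ == "__main__":
    # "name" is a hypothetical field name used only for illustration.
    print(json.dumps(constant_score_query("name", "McDonald"), indent=2))
```

Every hit then carries the boost value as its score, so identical names can no longer differ because of shard-local statistics. The trade-off is that relevance ordering among the hits is lost.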

Cheers,
Karol Gwaj

Thanks for replying, Karol.

As per your suggestion, I used the search type that executes the query on
all relevant shards and returns the results:
"search_type" => "query_then_fetch"

But I am still getting different scores for the same keyword.

On Mon, Feb 17, 2014 at 4:23 PM, Vallabh Bothre vallabh.bothre@gmail.com wrote:

As per your suggestion i used search type which execute the query on all
relevant shards and return the results.
"search_type" => "query_then_fetch"

As per

reference/current/search-request-search-type.html

this is the default search type, used even when a search type is not
explicitly specified.

As per

modules-similarity.html

Elasticsearch uses TF/IDF as its default similarity. For more information on
how TF/IDF is computed, see also the Lucene documentation:

http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

The different scores you are seeing are probably caused by different IDF
values on each shard. If you really need the same, exact IDF values on every
shard, you should check out dfs_query_[and/then]_fetch.

Isabel

Yep, as Isabel mentioned, you should use the dfs_query_then_fetch search type
(it is slower, though).
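
For reference, a small sketch of how the search type can be passed as a URL query parameter on the search endpoint. The host and the index name "listings" are placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def search_url(base_url: str, index: str,
               search_type: str = "dfs_query_then_fetch") -> str:
    """Build a _search URL that requests the given search type."""
    query_string = urlencode({"search_type": search_type})
    return f"{base_url.rstrip('/')}/{index}/_search?{query_string}"


if __name__ == "__main__":
    # Placeholder host and index name, for illustration only.
    print(search_url("http://localhost:9200", "listings"))
```

The request body (the query itself) stays the same; only the search type changes, at the cost of an extra round trip to collect term statistics from the shards before scoring.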
