Score varying for same values


(senthil prabhu) #1

Here is a gist-style example.

I created an index with a single field named "message". The index
holds 10 documents with the same message string, but when I search
it I get different score values for the resulting documents.

Gist example:

curl -XPUT 'http://192.168.0.141:9200/dbtest123'

curl -XPUT 'http://192.168.0.141:9200/dbtest123/meta/_mapping' -d '{"meta" : { "properties" : { "message" : {"type" : "string", "store" : "yes"} } } }'

curl -XPUT 'http://192.168.0.141:9200/dbtest123/meta/1' -d '{"message" : "trying out Elastic Search It takes the snapshot while working on group of documents, Based on the snapshot, it will do update process or other task"}'
curl -XPUT 'http://192.168.0.141:9200/dbtest123/meta/2' -d '{"message" : "trying out Elastic Search It takes the snapshot while working on group of documents, Based on the snapshot, it will do update process or other task"}'
curl -XPUT 'http://192.168.0.141:9200/dbtest123/meta/3' -d '{"message" : "trying out Elastic Search It takes the snapshot while working on group of documents, Based on the snapshot, it will do update process or other task"}'
....
(same request repeated, up to 10 documents)

curl -XGET 'http://192.168.0.141:9200/dbtest123/meta/_search' -d '{"query":{"term" : { "message" : "search" }}}'

{"took":0,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":10,"max_score":0.15581955,"hits":[
{"_index":"db123","_type":"meta","_id":"3","_score":0.15581955, "_source" : { "message" : "trying out Elastic Search It takes the snapshot while working on group of documents, Based on the snapshot, it will do update process or other task"}},
{"_index":"db123","_type":"meta","_id":"8","_score":0.15581955, "_source" : { "message" : "trying out Elastic Search It takes the snapshot while working on group of documents, Based on the snapshot, it will do update process or other task"}},
{"_index":"db123","_type":"meta","_id":"10","_score":0.15581955, "_source" : { "message" : "trying out Elastic Search It takes the snapshot while working on group of documents, Based on the snapshot, it will do update process or other task"}},
{"_index":"db123","_type":"meta","_id":"4","_score":0.13005449, "_source" : { "message" : "trying out Elastic Search It takes the snapshot while working on group of documents, Based on the snapshot, it will do update process or other task"}},
{"_index":"db123","_type":"meta","_id":"9","_score":0.13005449, "_source" : { "message" : "trying out Elastic Search It takes the snapshot while working on group of documents, Based on the snapshot, it will do update process or other task"}},
{"_index":"db123","_type":"meta","_id":"1","_score":0.13005449, "_source" : { "message" : "trying out Elastic Search It takes the snapshot while working on group of documents, Based on the snapshot, it will do update process or other task"}},
{"_index":"db123","_type":"meta","_id":"6","_score":0.13005449, "_source" : { "message" : "trying out Elastic Search It takes the snapshot while working on group of documents, Based on the snapshot, it will do update process or other task"}},
{"_index":"db123","_type":"meta","_id":"2","_score":0.13005449, "_source" : { "message" : "trying out Elastic Search It takes the snapshot while working on group of documents, Based on the snapshot, it will do update process or other task"}},
{"_index":"db123","_type":"meta","_id":"7","_score":0.13005449, "_source" : { "message" : "trying out Elastic Search It takes the snapshot while working on group of documents, Based on the snapshot, it will do update process or other task"}},
{"_index":"db123","_type":"meta","_id":"5","_score":0.067124054, "_source" : { "message" : "trying out Elastic Search It takes the snapshot while working on group of documents, Based on the snapshot, it will do update process or other task"}}
]}}

Questions:

  1. Why does the score vary for the same content?

  2. How is the score value calculated in Elastic Search?

  3. If I have a Boolean query over multiple fields, can I name a
    specific field to be used for scoring the full query?


(Clinton Gormley) #2

Hiya

> I have created a indices with a single field named "message". In that
> indices, I have 10 documents with same message string, while searching
> on this indices i am getting different score values for resulting
> documents.

Your docs are stored in different shards, so each shard has a local idea
of how important each term is - it only knows about docs that live in
its own shard.

This is likely to be approximately right, but in certain cases (like
your example) will be slightly inaccurate.

If you add the parameter search_type=dfs_query_then_fetch, this will
first retrieve the frequencies of the relevant terms from all shards,
calculate the real global values, then score correctly. However, the
extra round trip means it doesn't perform quite as well as the default
"query_then_fetch".

Try:

curl -XGET 'http://localhost:9200/dbtest123/meta/_search?pretty=1&search_type=dfs_query_then_fetch' -d '
{"query":{"term" : { "message" : "search" }}}'
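
To make the per-shard effect concrete, here is a rough sketch of the arithmetic. It is an illustration only and rests on several assumptions that are not stated in the thread: that scoring follows Lucene's classic formula idf = 1 + ln(numDocs / (docFreq + 1)), that a single-term query reduces to idf times the field-length norm (tf = 1), that the norm is 0.21875 (a value picked because it reproduces the scores in the question), and that the 10 docs landed on the 5 shards as 3/2/2/2/1. The score function is a hypothetical helper, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: score of a single-term match inside one scoring scope
# (one shard, or the whole index when DFS is used).
#   $1 = number of docs in the scope, $2 = docs in the scope containing the term
# Assumes tf = 1 and a field-length norm of 0.21875 (inferred, not from the thread).
score() {
  awk -v n="$1" -v df="$2" 'BEGIN {
    idf = 1 + log(n / (df + 1))     # awk log() is the natural logarithm
    printf "%.5f\n", idf * 0.21875  # idf * fieldNorm
  }'
}

echo "shard holding 3 docs:  $(score 3 3)"    # compare with 0.15581955 in the question
echo "shard holding 2 docs:  $(score 2 2)"    # compare with 0.13005449
echo "shard holding 1 doc:   $(score 1 1)"    # compare with 0.067124054
echo "global stats, 10 docs: $(score 10 10)"  # what every doc would get under DFS
```

Under these assumptions the three local values line up with the three distinct scores in the question, and with global statistics all 10 documents would receive one identical score.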

clint


(senthil prabhu) #3

thanks for your response....

(In reply to Clinton Gormley's message of Jun 3, 2:00 pm.)

