Issue with ID and scores


(Akhilesh Anb) #1

We are creating an index of documents that each contain a single string. The process then runs a match query with a certain fuzziness factor against the index.
Depending on the ID value of each document, which should be irrelevant, the scoring ends up different.
For example, if we have 100 documents with IDs 1 - 100, we get different scores than if the IDs are 2 - 101.
So when new documents are added to the index, the scoring changes. We would like to know if this is expected behavior for ES 1.6.2.


(Nik Everett) #2

Have a look at search_type, particularly dfs_query_then_fetch. The _id controls which shard a document lands on, and by default shard-local statistics are used to compute the score. This isn't usually a big problem if you have lots of documents, but it comes up if you have very few. It also matters if you search for very rare terms, but again it usually isn't a big deal unless you have very few documents.
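As a rough sketch of what such a request could look like, the snippet below only builds the URL and JSON body for a fuzzy match query with search_type set to dfs_query_then_fetch. It does not contact a cluster. The host, index name, field name, and query text are placeholders I made up for illustration, not values from the original post.

```python
import json
from urllib.parse import urlencode


def build_search_request(host, index, field, text, fuzziness=1):
    """Build (url, body) for a fuzzy match query scored with global term statistics.

    All argument values are caller-supplied placeholders; nothing is sent.
    """
    # search_type is passed as a URL parameter on the _search endpoint
    params = urlencode({"search_type": "dfs_query_then_fetch"})
    url = f"{host}/{index}/_search?{params}"
    body = json.dumps({
        "query": {
            "match": {
                field: {"query": text, "fuzziness": fuzziness}
            }
        }
    })
    return url, body


# Hypothetical values: a local node, an index "my_index", a field "text"
url, body = build_search_request("http://localhost:9200", "my_index", "text", "exmaple")
```

The extra round trip collects term statistics from every shard before scoring, so it costs a little more per search than the default.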

It can also come up if you use routing (which overrides using the _id to pick the shard) and end up with shards of very different sizes.
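To make the effect concrete, here is a small self-contained simulation. It is a sketch under stated assumptions, not Elasticsearch's real code: toy_shard uses CRC32 as a stand-in for the actual routing hash, the 5-shard count is assumed, and the IDF formula is the classic Lucene TF-IDF one, 1 + ln(numDocs / (docFreq + 1)). The sample documents are invented.

```python
import math
import zlib


def classic_idf(doc_freq, num_docs):
    # Classic Lucene TF-IDF inverse document frequency
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def toy_shard(doc_id, num_shards):
    # Hypothetical stand-in for the real routing hash; only illustrates
    # that the shard is a function of the document ID
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_shards


def idf_stats(docs, term, num_shards=5):
    """Return (per-shard IDF dict, global IDF) for `term`.

    docs maps document ID -> text.
    """
    num_docs = [0] * num_shards
    doc_freq = [0] * num_shards
    for doc_id, text in docs.items():
        shard = toy_shard(doc_id, num_shards)
        num_docs[shard] += 1
        if term in text.split():
            doc_freq[shard] += 1
    # Shard-local IDF, roughly what the default search type sees
    per_shard = {
        s: classic_idf(doc_freq[s], num_docs[s])
        for s in range(num_shards)
        if num_docs[s] > 0
    }
    # Index-wide IDF, roughly what dfs_query_then_fetch sees
    global_idf = classic_idf(sum(doc_freq), sum(num_docs))
    return per_shard, global_idf


def make_docs(first_id, count=100):
    # Invented data: every tenth document contains the term "rare"
    return {
        first_id + i: ("rare word" if i % 10 == 0 else "common word")
        for i in range(count)
    }


shards_a, global_a = idf_stats(make_docs(1), "rare")   # IDs 1 - 100
shards_b, global_b = idf_stats(make_docs(2), "rare")   # IDs 2 - 101
```

Comparing shards_a with shards_b shows how shifting the IDs can change the per-shard values, while global_a and global_b stay equal because the index-wide counts do not depend on where each document lands.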

That is a fairly old version of Elasticsearch. It is getting "historical" for those of us that work on Elasticsearch every day.

