Inverse Document Frequency Scoring with Shared Indices and Routing

jayers · November 17, 2017, 1:50pm

Given the removal of mapping types in Elasticsearch 6.0, we are looking to use one index per document type. Following "The Definitive Guide", these will be shared indices with routing per client, so that smaller clients aren't spread across a large number of shards. We're wondering what the implications are for inverse document frequency scores when clients with dramatically different document counts happen to reside on the same shard. For example…

Say we have a client named Red and a client named Blue that, due to routing, happen to reside on the same shard in each index. We have one index for letters and one for email. Red has a relatively small number of both letters and emails. Blue has a relatively large number of letters but doesn't use the email feature, so it has only a small number of emails. Given that inverse document frequency is based on all documents within a shard (ignoring routing and filtering), if a user for Red searches for the word "Blue", the number of letters in the shard containing "Blue" will be substantial. As a result, Red's letters will score lower and Red's emails will be pushed to the top of Red's results, purely because of Blue's data.
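To make the concern concrete, here is a rough sketch of the arithmetic in Python. It assumes the default BM25 similarity and uses only its IDF term; every document count below is made up for illustration, and the helper names (bm25_idf, shard_idf) are hypothetical, not part of any Elasticsearch client.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """IDF term of BM25 as used by Lucene: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def shard_idf(clients):
    """IDF from shard-wide totals, ignoring which client owns each document."""
    total_docs = sum(c["docs"] for c in clients.values())
    docs_with_term = sum(c["with_term"] for c in clients.values())
    return bm25_idf(total_docs, docs_with_term)


# Hypothetical counts for the term "Blue" on the one shard Red and Blue share.
letters = {
    "red": {"docs": 1_000, "with_term": 5},
    "blue": {"docs": 1_000_000, "with_term": 900_000},
}
email = {
    "red": {"docs": 1_000, "with_term": 5},
    "blue": {"docs": 500, "with_term": 400},
}

idf_letters = shard_idf(letters)                   # dominated by Blue's letters
idf_email = shard_idf(email)                       # Blue contributes far fewer docs
idf_red_alone = shard_idf({"red": letters["red"]}) # what Red would see on its own

print(f"letters index, shared shard: {idf_letters:.3f}")
print(f"email index, shared shard:   {idf_email:.3f}")
print(f"Red's documents alone:       {idf_red_alone:.3f}")
```

With these made-up numbers the IDF for "Blue" in the letters index comes out far lower than in the email index, and both are lower than what Red's own documents would give. IDF is only one factor in the final score (term frequency and field-length normalization also contribute), so this shows the direction of the skew rather than an exact ranking.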

Am I understanding this correctly, and is this a substantial problem to worry about? Is there a way to mitigate the problem so clients don't dramatically affect each other's results? For larger clients, we intend to host them on dedicated indices, but there are still small clients that have dramatically different numbers of records relative to each other.

Thank you for any assistance you can offer!!

system · December 15, 2017, 1:51pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
