App Search - Scoring algorithm in B2B and multi-entity model

Gerardo_Zenobi · August 5, 2022, 1:32pm

Hi

Context

We have a B2B product. We would like to index different types of entities (e.g. User, Training, Group). We were advised that the best approach would be to use one engine per entity type.

Problem/Question

After reading this article, which describes how documents are scored, I have the following questions:

By keeping entities in separate engines, are we putting ourselves at a disadvantage? Our inverse document frequencies would then be derived not from the whole corpus (all entities combined) but only from each entity's own corpus.
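To make the concern concrete, here is a small sketch. It assumes that relevance behaves like Elasticsearch's default BM25 similarity, whose Lucene IDF is ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents in the corpus and n is the number that contain the term. App Search adds its own relevance tuning on top, so treat this as an illustration rather than its exact formula. The `users` and `trainings` documents are hypothetical token sets.

```python
import math

def bm25_idf(doc_count: int, docs_with_term: int) -> float:
    """Lucene-style BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))

def idf_for_term(corpus: list[set[str]], term: str) -> float:
    """IDF of `term` computed only from the documents in `corpus`."""
    n = sum(1 for doc in corpus if term in doc)
    return bm25_idf(len(corpus), n)

# Hypothetical per-entity corpora (each document reduced to its set of tokens).
users = [{"alice", "engineer"}, {"bob", "manager"},
         {"carol", "safety", "officer"}, {"dave", "engineer"}]
trainings = [{"safety", "basics"}, {"fire", "safety"},
             {"safety", "audit"}, {"excel", "basics"}]

per_user_engine = idf_for_term(users, "safety")          # rare among users
per_training_engine = idf_for_term(trainings, "safety")  # common among trainings
combined = idf_for_term(users + trainings, "safety")     # one shared corpus

print(f"{per_user_engine:.3f} {combined:.3f} {per_training_engine:.3f}")
# prints "1.204 0.693 0.357"
```

With one engine per entity, "safety" weighs more in the user engine and less in the training engine than it would in a single shared corpus. Whether that is a disadvantage depends on whether scores from different engines are ever compared directly.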

At the same time, given that our product is B2B and our clients come from all sorts of industries:

Is it correct to assume that we would not want a term's rarity (its inverse document frequency) to be based on its rarity across all of our clients' data, but only within a given client's documents (1 client = 1 corpus)?
If that is indeed a problem, would the only solution be one engine per client? (I worry that, with thousands of clients, this would make the solution much more complex.)
Or is there a way to tell Elasticsearch (ES) to calculate IDFs based on only a subset of the documents in an index?
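The per-client question comes down to which subset of documents the statistic is drawn from. The sketch below computes the same BM25-style IDF, ln(1 + (N - n + 0.5) / (n + 0.5)), on the client side over only the documents that pass a filter. The `client_id` field, the `idf_over_subset` helper and the sample documents are hypothetical and illustrative. They are not an App Search or Elasticsearch API. As far as we know, Elasticsearch does not narrow term statistics to the documents matching a filter. It takes them from each shard by default, or from all shards of the searched indices with `search_type=dfs_query_then_fetch`.

```python
import math
from typing import Callable

# Hypothetical document shape: {"client_id": str, "tokens": set[str]}
Doc = dict

def bm25_idf(doc_count: int, docs_with_term: int) -> float:
    """Lucene-style BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))

def idf_over_subset(docs: list[Doc], term: str, keep: Callable[[Doc], bool]) -> float:
    """IDF of `term` using only the documents for which `keep(doc)` is true."""
    subset = [d for d in docs if keep(d)]
    n = sum(1 for d in subset if term in d["tokens"])
    return bm25_idf(len(subset), n)

# Two hypothetical clients from different industries sharing one corpus.
docs = [
    {"client_id": "hospital", "tokens": {"patient", "intake"}},
    {"client_id": "hospital", "tokens": {"patient", "privacy"}},
    {"client_id": "hospital", "tokens": {"hand", "hygiene"}},
    {"client_id": "hospital", "tokens": {"audit", "records"}},
    {"client_id": "bank", "tokens": {"audit", "kyc"}},
    {"client_id": "bank", "tokens": {"audit", "aml"}},
    {"client_id": "bank", "tokens": {"audit", "loans"}},
    {"client_id": "bank", "tokens": {"teller", "training"}},
]

all_clients = idf_over_subset(docs, "audit", lambda d: True)
hospital_only = idf_over_subset(docs, "audit", lambda d: d["client_id"] == "hospital")
bank_only = idf_over_subset(docs, "audit", lambda d: d["client_id"] == "bank")
```

Here "audit" is rare for the hospital but routine for the bank. A shared corpus gives it one middling weight for both clients, while per-client statistics rank it high for one and low for the other. That per-client behaviour is what a dedicated engine per client would produce.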

Thanks in advance.


