Question about best practices - should I create a separate index when the objects within a document can number in the thousands?

Dave_Clissold · May 28, 2018, 11:32am

I am working with a combination of Neo4J and ES in a recommendation engine. I have an algorithm in Neo4J that generates a score between a user and a product. I also want to add 3 similar-product objects (similar1, similar2, similar3), which can contain up to 30k items.

What is the best way of storing the score and the similar products in ES?

1 - Create an object within the product doc with the user_id as the key and the score as the value, like this:
{
  "productName": "product1",
  "similar1": ["product_id_1", "product_id_2", "product_id_3"],
  "similar2": ["product_id_4", "product_id_5", "product_id_6"],
  "user_scores": {
    "80cc5fe7-1110-44b1-ae74-51511008f5f2": 86,
    "47dc1f69-c9bd-448a-ae22-1ff288c82fe4": 36
  }
}

2 - Create a separate index for each user, holding a score for each product, and further separate indices for each similar set, where the doc _id matches that of the product.
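To make option 2 a bit more concrete, here is a rough sketch of what the documents might look like. The index naming scheme (`user-scores-<user_id>`), the field names `score` and `product_ids`, and both helper functions are assumptions made up for illustration. The dicts loosely follow the `_index` / `_id` / `_source` action shape used for bulk indexing, but nothing here talks to a cluster.

```python
def score_index_name(user_id):
    # Hypothetical naming scheme: one index per user.
    return f"user-scores-{user_id}"


def build_score_actions(user_id, product_scores):
    """Yield one action per product; the doc _id is the product id."""
    for product_id, score in product_scores.items():
        yield {
            "_index": score_index_name(user_id),
            "_id": product_id,
            "_source": {"score": score},
        }


def build_similar_actions(set_name, similar_by_product):
    """Yield one action per product for a similar set such as 'similar1'."""
    for product_id, similar_ids in similar_by_product.items():
        yield {
            "_index": set_name,
            "_id": product_id,
            "_source": {"product_ids": list(similar_ids)},
        }


score_actions = list(
    build_score_actions(
        "80cc5fe7-1110-44b1-ae74-51511008f5f2",
        {"product_id_1": 86},
    )
)
similar_actions = list(
    build_similar_actions("similar1", {"product_id_1": ["product_id_2", "product_id_3"]})
)
```

With this layout every score is a small document of its own, so no single document grows with the number of users, at the cost of needing a lookup per index to reassemble a product.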

My concern with the first option is that when the application starts to scale and I reach a couple of million users, the documents could end up hitting the 2 GB Lucene size limit.
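As a sanity check on that concern, the size of the `user_scores` object can be estimated by serializing a sample and extrapolating. This is only a back-of-the-envelope sketch: it assumes UUID user ids and two-digit integer scores as in the example document, it measures compact JSON source size rather than what Lucene stores on disk, and the function name and sample size are made up for illustration.

```python
import json
import uuid


def estimate_user_scores_bytes(num_users, sample_size=1000):
    """Extrapolate the compact JSON size of a user_scores object."""
    # Build a sample with random UUID keys and a two-digit score.
    sample = {str(uuid.uuid4()): 86 for _ in range(sample_size)}
    encoded = json.dumps(sample, separators=(",", ":")).encode("utf-8")
    bytes_per_entry = len(encoded) / sample_size
    return int(bytes_per_entry * num_users)


estimated = estimate_user_scores_bytes(2_000_000)
print(f"{estimated / 1_000_000:.0f} MB of JSON for 2 million users")
```

Each entry works out to a little over 40 bytes, so a couple of million users is on the order of tens of megabytes of JSON per product document. That is well short of 2 GB, but still a very large document to reindex every time a single score changes.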

With the 2nd option, and with the similar product objects from option 1 also moved into their own indices, how would I build a query that returns the similar product data from the multiple indices as child objects within the product doc, and scores the product doc by the value in the user score index?
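One way this could work, assuming the combining is done in the application rather than in a single query, is to fetch the product docs, the user's score docs, and the similar-set docs separately and merge them by product id. The sketch below uses plain dicts as stand-ins for the fetched results; the function name, the `user_score` field, and the default score of 0 for unscored products are all assumptions.

```python
def rank_products_for_user(products, user_scores, similar_sets):
    """Merge per-index results by product id and sort by the user's score.

    products:     {product_id: product source dict}
    user_scores:  {product_id: score} from this user's score index
    similar_sets: {set_name: {product_id: [similar product ids]}}
    """
    results = []
    for product_id, source in products.items():
        doc = dict(source)  # copy so the input is not mutated
        doc["product_id"] = product_id
        # Attach each similar set as a child field on the product.
        for set_name, by_product in similar_sets.items():
            doc[set_name] = by_product.get(product_id, [])
        # Products the user has no score for default to 0 (an assumption).
        doc["user_score"] = user_scores.get(product_id, 0)
        results.append(doc)
    results.sort(key=lambda d: d["user_score"], reverse=True)
    return results


ranked = rank_products_for_user(
    products={
        "product_id_1": {"productName": "product1"},
        "product_id_2": {"productName": "product2"},
    },
    user_scores={"product_id_1": 36, "product_id_2": 86},
    similar_sets={"similar1": {"product_id_1": ["product_id_2"]}},
)
```

This keeps the indices simple but moves ranking out of ES, so paging and relevance scoring would have to be handled by the application as well.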

