Question about best practices - should I create a separate index when the objects within a document can number in the thousands?

I am working with a combination of Neo4J and ES in a recommendation engine. I have an algorithm in Neo4J that generates a score between a user and a product. I also want to add 3 similar-product objects (similar1, similar2, similar3), which can contain up to 30k items.

What is the best way of storing the score and the similar products in ES?

1 - Create an object within the product doc with the user_id as the key and the score as the value, like this:
{
"productName": "product1",
"similar1": ["product_id_1","product_id_2","product_id_3"],
"similar2": ["product_id_4","product_id_5","product_id_6"],
"user_scores": {
"80cc5fe7-1110-44b1-ae74-51511008f5f2": 86,
"47dc1f69-c9bd-448a-ae22-1ff288c82fe4": 36
}
}

2 - Create a separate index for each user, holding that user's score for each product, and again separate indices for each similar set, where the doc _id matches that of the product.

My concern with the first option is that when the application starts to scale and I reach a couple of million users, the product documents could end up reaching the 2 GB Lucene size limit.
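To put a rough number on that concern, here is a back-of-the-envelope sketch in Python. It assumes one UUID key and a two-digit integer score per user, serialized as compact JSON; `estimate_user_scores_bytes` is a hypothetical helper name, and the figure covers only the raw `user_scores` JSON, not whatever ES and Lucene add when indexing it.

```python
import json
import uuid


def estimate_user_scores_bytes(num_users, sample_size=1000):
    """Extrapolate the compact-JSON size of a user_scores map with num_users entries.

    Assumption: every key is a UUID string and every score is a two-digit integer.
    """
    # Build a small sample map and measure its serialized size.
    sample = {str(uuid.uuid4()): 50 for _ in range(sample_size)}
    sample_bytes = len(json.dumps(sample, separators=(",", ":")).encode("utf-8"))
    # Scale the average per-entry cost up to the requested number of users.
    per_entry = sample_bytes / sample_size
    return int(per_entry * num_users)


if __name__ == "__main__":
    for n in (10_000, 1_000_000, 5_000_000):
        size = estimate_user_scores_bytes(n)
        print(f"{n:>9,} users -> ~{size / 1024**2:.1f} MiB of user_scores per product doc")
```

Under these assumptions, each entry costs about 42 bytes, so a million users comes to roughly 40 MiB of raw JSON per product document. The similar1/similar2/similar3 arrays would add to that.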

With the 2nd option, and the similar product objects from option 1, how would I build a query that returns the similar product data from the multiple indices as child objects within the product doc, and scores the product doc using the user score index?
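To make the desired result concrete, here is a minimal sketch of the output shape I am after, assembled in plain Python in application code rather than by an ES query. The function names, the in-memory dicts standing in for the separate indices, and the sample IDs are all hypothetical.

```python
def merge_product_view(product_id, products, similar_sets, user_scores):
    """Return one product doc with similar sets attached as child objects
    and the requesting user's score added."""
    doc = dict(products[product_id])  # copy so the source doc is not mutated
    for set_name, by_product in similar_sets.items():
        # Each similar set is keyed by the product's _id, as in option 2.
        doc[set_name] = by_product.get(product_id, [])
    # Assumption: products without a score for this user default to 0.
    doc["user_score"] = user_scores.get(product_id, 0)
    return doc


def rank_for_user(products, similar_sets, user_scores):
    """Merge every product and order the results by the user's score, highest first."""
    merged = [
        merge_product_view(pid, products, similar_sets, user_scores)
        for pid in products
    ]
    return sorted(merged, key=lambda d: d["user_score"], reverse=True)


# Hypothetical stand-ins for the product index, the similar-set indices,
# and one user's score index.
products = {"p1": {"productName": "product1"}, "p2": {"productName": "product2"}}
similar_sets = {
    "similar1": {"p1": ["product_id_1", "product_id_2", "product_id_3"]},
    "similar2": {"p1": ["product_id_4", "product_id_5", "product_id_6"]},
}
user_scores = {"p1": 36, "p2": 86}

ranked = rank_for_user(products, similar_sets, user_scores)
```

Here `ranked` lists product2 before product1, and each entry carries its similar1 and similar2 lists alongside the score. The open question is whether ES can produce this in a single query, or whether the merge has to happen in the application like this.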
