I am working with a combination of Neo4J and Elasticsearch (ES) in a recommendation engine. I have an algorithm in Neo4J that generates a score between a user and a product. I also want to add three similar-product lists (similar1, similar2, similar3) to each product, which can contain up to 30k items.
What is the best way of storing the score and the similar products in ES?
1 - Create an object within the product doc with the user_id as the key and the score as the value, like this:
{
  "productName": "product1",
  "similar1": ["product_id_1", "product_id_2", "product_id_3"],
  "similar2": ["product_id_4", "product_id_5", "product_id_6"],
  "user_scores": {
    "80cc5fe7-1110-44b1-ae74-51511008f5f2": 86,
    "47dc1f69-c9bd-448a-ae22-1ff288c82fe4": 36
  }
}
2 - Create a separate index for each user, holding that user's score for each product, and separate indices for each similar set, where the doc _id matches that of the product.
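To make option 2 concrete, here is a minimal Python sketch of the documents I have in mind. The index names (products, similar1, user_scores_<user_id>) and the field names (product_ids, score) are placeholders made up for illustration, not an existing mapping, and the helper build_option2_docs is hypothetical.

```python
from typing import Dict, List


def build_option2_docs(
    product_id: str,
    product_name: str,
    similar_sets: Dict[str, List[str]],
    user_scores: Dict[str, int],
) -> List[dict]:
    """Build the documents that option 2 would store for one product.

    Every document uses the product id as its _id, so the same id can be
    looked up in each index. Index and field names are assumptions.
    """
    docs = [
        # One document per product in a (hypothetical) "products" index.
        {"_index": "products", "_id": product_id,
         "_source": {"productName": product_name}},
    ]
    # One index per similar set (similar1, similar2, ...).
    for set_name, ids in similar_sets.items():
        docs.append({"_index": set_name, "_id": product_id,
                     "_source": {"product_ids": ids}})
    # One index per user, holding that user's score for this product.
    for user_id, score in user_scores.items():
        docs.append({"_index": f"user_scores_{user_id}", "_id": product_id,
                     "_source": {"score": score}})
    return docs


docs = build_option2_docs(
    "product_id_0",
    "product1",
    {"similar1": ["product_id_1", "product_id_2", "product_id_3"]},
    {"80cc5fe7-1110-44b1-ae74-51511008f5f2": 86},
)
```

With this layout the product doc stays small, but the data for one product is spread over several indices that share only the _id.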
My concern with the first option is that when the application starts to scale, if I reach a couple of million users, the documents could end up reaching the 2 GB Lucene size limit.
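To put a rough number on that concern, here is a small sketch that estimates the serialized JSON size of the user_scores object for a given number of users. It assumes UUID keys and two-digit integer scores, as in the example doc, and it only measures the JSON source, not what Lucene actually stores on disk. The function name estimate_user_scores_bytes is hypothetical.

```python
import json
import uuid


def estimate_user_scores_bytes(num_users: int, sample: int = 1000) -> int:
    """Estimate the compact-JSON size, in bytes, of a user_scores object.

    Serializes a sample of random UUID keys with a fixed two-digit score
    and scales the result linearly to num_users entries.
    """
    scores = {str(uuid.uuid4()): 50 for _ in range(sample)}
    sample_bytes = len(
        json.dumps(scores, separators=(",", ":")).encode("utf-8")
    )
    return sample_bytes * num_users // sample


# Estimated size of user_scores in a single product doc with 2 million users.
estimated = estimate_user_scores_bytes(2_000_000)
print(f"{estimated / 1_000_000:.0f} MB")
```

The three similar lists (up to 30k ids each) would add to this figure for every product doc.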
With the 2nd option and the similar product objects from option 1, how would I build a query that returns the similar product data from the multiple indices as child objects within the product doc, and scores the product doc by the user score index?