Question about best practices: should I create a separate index when the objects within a document can number in the thousands?

(Dave Clissold) #1

I am working with a combination of Neo4j and ES in a recommendation engine. I have an algorithm in Neo4j that generates a score between a user and a product. I also want to add 3 similar-product objects (similar1, similar2, similar3), which can contain up to 30k items.

What is the best way of storing the score and the similar products in ES?

1 - Create an object within the product doc, with the user_id as the key and the score as the value, like this:
"similar1": ["product_id_1","product_id_2","product_id_3"],
"similar2": ["product_id_4","product_id_5","product_id_6"]
{"80cc5fe7-1110-44b1-ae74-51511008f5f2": 86},
{"47dc1f69-c9bd-448a-ae22-1ff288c82fe4": 36}

2 - Create a separate index for each user, with scores beside each product, and again separate indices for each similar set, where the doc _id matches that of the product.

My concern with the first is that when the application starts to scale, if I reach a couple of million users, the documents could end up reaching the 2 GB Lucene size limit.
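A rough way to sanity-check this concern is to estimate how large one product doc gets as the number of users grows. The sketch below is only an approximation: it assumes UUID user ids and integer scores as in the example above, uses a hypothetical `user_scores` field name to hold the map, and measures the compact JSON source only, not what Lucene actually stores on disk.

```python
import json
import uuid


def build_product_doc(num_users, similar_ids):
    """Build an option-1 style product doc (field names are hypothetical)."""
    # One entry per user: user_id (UUID string) -> score in the range 0-100.
    user_scores = {str(uuid.uuid4()): i % 101 for i in range(num_users)}
    return {
        "similar1": list(similar_ids),
        "user_scores": user_scores,
    }


def json_size_bytes(doc):
    """Size in bytes of the doc serialized as compact JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


# Measure a 10k-user sample, then extrapolate linearly to 2 million users.
sample_users = 10_000
sample = build_product_doc(
    sample_users, ["product_id_1", "product_id_2", "product_id_3"]
)
bytes_per_user = json_size_bytes(sample) / sample_users
estimated_mb_at_2m_users = bytes_per_user * 2_000_000 / 1_000_000

print(f"~{bytes_per_user:.0f} bytes per user")
print(f"~{estimated_mb_at_2m_users:.0f} MB per product doc at 2 million users")
```

Under these assumptions each user adds a few dozen bytes, so a doc with a couple of million users would be on the order of tens of megabytes of JSON: well short of 2 GB, but still very large for a single document that has to be rewritten whenever any one score changes.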

With the 2nd option, and the similar product objects from option 1, how would I build a query that could return similar product data from the multiple indices as child objects within the product doc, and score the product doc by the user score index?
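For illustration, here is one possible shape for that result, assuming the combining is done in application code after fetching from each index separately rather than in a single ES query. Everything here is hypothetical: plain dicts stand in for the responses from the product index, the per-user score index, and the similar-set indices, all keyed by the product `_id`.

```python
def merge_product_view(product, user_score_docs, similar_docs):
    """Combine per-index lookups for one product into a single view."""
    product_id = product["_id"]
    view = dict(product["_source"])
    # Attach each similar set for this product as a child object.
    for name, docs in similar_docs.items():
        view[name] = docs.get(product_id, [])
    # The user's score for this product; 0 when the user has no score yet.
    view["user_score"] = user_score_docs.get(product_id, 0)
    return view


def rank_for_user(products, user_score_docs, similar_docs):
    """Return merged product views, highest user score first."""
    views = [
        merge_product_view(p, user_score_docs, similar_docs) for p in products
    ]
    return sorted(views, key=lambda v: v["user_score"], reverse=True)


# Stand-in data: two products, one user's scores, one similar set.
products = [
    {"_id": "p1", "_source": {"name": "A"}},
    {"_id": "p2", "_source": {"name": "B"}},
]
user_scores = {"p1": 36, "p2": 86}
similar = {"similar1": {"p1": ["product_id_1", "product_id_2"]}}

ranked = rank_for_user(products, user_scores, similar)
for view in ranked:
    print(view)
```

The trade-off with this approach is extra round trips and sorting outside ES, so it only suits cases where the candidate product list is already small.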
