I'm new to Elasticsearch and I'm developing a product recommendation feature like 'People who bought this also bought...'.
I have an index named "products" which I use for faceted filtering (price, category, brand and so on). Each document in that index has all the information needed to show a full page of products, so with just one Elasticsearch query I can build the page.
For the recommendation system, I thought about running a significant terms aggregation over my completed orders, so I can suggest products that were bought together with a product X that was added to the cart.
So the index would be something like this:
With that I have the identifiers (skus) of the products, but I don't have all of their information. Is it right to do two queries to build my recommendation shelf?
And what if I wanted to filter the recommended products by price (e.g. products bought together, but with a price lower than $80)? Would I need to query products by sku and price?
If I understand your example correctly, you're using previous orders to make the associations between products. So the index you will be querying is the orders index, looking for significant terms in the products field.
You say that you also want to filter the product suggestions by price. You are right that this would need to be handled in your app, using a subsequent query on a different index like products where, given the sku, you can get the price (and colour, size etc.).
You should be able to set the size parameter on the significant_terms agg to something large, e.g. 10000 skus, to give you a large base to then filter. Bear in mind, though, that looking up many IDs in a search may prove to be expensive.
You could try filtering the significant_terms agg with a query for orders whose products do not exceed the required price range, but that might be over-aggressive: whole orders may be filtered out if they contain expensive products that have nothing to do with your search term or the suggested "also-boughts".
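As a rough illustration of that first step, here is a minimal sketch in Python that only builds the request body and reads the response, without tying itself to any particular client. The aggregation name `also_bought` and both helper functions are made up for this example, and it assumes each order document stores its skus in a keyword field called `products`.

```python
def build_also_bought_query(sku, size=10):
    """Build a search body for the orders index.

    Assumption: each order document lists its skus in a keyword
    field named 'products'. Adjust to the real mapping.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the order hits
        "query": {"term": {"products": sku}},  # orders containing the seed sku
        "aggs": {
            "also_bought": {  # hypothetical aggregation name
                "significant_terms": {"field": "products", "size": size}
            }
        },
    }


def extract_skus(response, seed_sku):
    """Return the suggested skus from the response, dropping the seed sku itself."""
    buckets = response["aggregations"]["also_bought"]["buckets"]
    return [bucket["key"] for bucket in buckets if bucket["key"] != seed_sku]
```

The body would be sent as a search against the orders index. The seed sku is dropped in `extract_skus` because it appears in every matching order and so tends to show up among the buckets.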
Yes, it contains prices, but the problem is that product prices change frequently, so it wouldn't be easy to keep them synchronized. And maybe I would like to filter by other properties (category, brand, ...).
OK, let's say I don't want to filter the products returned by the significant terms. I would still need a subsequent query to get the product information from the products index, querying just by the skus. Is this the right approach?
You'd want to query the products index with a bool query that has two must or filter clauses: one is a terms query listing the skus and the other is a range query with the price restriction.
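A minimal sketch of such a request body in Python might look like the following. The field names `sku` and `price` and the helper name are assumptions made for illustration, not taken from the actual mapping.

```python
def build_products_query(skus, max_price=None):
    """Build a search body for the products index.

    Assumptions: products have a keyword field 'sku' and a numeric
    field 'price'. Both names are hypothetical.
    """
    sku_list = list(skus)
    # Filter clauses skip scoring, which is all an ID lookup needs.
    filters = [{"terms": {"sku": sku_list}}]
    if max_price is not None:
        # 'lte' keeps products at or below the limit; use 'lt' for strictly lower.
        filters.append({"range": {"price": {"lte": max_price}}})
    return {
        "size": len(sku_list),  # the default page size of 10 could cut results short
        "query": {"bool": {"filter": filters}},
    }
```

For example, `build_products_query(["sku-2", "sku-3"], max_price=80)` gives a body that returns at most those two products, and only if they cost $80 or less. Since filter clauses do not score, the hits will not come back in significance order, so the app would probably need to re-sort them by the order of the suggested skus.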