I am using Elasticsearch to index products from Amazon. The index currently holds 30 million products, and each product is associated with a category.
Now when I search for `Canon 7d`, all the accessories for the Canon 7D appear, because their titles contain an exact match for the text `canon 7d`, whereas the actual product's title is `Canon EOS 7d`. Because of this, Elasticsearch ranks the accessories higher than the actual product.
In this case, how can I tune Elasticsearch to get the expected result? Any pointers would be helpful.
Well, buy my book, of course!
What is your goal? What data do you have to work with? Relevance isn't clean-cut black and white; you're building a user experience through search. Do you want to distinguish core products from accessories? That's a specific problem you can tackle.
How would you go about tackling that? I might try something like the following strategy.
- Create a list of synonyms for the Canon 7D
- Create a field called `core_product_title` that stores this name and its synonyms
- Create a field called `is_core_product` that is set only on the non-accessory product
- Search against `core_product_title`, and boost heavily on `is_core_product`
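A minimal sketch of that strategy, written as plain Python dicts for the index mapping, two example documents, and the search body (typeless mapping format, as used by Elasticsearch 7 and later). The field names `core_product_title` and `is_core_product` come from the list; the synonym table, the `title` field, the helper names, and the boost values (3 on `core_product_title`, 10 on `is_core_product`) are illustrative assumptions, not tuned settings.

```python
# Hypothetical synonym table; in practice it would be curated per core product.
CORE_PRODUCT_SYNONYMS = {
    "Canon EOS 7D": ["Canon 7D", "EOS 7D"],
}


def index_mapping():
    """Mapping with the two extra fields from the strategy above."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                # Core product name plus its synonyms, stored as an array of strings.
                "core_product_title": {"type": "text"},
                # True for the actual product, False for accessories.
                "is_core_product": {"type": "boolean"},
            }
        }
    }


def make_document(title, core_name=None):
    """Build a document; pass core_name only for non-accessory products."""
    doc = {"title": title, "is_core_product": core_name is not None}
    if core_name is not None:
        doc["core_product_title"] = [core_name] + CORE_PRODUCT_SYNONYMS.get(core_name, [])
    return doc


def search_body(text, core_title_boost=3, core_boost=10.0):
    """Match on title and core_product_title; boost core products heavily."""
    return {
        "query": {
            "bool": {
                # The query text must match in title or core_product_title.
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["title", f"core_product_title^{core_title_boost}"],
                        }
                    }
                ],
                # Optional clause: adds score only to documents flagged as core products.
                "should": [
                    {"term": {"is_core_product": {"value": True, "boost": core_boost}}}
                ],
            }
        }
    }


camera = make_document("Canon EOS 7D Digital SLR Camera Body", core_name="Canon EOS 7D")
grip = make_document("Battery Grip for Canon 7D")
query = search_body("Canon 7d")
```

The returned dicts can be serialized with `json.dumps` and sent as the bodies of an index-creation request and a `_search` request. Storing the synonyms directly in `core_product_title` keeps the sketch simple; a synonym token filter in the field's analyzer is the other common way to get the same effect.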
OK, that's just a first pass. The trick is often in curating synonyms and accurately populating the fields that differentiate core products from accessories. Maybe this data is already part of your data set? You could do this manually, but you have 30 million documents, so I might consider training a classifier. To do that, you manually tag a sample of documents as core products or accessories, and you tell the classifier which features in the data might be significant in making this distinction. The classifier then learns which features mark an item as a core product and which mark it as an accessory. You may want to limit the scope of this classification to specific areas like photography.
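A toy version of the classifier idea, written as a from-scratch multinomial naive Bayes over title words so it runs without dependencies. Naive Bayes is my choice for illustration, not something the answer prescribes, and the seven hand-labeled titles are hypothetical stand-ins for a real labeled sample; a usable classifier would need far more examples and probably extra features such as the product category.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-labeled sample; a real one would be much larger.
TRAINING = [
    ("Canon EOS 7D Digital SLR Camera Body", "core"),
    ("Nikon D7000 DSLR Camera with 18-105mm Lens", "core"),
    ("Sony Alpha A77 Digital SLR Camera", "core"),
    ("Battery Grip for Canon EOS 7D", "accessory"),
    ("LP-E6 Battery Charger for Canon 7D", "accessory"),
    ("Screen Protector for Nikon D7000", "accessory"),
    ("Camera Bag for Sony Alpha DSLR", "accessory"),
]


def tokenize(title):
    """Lowercase a title and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", title.lower())


class NaiveBayesTitleClassifier:
    """Multinomial naive Bayes over title words with add-one smoothing."""

    def fit(self, examples):
        self.label_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        for title, label in examples:
            self.word_counts[label].update(tokenize(title))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def predict(self, title):
        n_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / n_docs)  # log prior of the label
            denom = self.totals[label] + len(self.vocab)
            for word in tokenize(title):
                if word in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NaiveBayesTitleClassifier().fit(TRAINING)
print(clf.predict("Canon 7D Battery Grip"))     # accessory
print(clf.predict("Canon EOS 7D Camera Body"))  # core
```

Words like "for", "battery", and "grip" pull a title toward the accessory label, while "camera" and "body" pull it toward the core label. The predicted label could then populate `is_core_product` at index time.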
How else might you tackle these problems?
- Boost on sales ranking data using a `function_score` query, on the assumption that the camera will sell better than its accessories
- Measure term co-occurrences to build up candidate synonyms
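Both ideas can be sketched in a few lines. The query wraps a text match in a `function_score` with a `field_value_factor`, assuming a hypothetical numeric `sales_rank` field where 1 is the best seller; the `reciprocal` modifier turns the rank into a boost that shrinks as the rank grows, and `boost_mode: sum` adds it to the text score instead of multiplying. The co-occurrence helper simply counts which words share a title with a given term; the sample titles are made up.

```python
import re
from collections import Counter


def sales_boosted_query(text, rank_field="sales_rank"):
    """Wrap a title match in a function_score that favors better-selling items.

    Assumes a hypothetical numeric rank field where 1 is the best seller
    (ranks must be >= 1, since reciprocal divides by the field value).
    """
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},
                "field_value_factor": {
                    "field": rank_field,
                    "modifier": "reciprocal",  # 1 / rank: rank 1 -> 1.0, rank 100 -> 0.01
                    "missing": 1_000_000,      # unranked items get almost no boost
                },
                "boost_mode": "sum",  # add the popularity term to the text score
            }
        }
    }


def tokenize(title):
    """Lowercase a title and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", title.lower())


def cooccurrences(titles, term):
    """Count how many titles each word shares with `term`."""
    counts = Counter()
    for title in titles:
        tokens = set(tokenize(title))
        if term in tokens:
            counts.update(tokens - {term})
    return counts


# Made-up sample titles for illustration.
SAMPLE_TITLES = [
    "Canon EOS 7D Digital SLR Camera Body",
    "Canon EOS 7D with 28-135mm Lens",
    "Battery Grip for Canon 7D",
    "Canon 5D Mark II Body",
]

counts = cooccurrences(SAMPLE_TITLES, "7d")
```

Here `eos` shares two of the three `7d` titles, which hints that "Canon 7D" and "Canon EOS 7D" name the same product. On a real catalog you would normalize these counts by each word's overall frequency before treating a pair as a synonym candidate, and tune how large the sales boost is relative to the text score.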
Great suggestions. Thanks for taking the time to answer in detail. I will definitely get a copy of the book.