Well, buy my book of course!
What is your goal? What's in your data that you can use? Relevance isn't black and white; you're building a user experience through search. Do you want to create a distinction between a core product and accessories? That's a specific problem you can tackle.
How would you go about tackling that? I might try something like the following strategy.
- Create a list of synonyms for Canon 7D
- Create a field called "core_product_title" that stores this name and synonyms
- Create a field called "is_core_product" that is set on the non-accessory product
- Search against "core_product_title". Boost heavily on "is_core_product"
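As a rough sketch of that strategy, the request bodies might look something like this. The field names come from the list above; the synonym entries, the filter name `core_product_synonyms`, and the boost value of 10 are only illustrative assumptions, so tune them against your own data.

```python
# Hypothetical synonym filter for the analyzer used on "core_product_title".
# The synonym entries are examples only; curate your own list.
SYNONYM_FILTER_SETTINGS = {
    "analysis": {
        "filter": {
            "core_product_synonyms": {
                "type": "synonym",
                "synonyms": ["canon 7d, eos 7d, canon eos 7d"],
            }
        }
    }
}


def build_core_product_query(user_query, core_boost=10.0):
    """Build a search body that matches on core_product_title and
    pushes documents flagged as core products toward the top."""
    return {
        "query": {
            "bool": {
                # The user's text must match the curated title field.
                "must": [
                    {"match": {"core_product_title": {"query": user_query}}}
                ],
                # Optional clause: only adds score, never excludes accessories.
                "should": [
                    {
                        "term": {
                            "is_core_product": {
                                "value": True,
                                "boost": core_boost,
                            }
                        }
                    }
                ],
            }
        }
    }


body = build_core_product_query("canon 7d")
```

Putting the flag in `should` rather than `filter` keeps accessories in the results; they just rank lower than the camera itself.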
OK, that's just a first pass. The trick is often in curating synonyms and accurately populating the fields that differentiate core products from accessories. Maybe this information is already part of your data set? You could tag documents manually, but you have 30 million of them, so I might consider training a classifier. Here you manually tag a sample of documents as core products or accessories, and you tell the classifier which features in the data might be significant in making this distinction. The classifier then learns how those features separate a product from an accessory. You may want to limit the scope of this classification to specific areas like photography.
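To make the classifier idea concrete, here is a toy sketch: a tiny naive Bayes model that uses only the words in a product title as features. The class name `TinyNaiveBayes`, the labels, and the six training titles are all made up for illustration. In practice you would probably reach for an existing machine learning library, far more tagged examples, and richer features such as category or price.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real analysis."""
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial naive Bayes over title tokens with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()             # documents seen per label
        self.token_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, titles, labels):
        for title, label in zip(titles, labels):
            self.label_counts[label] += 1
            for tok in tokenize(title):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, title):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_tokens = sum(self.token_counts[label].values())
            for tok in tokenize(title):
                count = self.token_counts[label][tok]
                score += math.log((count + 1) / (total_tokens + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hand-tagged examples (invented for this sketch).
titles = [
    "Canon EOS 7D digital SLR camera body",
    "Canon 7D DSLR camera with 18-135mm lens",
    "Nikon D300 digital SLR camera body",
    "Battery grip for Canon 7D",
    "Replacement battery charger for Canon 7D",
    "Camera bag case for Nikon D300",
]
labels = ["core", "core", "core", "accessory", "accessory", "accessory"]

model = TinyNaiveBayes().fit(titles, labels)
prediction = model.predict("Lens cap for Canon 7D")
```

The predicted label could then be written to "is_core_product" at index time, so the query itself stays simple.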
How else might you tackle these problems?
- Boost on sales ranking data using a function_score query, on the assumption that the camera will sell better than the accessories
- Measure term co-occurrences to let you build up candidate synonyms
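Both ideas can be sketched briefly. The first function builds a function_score request body; the fields `title` and `units_sold` are hypothetical, and I'm assuming a sales number where higher means better (a raw rank where 1 is best would need inverting first). The second function counts how often pairs of terms appear in the same title, which gives you raw material for candidate synonyms, not a finished list.

```python
from collections import Counter
from itertools import combinations


def build_sales_boosted_query(user_query):
    """Text match on a hypothetical "title" field, with a score bump
    from a hypothetical numeric "units_sold" field."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": user_query}},
                "field_value_factor": {
                    "field": "units_sold",
                    "modifier": "log1p",  # dampen very large sales numbers
                    "missing": 0,         # documents without the field get no bump
                },
                # Add the sales signal to the text score instead of
                # multiplying, so a missing value does not zero out a match.
                "boost_mode": "sum",
            }
        }
    }


def cooccurrence_counts(titles):
    """Count how many titles contain each unordered pair of distinct terms."""
    counts = Counter()
    for title in titles:
        tokens = sorted(set(title.lower().split()))
        for a, b in combinations(tokens, 2):
            counts[(a, b)] += 1
    return counts


# Invented sample titles, just to show the shape of the output.
sample_titles = ["canon eos 7d body", "canon eos 7d kit", "nikon d300 body"]
pair_counts = cooccurrence_counts(sample_titles)
most_common_pairs = pair_counts.most_common(5)
```

Pairs that co-occur often, like "eos" and "7d" here, still need a human look before they go into a synonym list; frequent neighbors are not always interchangeable.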