Elasticsearch for product database search - weighting

Absolute newb here, so go easy! :smirk:

I want to build a better search for my shopping site, and I'm tinkering with Elasticsearch right now. I'm trying to come up with a smarter "default" product search query that searches four fields, each weighted differently:

name (product name), weighted highest
search_words (list of keywords), medium weight
description (item copy), low weight
skuid (item number), lowest weight; needed in case a customer enters a SKU in the search field

But I also want to consider two other fields: sort_priority and creation_date.

sort_priority is a value we use for "important" items that we want to show higher when browsing a category. Since it denotes an item's importance, I want to factor it into search somehow. The default is 9999999, and anything lower is considered more important, with 1 being the most important.

creation_date is just that. It would be awesome to be able to weight newer items a bit higher.

My index has the following mappings:

mappings => { "product" => { properties => { "price" => {type => 'double'}, "origprice" => {type => 'double'}, "sort_priority" => {type => 'integer'}, "creation_date" => { type => "date", format => "yyyy-MM-dd" } } } }
Which I hope will give me the ability to use sort_priority and creation_date in the fashion I outlined.

This query seems to work for the matching:

"query" : { "multi_match" : { "query": "awesome product", "type": "best_fields", "fields": [ "name^10", "search_words^5", "description^2", "skuid^1" ] } }

But I can't figure out the scoring manipulation. I'd think something in this realm?

"query" : { "function_score": { "query" : { "multi_match" : { "query": "awesome product", "type": "best_fields", "fields": [ "name^10", "search_words^5", "description^2", "skuid^1" ] } } }, "functions" : [ "script_score" : { "script" : "_score * (1000000 - doc['sort_priority'].value)" } ] }

(though that doesn't factor in creation_date at all)

Any help or hints appreciated!

Yeah, something like it. I think you have all the right tools but are maybe missing a few points.

  1. Have a look at decay functions. You probably don't want to enable dynamic scripting because it is a security hole: unless you are on the 5.0 alphas, Groovy is the only scripting language you have, and it isn't properly sandboxed. 5.0 adds Painless, which is sandboxed, but the decay functions might still be simpler.

  2. You might want cross_fields instead of best_fields.

  3. When you post a code snippet, wrap it in ``` above and below and the site will preserve the formatting.

  4. You should be able to apply creation_date in the same way as sort_priority. Personally I like Gaussian decay for creation-date kinds of things.
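To make points 1, 2 and 4 a bit more concrete, here is a rough sketch in Python that only builds the request body as a plain dict; no client library is involved. The helper names `build_product_query` and `gauss_decay` are made up for illustration. The scales (`180d`, `1000`), the weights, and the `sum`/`multiply` combination are assumptions you would need to tune against your own data. `gauss_decay` just evaluates the Gaussian decay curve as documented for `function_score`, so you can see what multiplier a given scale produces.

```python
import json
import math


def gauss_decay(distance, scale, decay=0.5, offset=0.0):
    """Multiplier of the documented Gaussian decay curve at `distance` from the origin."""
    # sigma^2 is chosen so the curve equals `decay` at distance == scale + offset.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    effective = max(0.0, abs(distance) - offset)
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))


def build_product_query(text):
    """Build a function_score body (hypothetical helper; all tuning values are guesses)."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": text,
                        "type": "cross_fields",
                        "fields": ["name^10", "search_words^5", "description^2", "skuid^1"],
                    }
                },
                "functions": [
                    # Constant floor so items with the default sort_priority are not zeroed out.
                    {"weight": 1},
                    # Newer items score higher; the multiplier halves at about 180 days old (assumed scale).
                    {"gauss": {"creation_date": {"origin": "now", "scale": "180d", "decay": 0.5}}, "weight": 2},
                    # sort_priority near 1 scores higher; the default 9999999 contributes roughly 0.
                    {"exp": {"sort_priority": {"origin": 1, "scale": 1000, "decay": 0.5}}, "weight": 1},
                ],
                "score_mode": "sum",       # add the three function values together
                "boost_mode": "multiply",  # then multiply the text relevance score by that sum
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_product_query("awesome product"), indent=2))
    for days in (0, 90, 180, 360):
        print(days, round(gauss_decay(days, 180), 3))
```

With these choices the combined multiplier stays between 1 and 4, so the text match still dominates and the priority and recency signals only nudge the order. If you want them to count for more, raise the weights rather than dropping the constant function.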

Thanks so much for the guidance, Nik. I'll check it out.