Tokenize string based on available field values


(Alex Smirnov) #1

We process all our data in PHP, which is extremely slow on larger arrays, so this question is about whether Elasticsearch can do the following internally:

  1. We have an index of documents in which one field is a list of keywords or keyword phrases associated with that document. Example:
    classify Doc title: "women->beauty"
    classify Doc keywords: ["lipliner", "lip liner", "hair dryer", "mascara", "nail", "eyelash", "curling iron"]

  2. Every time we classify a product internally, we run an _analyze query to find all keyword associations that match the product title, e.g. title: "Philips curling iron for thick hair"

  3. Currently we run a match query against the classify Doc keywords to see if anything matches, but this produces a lot of noise: a multi-word title such as "Panasonic iron for silk clothing" will "match" the classify Doc keywords on the single word "iron", even though the keyword is defined as the phrase "curling iron".
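
To make the noise problem concrete, here is a minimal sketch in plain Python (not Elasticsearch, and not our actual code) that contrasts word-level matching with phrase-level matching. The helper names `words`, `word_match` and `phrase_match` are made up for illustration, and `word_match` only roughly approximates what a match query does when any single term is enough for a hit.

```python
import re

def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def word_match(title, keywords):
    """Return keywords that share at least one word with the title."""
    title_words = set(words(title))
    return [kw for kw in keywords if title_words & set(words(kw))]

def phrase_match(title, keywords):
    """Return keywords whose words appear contiguously, in order, in the title."""
    t = words(title)
    hits = []
    for kw in keywords:
        k = words(kw)
        # Slide a window of len(k) words across the title.
        if any(t[i:i + len(k)] == k for i in range(len(t) - len(k) + 1)):
            hits.append(kw)
    return hits

keywords = ["lipliner", "lip liner", "hair dryer", "mascara",
            "nail", "eyelash", "curling iron"]

noisy = word_match("Panasonic iron for silk clothing", keywords)
strict = phrase_match("Panasonic iron for silk clothing", keywords)
wanted = phrase_match("Philips curling iron for thick hair", keywords)
```

With the sample keywords, `noisy` contains "curling iron" because of the shared word "iron", `strict` is empty, and `wanted` contains only "curling iron". The phrase-level behaviour is what we are hoping to get from Elasticsearch itself.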

QUESTION:

  1. Is it possible to have Elasticsearch tokenize the search string based on the list of defined keywords?
  2. Is it possible to add stemming to #1 to allow "wider" lookups such as "Best irons for clothes" or "Cheapest curling irons from panasonic"?
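
A rough sketch of the behaviour both questions describe, again in plain Python rather than anything Elasticsearch provides: the search string is split into tokens drawn only from the keyword list, trying the longest phrase first, after a crude plural-stripping step. The functions `stem`, `normalize` and `keyword_tokens` are hypothetical, and the `stem` rule is only a stand-in for a real stemmer.

```python
import re

def stem(word):
    """Crude plural stripping; a stand-in for a real stemmer."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def normalize(text):
    """Lowercase, split into alphanumeric words, and stem each word."""
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]

def keyword_tokens(text, keywords):
    """Return the defined keywords found in text, longest phrase first."""
    # Map each stemmed keyword phrase back to its original spelling.
    index = {tuple(normalize(kw)): kw for kw in keywords}
    longest = max(len(k) for k in index)
    tokens = normalize(text)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest possible phrase starting at position i.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            kw = index.get(tuple(tokens[i:i + n]))
            if kw is not None:
                found.append(kw)
                i += n
                break
        else:
            i += 1  # no keyword starts here; skip this word
    return found

keywords = ["lipliner", "lip liner", "hair dryer", "mascara",
            "nail", "eyelash", "curling iron"]

plural_hit = keyword_tokens("Cheapest curling irons from panasonic", keywords)
no_hit = keyword_tokens("Best irons for clothes", keywords)
```

Here `plural_hit` contains "curling iron" even though the search string says "irons", while `no_hit` is empty because "iron" on its own is not a defined keyword.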

I figure some sort of analyzer could be used for this, or maybe ?explain=true already does something similar?

Thank you.


(system) #2