We use PHP for all data processing, and PHP is very slow when processing larger arrays of data, so this question is about whether Elasticsearch can do the following internally:
- We have an index of documents where one of the fields is a list of keywords or keyword phrases associated with the document. Example:
classify Doc title: "women->beauty"
classify Doc keywords: ["lipliner", "lip liner", "hair dryer", "mascara", "nail", "eyelash", "curling iron"]
- Every time we attempt to classify a product internally, we run an _analyze query to find all matching keyword associations against a product title, e.g. title: "Philips curling iron for thick hair"
- Currently we run a match query against the classify Doc keywords to see if anything matches, but it results in a lot of noise: a multi-word input like "Panasonic iron for silk clothing" will "match" the classify Doc keywords on the single word "iron", even though the keyword is defined there as the phrase "curling iron".
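To make the noise concrete, here is a minimal plain-Python sketch of the behaviour described above. It is not Elasticsearch code: it assumes that the match query behaves roughly like an OR over lowercased word tokens, and the function names are made up for illustration.

```python
import re

def tokens(text):
    # Lowercase and split on non-alphanumeric characters
    # (a rough stand-in for a standard analyzer; an assumption).
    return re.findall(r"[a-z0-9]+", text.lower())

def word_level_matches(title, keywords):
    # Roughly what we get today: any single shared word counts as a hit.
    title_tokens = set(tokens(title))
    return [kw for kw in keywords if title_tokens & set(tokens(kw))]

def phrase_level_matches(title, keywords):
    # What we want: the whole keyword phrase must appear
    # as a contiguous run of tokens in the title.
    t = tokens(title)
    hits = []
    for kw in keywords:
        k = tokens(kw)
        if any(t[i:i + len(k)] == k for i in range(len(t) - len(k) + 1)):
            hits.append(kw)
    return hits

keywords = ["lipliner", "lip liner", "hair dryer", "mascara",
            "nail", "eyelash", "curling iron"]

noisy = word_level_matches("Panasonic iron for silk clothing", keywords)
wanted = phrase_level_matches("Panasonic iron for silk clothing", keywords)
```

Here `noisy` is `["curling iron"]` (matched only on "iron"), while `wanted` is `[]`, which is the result we are after.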
QUESTION:
1. Is it possible to have Elasticsearch tokenize the search string based on the list of defined keywords?
2. Is it possible to add stemming to #1 to allow "wider" lookups like "Best irons for clothes" or "Cheapest curling irons from panasonic"?
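As a rough illustration of the behaviour both questions are asking for, here is a plain-Python sketch (again, not Elasticsearch code). The `crude_stem` helper is a hypothetical plural-stripper standing in for a real stemmer, and all names are assumptions made for this example.

```python
import re

def crude_stem(token):
    # Hypothetical stand-in for a real stemmer: strip one trailing "s".
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def stemmed_tokens(text):
    # Lowercase, split on non-alphanumerics, then stem each token.
    return [crude_stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]

def keyword_phrases_in(title, keywords):
    # Return the keywords whose stemmed tokens appear as a
    # contiguous run inside the stemmed title tokens.
    t = stemmed_tokens(title)
    hits = []
    for kw in keywords:
        k = stemmed_tokens(kw)
        if any(t[i:i + len(k)] == k for i in range(len(t) - len(k) + 1)):
            hits.append(kw)
    return hits

keywords = ["lipliner", "lip liner", "hair dryer", "mascara",
            "nail", "eyelash", "curling iron"]

plural_hit = keyword_phrases_in("Cheapest curling irons from panasonic", keywords)
no_hit = keyword_phrases_in("Best irons for clothes", keywords)
```

With this sketch `plural_hit` is `["curling iron"]` because "irons" is reduced to "iron", while `no_hit` is `[]` because "irons" alone is not one of the defined keyword phrases.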
I figure some sort of analyzer could be used for that, or maybe ?explain=true already does something similar?
Thank you.