Tokenize string based on available field values

virtuman · May 20, 2015, 10:58pm

We use PHP for all of our data processing, and PHP is extremely slow at processing large arrays, so this question is about whether Elasticsearch can do the following internally:

We have an index of documents in which one field holds a list of keywords or keyword phrases associated with that document. For example:
classify Doc title: "women->beauty"
classify Doc keywords: ["lipliner", "lip liner", "hair dryer", "mascara", "nail", "eyelash", "curling iron"]
Every time we classify a product internally, we run an _analyze query to find all keyword associations that match the product title, e.g. title: "Philips curling iron for thick hair".
Currently we run a match query against the classify Doc keywords to see if anything matches, but this produces a lot of noise: a multi-word title such as "Panasonic iron for silk clothing" will "match" the classify Doc keywords on the single word "iron", even though the keyword is defined as the phrase "curling iron".
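To make the noise concrete, here is a minimal plain-Python sketch (no Elasticsearch involved) that contrasts word-overlap matching, which approximates the noisy behavior described above, with strict phrase matching. The function names word_overlap and phrase_match are hypothetical, and lowercasing plus whitespace splitting is an assumed simplification of a real analyzer.

```python
# Keyword phrases from the "women->beauty" classify doc in the post.
KEYWORDS = ["lipliner", "lip liner", "hair dryer", "mascara", "nail",
            "eyelash", "curling iron"]


def word_overlap(title: str, keywords: list[str]) -> set[str]:
    """Noisy matching (hypothetical helper): a keyword hits if it shares
    any single word with the title."""
    title_words = set(title.lower().split())
    return {kw for kw in keywords if title_words & set(kw.split())}


def phrase_match(title: str, keywords: list[str]) -> set[str]:
    """Phrase matching (hypothetical helper): a keyword hits only if all of
    its words appear contiguously and in order in the title."""
    words = title.lower().split()
    hits = set()
    for kw in keywords:
        kw_words = kw.split()
        n = len(kw_words)
        # Slide a window of the keyword's length across the title words.
        if any(words[i:i + n] == kw_words for i in range(len(words) - n + 1)):
            hits.add(kw)
    return hits
```

With the titles from the post, word_overlap("Panasonic iron for silk clothing", KEYWORDS) returns {"curling iron"} while phrase_match returns an empty set. For "Philips curling iron for thick hair", word_overlap also picks up "hair dryer" through the word "hair", whereas phrase_match returns only {"curling iron"}.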

QUESTIONS:

1. Is it possible to have Elasticsearch tokenize a search string based on the list of defined keywords?
2. Is it possible to add stemming to #1 to allow "wider" lookups, such as "Best irons for clothes" or "Cheapest curling irons from panasonic"?
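A rough sketch of the behavior asked for in both questions, written in plain Python outside Elasticsearch: the title is split on whitespace, each word goes through a deliberately naive stemmer, and the words are then grouped by greedy longest match against the keyword list, so "curling irons" becomes the single token "curling iron" and unlisted words are dropped. naive_stem, build_index and keyword_tokenize are hypothetical names; naive_stem only strips a plural "s" and is a stand-in for a real stemmer, and punctuation is not handled.

```python
# Keyword phrases from the "women->beauty" classify doc in the post.
KEYWORDS = ["lipliner", "lip liner", "hair dryer", "mascara", "nail",
            "eyelash", "curling iron"]


def naive_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: lowercase and drop a plural 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def build_index(keywords: list[str]) -> dict[tuple[str, ...], str]:
    """Map each stemmed keyword phrase (as a tuple of words) to its original form."""
    return {tuple(naive_stem(w) for w in kw.split()): kw for kw in keywords}


def keyword_tokenize(text: str, keywords: list[str]) -> list[str]:
    """Split text into known keyword phrases, preferring the longest match."""
    index = build_index(keywords)
    max_len = max(len(k) for k in index)
    words = [naive_stem(w) for w in text.split()]
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest possible phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if candidate in index:
                tokens.append(index[candidate])
                i += n
                break
        else:
            # No keyword starts here; skip this word.
            i += 1
    return tokens
```

For example, keyword_tokenize("Cheapest curling irons from panasonic", KEYWORDS) returns ["curling iron"], while "Panasonic iron for silk clothing" yields an empty list. Greedy longest match is the design choice that keeps a phrase like "curling iron" from being split if a shorter keyword such as a hypothetical "iron" were also in the list.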

I figure some sort of analyzer could be used for this, or maybe ?explain=true already does something similar?

Thank you.
