I am working on a project where users get suggest-as-you-type functionality. The suggestions come from a separate suggest index, which is built from the query history and a set of predefined product categories. When the user selects a suggestion, they are redirected to a search page where the corresponding filters (product categories) are selected and queried against another index. For the categories this happens with an exact term query, but some parts might use full-text search. I am looking for a way to find the tokens in the user's query that could not be matched against a suggestion, so that users can search for certain keywords within a category.
As an example (this is not my real use case), suppose I want users to search for products. These products have properties that we could auto-suggest, such as product category and manufacturer. For titles and descriptions, however, I want a more full-text-based approach, and because there are not enough user search queries available, I can't simply extract possible terms from them.
Consider the following auto-suggest index (all queries are executed on ES 6.1):
```
PUT /search-suggest
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  }
}
```
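The authoritative way to see what the `autocomplete` analyzer writes into the index is the `_analyze` API (`GET /search-suggest/_analyze` with `"analyzer": "autocomplete"`). The Python below is only a rough offline approximation. It assumes that splitting on non-word characters is close enough to the standard tokenizer for simple Latin text, and the function name is hypothetical.

```python
import re

def autocomplete_tokens(text, min_gram=1, max_gram=20):
    """Approximate the 'autocomplete' analyzer: standard tokenizer,
    then the lowercase filter, then edge_ngram(min_gram, max_gram)."""
    # Assumption: a regex split on non-word characters stands in for the standard tokenizer.
    words = [w for w in re.split(r"\W+", text) if w]
    tokens = []
    for word in words:
        word = word.lower()
        # edge_ngram emits the prefixes of each token, from min_gram up to max_gram characters.
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:n])
    return tokens

print(autocomplete_tokens("Sony in gaming"))
# ['s', 'so', 'son', 'sony', 'i', 'in', 'g', 'ga', 'gam', 'gami', 'gamin', 'gaming']
```

Unless a field mapping sets a separate `search_analyzer`, the same analyzer also runs on the query text at search time.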
The following mapping stores the category and manufacturer as properties of each suggestion, so that they can be used to set filters on the user's search:
```
PUT /search-suggest/_mapping/search-suggest
{
  "search-suggest": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      },
      "category": {
        "type": "keyword"
      },
      "manufacturer": {
        "type": "keyword"
      }
    }
  }
}
```
And consider the following example suggestion documents:
```
POST /search-suggest/search-suggest/_bulk
{ "index": { "_id": 1 }}
{ "name": "Gaming", "category":"gaming" }
{ "index": { "_id": 2 }}
{ "name": "Sony in gaming", "category":"gaming", "manufacturer": "sony" }
```
We can query it like this:
```
GET /search-suggest/search-suggest/_search
{
  "query": {
    "match": {
      "name": {
        "query": "Sny playstation",
        "fuzziness": "auto"
      }
    }
  }
}
```
This returns the following document:
```
{
  "name": "Sony in gaming",
  "category": "gaming",
  "manufacturer": "sony"
}
```
Now I want to detect that "Sny" was matched in my query and is covered by the filter manufacturer:sony, and also that "playstation" was not matched, so that I can use it for a free-text search (this makes more sense in my actual use case). The result would be something like: search for "playstation" in products with manufacturer sony.
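The end result described above could be expressed as a single query against the products index. It would combine exact `term` filters taken from the selected suggestion with a scored full-text clause for the leftover tokens. This minimal sketch only builds the request body. The helper name and the `title`/`description` field names are hypothetical (borrowed from the example above), and sending the body to the products index is left out.

```python
import json

def build_product_query(suggestion, free_text, text_fields=("title", "description")):
    """Hypothetical helper: turn a selected suggestion document plus the
    unmatched part of the user's query into a bool query body."""
    # Exact-term filters for every structured property the suggestion carries.
    filters = [
        {"term": {field: suggestion[field]}}
        for field in ("category", "manufacturer")
        if field in suggestion
    ]
    query = {"bool": {"filter": filters}}
    if free_text:
        # The unmatched tokens go into a scored full-text clause.
        query["bool"]["must"] = [
            {"multi_match": {"query": free_text, "fields": list(text_fields)}}
        ]
    return {"query": query}

body = build_product_query(
    {"name": "Sony in gaming", "category": "gaming", "manufacturer": "sony"},
    "playstation",
)
print(json.dumps(body, indent=2))
```

Clauses in `filter` context do not contribute to the score, so only the free-text part affects relevance.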
I figured I could use highlighting to find the matched keywords and then take the difference between the query and the matched keywords to get the unmatched ones:
```
GET /search-suggest/search-suggest/_search
{
  "query": {
    "match": {
      "name": {
        "query": "Sny playstation",
        "fuzziness": "auto"
      }
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "type": "plain"
      }
    }
  }
}
```
This results in the following highlight. However, I cannot easily map the fuzzy match back to the query: the highlight contains "Sony", while the user typed "Sny".
```
"highlight": {
  "name": [
    "<em>Sony</em> in gaming"
  ]
}
```
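One way to close that gap is to repeat the fuzzy comparison on the client. With `"fuzziness": "auto"`, Elasticsearch allows 0 edits for terms of 1-2 characters, 1 edit for 3-5 characters and 2 edits for longer terms, which is why the query term `sny` reaches `sony`. A query token can therefore be counted as matched if some highlighted term lies within that distance. This is a sketch with several assumptions: it relies on the plain highlighter's default `<em>` tags, it approximates the standard analyzer with a regex split, and it approximates Lucene's fuzzy matching with optimal-string-alignment distance (Levenshtein plus adjacent transpositions). All function names are hypothetical.

```python
import re

def auto_fuzziness(term):
    # Elasticsearch AUTO fuzziness (default AUTO:3,6), based on the query term length.
    if len(term) < 3:
        return 0
    return 1 if len(term) < 6 else 2

def edit_distance(a, b):
    # Optimal string alignment distance (Levenshtein plus adjacent transpositions),
    # assumed to approximate Lucene's default fuzzy_transpositions=true behaviour.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

def unmatched_tokens(query, highlight_fragments):
    # Collect the terms the highlighter wrapped in <em>...</em>, lowercased.
    highlighted = {
        term.lower()
        for fragment in highlight_fragments
        for term in re.findall(r"<em>(.*?)</em>", fragment)
    }
    # Rough stand-in for analysing the query with the standard analyzer.
    tokens = [t.lower() for t in re.split(r"\W+", query) if t]
    # Keep the tokens that no highlighted term is within AUTO fuzziness of.
    return [
        t for t in tokens
        if not any(edit_distance(t, h) <= auto_fuzziness(t) for h in highlighted)
    ]

print(unmatched_tokens("Sny playstation", ["<em>Sony</em> in gaming"]))
# ['playstation']
```

The result only covers the suggestion that was returned. A token could still match some other suggestion document that was not in the hits.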
Does anybody have an idea how I could solve this, or know of other methods that might work? For my use case I have also been thinking about using the significant_text aggregation to extract important keywords for each category.
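For the significant-text idea, a request along these lines would ask Elasticsearch which terms are unusually frequent in one category's products compared with the whole index. This sketch only builds the body. The helper name, the `description` field and the sampler `shard_size` of 200 are assumptions. `significant_text` re-analyzes the text of matching documents on the fly, which is why it is commonly placed under a `sampler` aggregation that limits it to the top-matching documents per shard.

```python
import json

def significant_keywords_request(category, text_field="description", shard_size=200, size=10):
    """Hypothetical helper: body for finding terms that are unusually
    frequent in one category compared with the whole products index."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"term": {"category": category}},
        "aggs": {
            "sample": {
                # Restrict the expensive on-the-fly text analysis to a per-shard sample.
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "keywords": {
                        "significant_text": {"field": text_field, "size": size}
                    }
                },
            }
        },
    }

print(json.dumps(significant_keywords_request("gaming"), indent=2))
```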
Thanks in advance.