Is there a way to influence phrase suggester candidates by specifying weights for tokens / docs?

varadharajan · July 27, 2018, 7:54am

We have loaded a few million documents into an Elasticsearch cluster, where each document is a frequently occurring phrase from our master set of documents. We are trying to build a context-sensitive spell corrector on top of these phrases. At the moment, we are using a phrase suggester with the configuration below:

{
  "suggest": {
    "suggestions": {
      "text": "some text here",
      "phrase": {
        "field": "phrase.shingle",
        "gram_size": 3,
        "direct_generator": [
          {
            "field": "phrase.shingle",
            "suggest_mode": "missing",
            "min_word_length": 1,
            "prefix_length": 3
          }
        ]
      }
    }
  }
}

If I understood the idea correctly, this uses the edit distance between candidate tokens present in the cluster and the tokens in the incoming query to arrive at a score. For example, let's say the dataset contains two terms, "galaxa" and "galaxy". If the user's misspelled query is "galaxx", which of these candidates will be scored higher? Does it also consider the context, i.e. the other tokens (words) in the misspelled query, to arrive at a better candidate? Is there a way to weight certain documents (phrases) so that some terms are favored over others?
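
To make the "galaxx" example concrete, here is a minimal Python sketch that computes the plain Levenshtein distance from the query to each candidate. It is only an illustration of why I am asking: the `levenshtein` helper is my own, and I am assuming (not claiming) that the direct generator's string distance behaves similarly for a single substitution. It is not Elasticsearch's actual scoring code.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


query = "galaxx"
candidates = ["galaxa", "galaxy"]

# Both candidates differ from the query by one substitution in the last
# character, so edit distance alone cannot separate them.
distances = {c: levenshtein(query, c) for c in candidates}
print(distances)  # {'galaxa': 1, 'galaxy': 1}
```

Since both candidates are one edit away and share the three-character prefix required by "prefix_length": 3, the tie would have to be broken by some other signal, such as how often each term occurs in the index or the surrounding words in the query. That is the part I would like to be able to influence.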

