We have loaded a few million documents into an Elasticsearch cluster; each document is a frequently occurring phrase from our master set of documents. We are trying to build a context-sensitive spell corrector on top of these phrases. At the moment, we are using the phrase suggester with the configuration below:
"text": "some text here",
If I understand the idea correctly, this uses the edit distance between candidate tokens in the cluster and the tokens of the incoming query to arrive at a score. For example, let's say the dataset contains two terms, "galaxa" and "galaxy". If the user's misspelled query is "galaxx", which of these candidates will be scored higher? Would it also consider the context, i.e. the other tokens (words) in the misspelled query, to arrive at a better candidate? And is there a way to weight certain documents (phrases) so that some terms are favored over others?
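
To make the "galaxx" example concrete, here is a minimal sketch in plain Python that computes the Levenshtein distance from the query to each candidate. It is only an illustration of the tie I am asking about, not Elasticsearch's actual scoring code; the `levenshtein` helper and variable names are my own.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


query = "galaxx"                    # misspelled user input
candidates = ["galaxa", "galaxy"]   # terms assumed to exist in the index

distances = {c: levenshtein(query, c) for c in candidates}
print(distances)  # {'galaxa': 1, 'galaxy': 1}
```

Both candidates are one substitution away from "galaxx", so edit distance alone cannot separate them. That is why I am asking whether term frequency or the surrounding words break the tie.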