I've been looking into using ngrams to handle misspelled search terms and complex compound words, both in the index and in the query string.
I came across ngrams-compound-words, which really fits my needs, but the example query does not work in ES 2.*: "minimum_should_match" no longer seems to behave as in the example.
Is it possible to achieve the same functionality in ES 2.*?
The example query looks like this:
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "text": {
        "query": "Gesundheit",
        "minimum_should_match": "80%" // this does not work in ES 2.*
      }
    }
  }
}
Changing the "minimum_should_match" has no effect on the search result in 2.*. In my understanding, a higher percentage would only return hits where more of the ngrams from the query matches a ngram in the document. The result should be a higher precision.
So far I've been able to replicate the functionality by splitting the query into ngrams myself and constructing a bool query (sketched below), but that complicates this feature a lot.
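For reference, here is a minimal sketch of that workaround, again assuming the trigram analysis from the chapter (index, type, and field names are the ones from my query above; in practice the grams would come from whatever analyzer the field uses):

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "text": "ges" } },
        { "term": { "text": "esu" } },
        { "term": { "text": "sun" } },
        { "term": { "text": "und" } },
        { "term": { "text": "ndh" } },
        { "term": { "text": "dhe" } },
        { "term": { "text": "hei" } },
        { "term": { "text": "eit" } }
      ],
      "minimum_should_match": "80%"
    }
  }
}

"minimum_should_match" does take effect on an explicit bool query like this in 2.*, which is what makes the workaround possible, but it means re-implementing the analysis step on the client side.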
Hmm, the minimum_should_match set on the query string only applies if the query parsed to a boolean query and the coordination factor was not disabled. Here's the comment on top of this logic, in ES master QueryStringBuilder.java:
// If the coordination factor is disabled on a boolean query we don't apply the minimum should match.
// This is done to make sure that the minimum_should_match doesn't get applied when there is only one word
// and multiple variations of the same word in the query (synonyms for instance).
if (query instanceof BooleanQuery && !((BooleanQuery) query).isCoordDisabled()) {
query = Queries.applyMinimumShouldMatch((BooleanQuery) query, this.minimumShouldMatch());
}
We need to see exactly what query class ES creates when parsing your query string with your ngram tokenizer...
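One way to see that without attaching a debugger: the validate API with explain prints the rewritten Lucene query (a sketch, reusing the index and field from your post):

GET /my_index/_validate/query?explain
{
  "query": {
    "match": {
      "text": {
        "query": "Gesundheit",
        "minimum_should_match": "80%"
      }
    }
  }
}

The explanation in the response shows the final Lucene query, so you can check whether it came out as a BooleanQuery and whether a minimum-match constraint (the ~N suffix) survived. Note that if your ngrams come from a token filter, the grams are typically emitted at the same position as the original token, which Lucene treats like synonyms and can build as a boolean query with coord disabled, in which case the branch above skips minimum_should_match.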